It's possible that part of Spark's advantage comes from not constraining itself to return groups in the same order as they're fed in. It would require a separate query for `by =` functionality in other languages as well.
The query to force this would be substantially more complicated and might slow down Spark.
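To make the distinction concrete, here is a minimal Python sketch of the two behaviours. It is not part of the benchmark, and the function names are hypothetical. The first function returns groups in order of first appearance (roughly what data.table's `by =` does), the second adds an explicit sort (roughly `group by ... order by`), and the third compares two answers while ignoring row order.

```python
def group_sum_first_seen(rows):
    """Sum values per key, returning groups in order of first appearance."""
    totals = {}  # dicts preserve insertion order in Python 3.7+
    for key, value in rows:
        totals[key] = totals.get(key, 0) + value
    return list(totals.items())


def group_sum_sorted(rows):
    """Sum values per key, then pay for an extra sort on the group key."""
    return sorted(group_sum_first_seen(rows))


def same_groups(a, b):
    """Compare two grouped results while ignoring the order of rows."""
    return sorted(a) == sorted(b)


rows = [("b", 1), ("a", 2), ("b", 3), ("c", 4), ("a", 5)]
first_seen = group_sum_first_seen(rows)  # groups ordered b, a, c
by_key = group_sum_sorted(rows)          # groups ordered a, b, c
print(first_seen, by_key, same_groups(first_seen, by_key))
```

An engine that is free to return groups in any order can skip both the bookkeeping for first-appearance order and the final sort, which is the advantage being discussed here.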
We had a small discussion on the order of results in these benchmark tasks, and the conclusion was to use the faster option for each solution, i.e. to ignore the order of rows in the answer. If all solutions provided an API for retaining groups and returning them in order, then it would make sense to have both as separate tests.
I guess related to #12.
Also related: Rdatatable/data.table#1880
Also, it's probably a bug that `spark` gets away with using `group by` and not `group by order by` at the moment...