Force consistent sort order for spark? #42

MichaelChirico · 2018-10-30T09:11:49Z

I guess related to #12.

It's possible that part of Spark's advantage comes from not constraining itself to return groups in the same order as they're fed in. Covering this would require a separate query for by = functionality in other languages as well.

The query to force this would be substantially more complicated and might slow down Spark.

Also related: Rdatatable/data.table#1880

Also, it's probably a bug that spark gets away with using group by and not group by order by at the moment...
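For illustration, here is a minimal sketch of the difference between a plain group by and one that restores first-appearance order. It uses Python's built-in sqlite3 rather than Spark, and the table `x`, the columns `id1` and `v1`, and the use of SQLite's implicit `rowid` as a stand-in for input position are all assumptions made for the example. Spark has no such implicit row id, so an explicit index column would presumably have to be attached first, which is part of why the real query would be more involved.

```python
import sqlite3

# Hypothetical toy table; names are illustrative, not from the benchmark.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (id1 TEXT, v1 INTEGER)")
conn.executemany(
    "INSERT INTO x VALUES (?, ?)",
    [("b", 1), ("a", 2), ("b", 3), ("c", 4), ("a", 5)],
)

# Plain aggregation: SQL makes no promise about the order of the groups.
plain = conn.execute(
    "SELECT id1, SUM(v1) FROM x GROUP BY id1"
).fetchall()

# Same aggregation, but groups are returned in order of first appearance.
# MIN(rowid) is the position of the first row seen for each group.
ordered = conn.execute(
    "SELECT id1, SUM(v1) FROM x GROUP BY id1 ORDER BY MIN(rowid)"
).fetchall()

print(ordered)
```

The second query does extra work (tracking a per-group minimum and sorting on it), which is the kind of overhead the ordered variant would add.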


jangorecki · 2018-10-30T11:01:31Z

We had a small discussion on the order of results in this benchmark's tasks, and the conclusion was to use the faster option for each solution, i.e. to ignore the order of rows in the answer. If all solutions provided an API for retaining group order and returning results in that order, then it would make sense to have both as separate tests.
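As a rough sketch of what ignoring row order means when checking answers, two results can be compared as multisets of rows. The helper name `same_rows` is hypothetical and is not part of the benchmark code.

```python
from collections import Counter

def same_rows(a, b):
    """Return True if both results contain the same rows, in any order."""
    # Counter keeps duplicates, so repeated rows must match in count too.
    return Counter(map(tuple, a)) == Counter(map(tuple, b))

print(same_rows([("a", 7), ("b", 4)], [("b", 4), ("a", 7)]))
```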

jangorecki self-assigned this Oct 30, 2018

jangorecki closed this as completed in 162574d Oct 30, 2018

jangorecki added the spark label Oct 16, 2019
