
Force consistent sort order for spark? #42

Closed
MichaelChirico opened this issue Oct 30, 2018 · 1 comment

@MichaelChirico
Contributor

I guess related to #12.

It's possible that part of Spark's advantage comes from not constraining itself to return groups in the same order as they are fed in. Enforcing a consistent order would require a separate query for the `by =` functionality in the other languages as well.

The query to force this would be substantially more complicated and might slow down Spark.

Also related: Rdatatable/data.table#1880

Also, it's probably a bug that Spark currently gets away with using `GROUP BY` rather than `GROUP BY ... ORDER BY`.
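To make the two orderings concrete, here is a minimal pure-Python sketch (not Spark or data.table code; both function names are hypothetical) of a grouped sum that either keeps groups in order of first appearance, as data.table's `by =` does, or sorts them by key, as `GROUP BY ... ORDER BY key` would. A plain SQL `GROUP BY` guarantees neither order.

```python
def group_sum_first_appearance(rows):
    """Sum values per key, returning groups in the order each key first
    appears in the input (the ordering data.table's `by =` uses)."""
    totals = {}  # Python 3.7+ dicts preserve insertion order
    for key, value in rows:
        totals[key] = totals.get(key, 0) + value
    return list(totals.items())


def group_sum_sorted(rows):
    """Same aggregation, but with groups sorted by key, analogous to
    GROUP BY key ORDER BY key."""
    return sorted(group_sum_first_appearance(rows))


rows = [("b", 1), ("a", 2), ("b", 3), ("c", 4)]
print(group_sum_first_appearance(rows))  # [('b', 4), ('a', 2), ('c', 4)]
print(group_sum_sorted(rows))            # [('a', 2), ('b', 4), ('c', 4)]
```

Both functions produce the same set of rows; only the row order differs, which is exactly the constraint Spark may be skipping.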

@jangorecki
Contributor

jangorecki commented Oct 30, 2018

We had a small discussion on the order of results in these benchmark tasks, and the conclusion was to use the faster option for each solution, i.e. to ignore the order of rows in the answer. If all solutions provided an API for retaining group order and returning results in that order, then it would make sense to have both as separate tests.
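Ignoring row order means answers from different solutions should be compared as multisets of rows rather than as ordered lists. A minimal sketch, assuming each answer is available as a list of row tuples (the helper name is hypothetical, not part of the benchmark code):

```python
from collections import Counter


def same_answer_ignoring_order(expected, actual):
    """Return True if both answers contain the same rows, regardless of order.

    Rows are converted to tuples so they are hashable; Counter then compares
    them as multisets, so duplicate rows must appear the same number of times.
    """
    return Counter(map(tuple, expected)) == Counter(map(tuple, actual))


print(same_answer_ignoring_order([("a", 2), ("b", 4)], [("b", 4), ("a", 2)]))  # True
print(same_answer_ignoring_order([("a", 2)], [("a", 3)]))                      # False
```

This uses exact equality, so floating-point aggregates would additionally need a tolerance before comparison.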

@jangorecki jangorecki self-assigned this Oct 30, 2018