-
-
Notifications
You must be signed in to change notification settings - Fork 7.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use Status.group instead of Status.distinct in HashQueryService #14662
Conversation
DISTINCT clause removes duplicated records according to all the selected attributes. In reality, it can remove duplicated records only looking at statuses.id, but the clause confuses the query planner and yields insufficient performance. The behavior is also problematic if the scope produced by HashQueryService is used to query columns without id (using pluck method, for example). The scope is expected to contain unique statuses, but the uniquness will be evaluated with some arbitrary columns other than id. GROUP BY clause resolves those problem by explicitly specifying the column to take into account for the record distinction. A workaround for the problem of DISTINCT clause in Api::V1::Timelines::TagController is no longer necessary and removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a bit confused by this and I have a hard time imagining the final query.
Something I notice, though, is that we use distinct
here and there with a column name as parameter, as if it worked the same way as group
. This may be because we got confused by a different distinct
method in older versions of Rails:
https://apidock.com/rails/ActiveRecord/ConnectionAdapters/SchemaStatements/distinct
Anyway, this PR shouldn't hurt, and we should look into our other uses of distinct
throughout the code.
Here is a query for
This is an insightful comment. I didn't have a careful look at the removed |
…odon#14662) DISTINCT clause removes duplicated records according to all the selected attributes. In reality, it can remove duplicated records only looking at statuses.id, but the clause confuses the query planner and yields insufficient performance. The behavior is also problematic if the scope produced by HashQueryService is used to query columns without id (using pluck method, for example). The scope is expected to contain unique statuses, but the uniquness will be evaluated with some arbitrary columns other than id. GROUP BY clause resolves those problem by explicitly specifying the column to take into account for the record distinction. A workaround for the problem of DISTINCT clause in Api::V1::Timelines::TagController is no longer necessary and removed.
…odon#14662) DISTINCT clause removes duplicated records according to all the selected attributes. In reality, it can remove duplicated records only looking at statuses.id, but the clause confuses the query planner and yields insufficient performance. The behavior is also problematic if the scope produced by HashQueryService is used to query columns without id (using pluck method, for example). The scope is expected to contain unique statuses, but the uniquness will be evaluated with some arbitrary columns other than id. GROUP BY clause resolves those problem by explicitly specifying the column to take into account for the record distinction. A workaround for the problem of DISTINCT clause in Api::V1::Timelines::TagController is no longer necessary and removed.
Hi, it's been a while since I made a pull request the last time.
Pawoo is finally updated to 2.9.4 today and we are appreciating lots of improvements. I want to contribute a change made for Pawoo in the process of updating the instance.
The below is the description of this change which is also in the commit message:
DISTINCT clause removes duplicated records according to all the selected attributes. In reality, it can remove duplicated records only looking at
statuses.id
, but the clause confuses the query planner and yields insufficient performance.The behavior is also problematic if the scope produced by
HashQueryService
is used to query columns without id (using pluck method, for example). The scope is expected to contain unique statuses, but the uniquness will be evaluated with some arbitrary columns other than id.GROUP BY clause resolves those problem by explicitly specifying the column to take into account for the record distinction.
A workaround for the problem of DISTINCT clause in
Api::V1::Timelines::TagController
is no longer necessary and removed.