Fix inefficient queries in legacy search API #1878

Merged (1 commit) on Jun 4, 2019.


@cutwater (Collaborator) commented on Jun 3, 2019:

Improve legacy search API performance by rewriting filters against many-to-many relationships from `SELECT DISTINCT` and `JOIN` queries to `WHERE ... IN` queries.

Signed-off-by: Alexander Saprykin [email protected]

Fixes: #1876

```diff
-    platforms__in=models.Platform.objects.filter(name__in=platforms))
+platforms_qs = models.Content.objects.only('pk').filter(
+    platforms__name__in=platforms)
+return queryset.filter(pk__in=platforms_qs)
```
A contributor commented:

Is writing it this way appreciably different, in terms of the resulting SQL, from `return queryset.filter(platforms__name__in=platforms)`?

Collaborator (author) replied:

The call `queryset.filter(platforms__name__in=platforms)` generates a SQL `JOIN` query, for example:

```sql
SELECT ...
FROM main_content
JOIN main_content_platforms ON ...
JOIN main_platforms ON ...
WHERE main_platforms.name IN (...)
```

A JOIN on a many-to-many relationship produces duplicate rows (one per matching related row), which then have to be eliminated with `DISTINCT` or `DISTINCT ON`. Calling `.distinct()` naively on the queryset generates `SELECT DISTINCT` over all fields in the query, and for this query that field list can be large, since it also includes fields selected from related tables.
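To make the duplication concrete, here is a minimal sketch using Python's built-in `sqlite3` module rather than Django and PostgreSQL. The table names mirror the example query; the column names (`content_id`, `platform_id`) and the sample rows are hypothetical, chosen only for illustration.

```python
import sqlite3

# Hypothetical many-to-many schema; table names follow the example query,
# columns and rows are invented for illustration.
SCHEMA = """
CREATE TABLE main_content (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE main_platforms (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE main_content_platforms (content_id INTEGER, platform_id INTEGER);
INSERT INTO main_content VALUES (1, 'nginx'), (2, 'mysql'), (3, 'redis');
INSERT INTO main_platforms VALUES (1, 'Ubuntu'), (2, 'Debian'), (3, 'EL');
-- nginx runs on Ubuntu and Debian, mysql on Ubuntu, redis on EL.
INSERT INTO main_content_platforms VALUES (1, 1), (1, 2), (2, 1), (3, 3);
"""

# JOIN-based filter; {distinct} is either empty or the DISTINCT keyword.
JOIN_QUERY = """
SELECT {distinct} main_content.id, main_content.name
FROM main_content
JOIN main_content_platforms
  ON main_content_platforms.content_id = main_content.id
JOIN main_platforms
  ON main_platforms.id = main_content_platforms.platform_id
WHERE main_platforms.name IN (?, ?)
ORDER BY main_content.id
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
platforms = ("Ubuntu", "Debian")

# Without DISTINCT, a content row repeats once per matching platform.
joined = conn.execute(JOIN_QUERY.format(distinct=""), platforms).fetchall()
# DISTINCT removes the repeats, but has to compare every selected column.
deduped = conn.execute(JOIN_QUERY.format(distinct="DISTINCT"), platforms).fetchall()

print(joined)   # [(1, 'nginx'), (1, 'nginx'), (2, 'mysql')]
print(deduped)  # [(1, 'nginx'), (2, 'mysql')]
```

With only two selected columns the `DISTINCT` is cheap here; the concern in the thread is that a real queryset selects many more columns, including ones from related tables.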

One possible solution is to select the records and join them with a subquery that returns unique content IDs, but the Django ORM doesn't provide an obvious interface for custom joins.
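In raw SQL, the "join with unique IDs" approach might look roughly like the sketch below, again using `sqlite3` and the same hypothetical schema and rows. The deduplication happens only on the narrow `content_id` column inside a derived table (here called `matched`, a name chosen for illustration), so the outer query needs no `DISTINCT`.

```python
import sqlite3

# Hypothetical many-to-many schema and rows, for illustration only.
SCHEMA = """
CREATE TABLE main_content (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE main_platforms (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE main_content_platforms (content_id INTEGER, platform_id INTEGER);
INSERT INTO main_content VALUES (1, 'nginx'), (2, 'mysql'), (3, 'redis');
INSERT INTO main_platforms VALUES (1, 'Ubuntu'), (2, 'Debian'), (3, 'EL');
INSERT INTO main_content_platforms VALUES (1, 1), (1, 2), (2, 1), (3, 3);
"""

# DISTINCT runs over a single ID column in the derived table; the result is
# then joined back to main_content, one row per matching content record.
QUERY = """
SELECT main_content.id, main_content.name
FROM main_content
JOIN (
    SELECT DISTINCT main_content_platforms.content_id
    FROM main_content_platforms
    JOIN main_platforms
      ON main_platforms.id = main_content_platforms.platform_id
    WHERE main_platforms.name IN (?, ?)
) AS matched ON matched.content_id = main_content.id
ORDER BY main_content.id
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
rows = conn.execute(QUERY, ("Ubuntu", "Debian")).fetchall()
print(rows)  # [(1, 'nginx'), (2, 'mysql')]
```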

Another solution, which we already use in search, is a `WHERE id IN (SELECT ...)` query, which performs similarly to a JOIN against unique IDs.
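The `WHERE ... IN` form can be sketched the same way with `sqlite3` and a hypothetical schema. Django compiles a queryset passed to an `__in` lookup into a subselect, so `queryset.filter(pk__in=platforms_qs)` should produce a query of roughly this shape, though the exact SQL Django emits will differ. Note that the subquery itself may repeat an ID, yet the outer query still returns each content row at most once.

```python
import sqlite3

# Hypothetical many-to-many schema and rows, for illustration only.
SCHEMA = """
CREATE TABLE main_content (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE main_platforms (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE main_content_platforms (content_id INTEGER, platform_id INTEGER);
INSERT INTO main_content VALUES (1, 'nginx'), (2, 'mysql'), (3, 'redis');
INSERT INTO main_platforms VALUES (1, 'Ubuntu'), (2, 'Debian'), (3, 'EL');
INSERT INTO main_content_platforms VALUES (1, 1), (1, 2), (2, 1), (3, 3);
"""

# Candidate IDs, roughly what the platforms_qs subquery selects; an ID can
# appear once per matching platform.
MATCHING_IDS = """
SELECT c.id
FROM main_content AS c
JOIN main_content_platforms AS cp ON cp.content_id = c.id
JOIN main_platforms AS p ON p.id = cp.platform_id
WHERE p.name IN (?, ?)
"""

# IN only tests membership, so no DISTINCT is needed on the outer query.
QUERY = f"""
SELECT main_content.id, main_content.name
FROM main_content
WHERE main_content.id IN ({MATCHING_IDS})
ORDER BY main_content.id
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
platforms = ("Ubuntu", "Debian")

ids = sorted(row[0] for row in conn.execute(MATCHING_IDS, platforms))
rows = conn.execute(QUERY, platforms).fetchall()
print(ids)   # [1, 1, 2]
print(rows)  # [(1, 'nginx'), (2, 'mysql')]
```

Because `IN` only asks whether an ID is present, query planners such as PostgreSQL's can typically execute it as a semi-join, which is why it can perform comparably to joining against a deduplicated ID list.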

@cutwater merged commit 50e123c into ansible:devel on Jun 4, 2019.
@cutwater deleted the fix/legacy-search branch on June 4, 2019, 13:57.
Issues this pull request may close: Legacy search API performance deficiencies
3 participants