Performance improvement for finding if a slug is already used or not #497
I just ran the benchmarks. Before:
And after:
So for trivial-sized tables it does appear to be a bit worse, but I'm willing to take that if the performance is better in something more like real life. Let me look this over a bit more and I'll merge it soon. Thanks!
In case anybody is interested, I benchmarked this on Postgres with a larger data set: the default benchmark uses only 100 records, so I bumped that up to 5000. Here are the results. Without patch:
With patch:
So the performance boost is moderate to significant everywhere (be sure to look at the "real" column, which is elapsed wall-clock time). Merging this now and I'll do a new release today. Thanks again @mhodgson!
@norman happy to help! Thanks for friendly_id!
What a superb pull request. Nice work @mhodgson 👍
On non-trivially sized tables, this commit drastically improves the performance of the check for whether a slug is already in use. The original query took the following form (for a table named 'titles' with a slugged field called 'permalink'):
Running EXPLAIN on it shows that this query does TWO sequential scans, one on the 'titles' table and one on the 'friendly_id_slugs' table:
This is really bad. Our friendly_id_slugs table, for example, has 900k records and our titles table has 500k records, and on those this query takes over 4 SECONDS.
This query should be able to use the indexes on both the friendly_id_slugs table and the titles table (which has an index on permalink). Unfortunately, the Postgres query planner does not use the index on either table.
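To check whether a query actually uses an index, inspect its plan. On Postgres, `EXPLAIN` (or `EXPLAIN ANALYZE`) shows `Seq Scan` nodes for full table scans and `Index Scan` or `Index Only Scan` nodes when an index is used. The sketch below does the equivalent check with SQLite's `EXPLAIN QUERY PLAN` so that it runs on its own. The simplified schema and the index names are assumptions loosely modeled on the tables in this PR, not friendly_id's real migration. SQLite's planner will not necessarily make the same choices as Postgres. The sketch inspects single-table equality lookups on each table, which is the kind of query each index is built to serve.

```python
import sqlite3

# Hypothetical, simplified schema loosely modeled on the tables in this PR.
# SQLite is only a self-contained stand-in for Postgres here.
SCHEMA = """
CREATE TABLE titles (id INTEGER PRIMARY KEY, permalink TEXT);
CREATE INDEX index_titles_on_permalink ON titles (permalink);

CREATE TABLE friendly_id_slugs (
    id INTEGER PRIMARY KEY,
    slug TEXT NOT NULL,
    sluggable_id INTEGER NOT NULL,
    sluggable_type TEXT NOT NULL
);
CREATE INDEX index_slugs_on_slug_and_type
    ON friendly_id_slugs (slug, sluggable_type);
"""


def query_plan(conn, sql, params=()):
    """Return the human-readable 'detail' text of each plan step."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # 'detail' is the last column in both older and newer SQLite output formats.
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

titles_plan = query_plan(
    conn, "SELECT 1 FROM titles WHERE permalink = ? LIMIT 1", ("foo",)
)
slugs_plan = query_plan(
    conn,
    "SELECT 1 FROM friendly_id_slugs WHERE slug = ? AND sluggable_type = ? LIMIT 1",
    ("foo", "Title"),
)

# A "SEARCH ... USING ... INDEX" step means an index lookup;
# a bare "SCAN <table>" step would mean a full table scan.
for step in titles_plan + slugs_plan:
    print(step)
```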
The attached commit splits this method into two simpler queries that can each use an index. On the example above, this amounts to a performance improvement of over 1000x (3 ms vs. 4028 ms).
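The split can be sketched roughly as follows. This assumes the same kind of simplified schema, with SQLite again standing in for Postgres. The function `slug_available`, the schema, and the index names are hypothetical illustrations, not the code in this commit. Each query touches only one table and filters on that table's indexed column(s). The slug-history query runs only when the first lookup finds nothing.

```python
import sqlite3

# Hypothetical, simplified schema; SQLite stands in for Postgres.
SCHEMA = """
CREATE TABLE titles (id INTEGER PRIMARY KEY, permalink TEXT);
CREATE INDEX index_titles_on_permalink ON titles (permalink);

CREATE TABLE friendly_id_slugs (
    id INTEGER PRIMARY KEY,
    slug TEXT NOT NULL,
    sluggable_id INTEGER NOT NULL,
    sluggable_type TEXT NOT NULL
);
CREATE INDEX index_slugs_on_slug_and_type
    ON friendly_id_slugs (slug, sluggable_type);
"""


def slug_available(conn, slug, sluggable_type="Title"):
    """Return True if `slug` is neither a current permalink nor a historic slug.

    Hypothetical helper: two simple single-table queries, each answerable
    from one index, instead of one combined query across both tables.
    """
    # Query 1: is the slug some record's current permalink?
    in_use = conn.execute(
        "SELECT 1 FROM titles WHERE permalink = ? LIMIT 1", (slug,)
    ).fetchone()
    if in_use:
        return False  # taken; skip the second query entirely

    # Query 2: was the slug used before, according to the slug history?
    in_history = conn.execute(
        "SELECT 1 FROM friendly_id_slugs"
        " WHERE slug = ? AND sluggable_type = ? LIMIT 1",
        (slug, sluggable_type),
    ).fetchone()
    return in_history is None


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO titles (id, permalink) VALUES (1, 'new-name')")
conn.execute(
    "INSERT INTO friendly_id_slugs (slug, sluggable_id, sluggable_type)"
    " VALUES ('old-name', 1, 'Title')"
)

print(slug_available(conn, "new-name"))    # False: a current permalink
print(slug_available(conn, "old-name"))    # False: found in the slug history
print(slug_available(conn, "fresh-name"))  # True: unused anywhere
```

Because of the short-circuit, a slug that is already someone's current permalink costs only one query. An unused slug costs two.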
It is possible that this refactor will actually decrease performance in apps with very small amounts of data, but anyone with a significant amount of data should see serious performance improvements.