Performance improvement for finding if a slug is already used or not #497

Merged: norman merged 2 commits into norman:master from GoBoundless:master on Dec 10, 2013

@mhodgson
Contributor

mhodgson commented Dec 9, 2013

This commit drastically improves the performance of this lookup on tables of non-trivial size. The original query took the following form (for a table named 'titles' with a slugged field called 'permalink'):

SELECT 1 AS one FROM "titles" 
  INNER JOIN "friendly_id_slugs" 
    ON "friendly_id_slugs"."sluggable_id" = "titles"."id" 
    AND "friendly_id_slugs"."sluggable_type" = 'Title' 
  WHERE (
    "titles"."permalink" = 'biology' OR ("friendly_id_slugs"."sluggable_type" = 'Title' AND "friendly_id_slugs"."slug" = 'biology')
  ) LIMIT 1;

Running EXPLAIN on this query shows that it does TWO sequential scans, one on the 'titles' table and one on the 'friendly_id_slugs' table:

                                                                                 QUERY PLAN                                                                                 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=39606.33..67567.51 rows=1 width=0)
   ->  Hash Join  (cost=39606.33..123489.87 rows=3 width=0)
         Hash Cond: (friendly_id_slugs.sluggable_id = titles.id)
         Join Filter: ((titles.permalink = 'biology'::text) OR (((friendly_id_slugs.sluggable_type)::text = 'Title'::text) AND (friendly_id_slugs.slug = 'biology'::text)))
         ->  Seq Scan on friendly_id_slugs  (cost=0.00..31217.93 rows=859256 width=76)
               Filter: ((sluggable_type)::text = 'Title'::text)
         ->  Hash  (cost=27845.48..27845.48 rows=449348 width=87)
               ->  Seq Scan on titles  (cost=0.00..27845.48 rows=449348 width=87)

This is really bad. For example, our friendly_id_slugs table has about 900k records and our titles table has about 500k records, and this query takes over 4 SECONDS.

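This query should be able to take advantage of the indexes on both the friendly_id_slugs table and the titles table (which has an index on permalink). Unfortunately the Postgres query optimizer is not taking advantage of the indexes on either table. The plan shows why: the OR condition spans columns from both tables, so it is applied as a Join Filter after the hash join instead of as an index condition on either table.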

The attached commit splits this method into two simpler queries that can take advantage of the indexes. On the example above this amounts to a performance improvement of over 1000x (3ms vs 4028ms).
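As a rough illustration of the split described above, here is a minimal pure-Ruby sketch. It is not FriendlyId's actual code: the `SlugLookup` class, its method names, and the SQL shapes in the comments are all assumptions made for illustration, and each `Set` merely stands in for one database index.

```ruby
require "set"

# Hypothetical in-memory model of the split lookup; not FriendlyId's real code.
# Each Set stands in for one database index, so every check is a single
# hashed lookup rather than a scan over joined rows.
class SlugLookup
  def initialize(permalinks:, slug_history:)
    # Stand-in for the index on titles.permalink.
    @permalinks = permalinks.to_set
    # Stand-in for an index on friendly_id_slugs covering slug and sluggable_type.
    @slug_history = slug_history.map { |slug, type| [slug, type] }.to_set
  end

  # Assumed shape of query 1:
  #   SELECT 1 FROM titles WHERE permalink = ? LIMIT 1
  def in_table?(slug)
    @permalinks.include?(slug)
  end

  # Assumed shape of query 2:
  #   SELECT 1 FROM friendly_id_slugs
  #     WHERE sluggable_type = ? AND slug = ? LIMIT 1
  def in_history?(slug, type)
    @slug_history.include?([slug, type])
  end

  # The OR moves out of SQL and into Ruby; the second lookup only runs
  # when the first one finds nothing.
  def taken?(slug, type)
    in_table?(slug) || in_history?(slug, type)
  end
end

lookup = SlugLookup.new(
  permalinks: ["biology", "chemistry"],
  slug_history: [["physics", "Title"], ["biology", "Author"]]
)
```

Calling `lookup.taken?("biology", "Title")` is answered by the first lookup alone, while `lookup.taken?("physics", "Title")` falls through to the history lookup. Note that this sketch drops the join entirely, so it would also count history rows whose record no longer exists; whether the actual patch behaves that way is not stated in this thread.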

It is possible that this refactor will actually decrease performance in apps with very small amounts of data, but anyone with a significant amount of data should see serious performance improvements.

@coveralls

Coverage remained the same when pulling 31b573f on GoBoundless:master into 0732bdf on norman:master.

@coveralls

Coverage remained the same when pulling caaa0e2 on GoBoundless:master into 0732bdf on norman:master.

@norman
Owner

norman commented Dec 9, 2013

I just ran the benchmarks. Before:

Using ruby 2.0.0 AR 4.0.1 with sqlite3 (in-memory)
Rehearsal --------------------------------------------------------------------------------
find (without FriendlyId)                      0.300000   0.010000   0.310000 (  0.302228)
find (in-table slug)                           0.790000   0.000000   0.790000 (  0.792201)
find (in-table slug; using finders module)     0.570000   0.000000   0.570000 (  0.566349)
find (external slug)                           1.480000   0.030000   1.510000 (  1.523051)
insert (without FriendlyId)                    0.760000   0.010000   0.770000 (  0.770953)
insert (in-table-slug)                         2.810000   0.010000   2.820000 (  2.840422)
insert (in-table-slug; using finders module)   0.830000   0.010000   0.840000 (  0.845272)
insert (external slug)                         7.710000   0.030000   7.740000 (  7.771161)
---------------------------------------------------------------------- total: 15.350000sec

                                                   user     system      total        real
find (without FriendlyId)                      0.260000   0.000000   0.260000 (  0.259117)
find (in-table slug)                           0.800000   0.000000   0.800000 (  0.804663)
find (in-table slug; using finders module)     0.400000   0.000000   0.400000 (  0.398897)
find (external slug)                           1.500000   0.040000   1.540000 (  1.543959)
insert (without FriendlyId)                    0.760000   0.000000   0.760000 (  0.767646)
insert (in-table-slug)                         2.820000   0.010000   2.830000 (  2.855931)
insert (in-table-slug; using finders module)   0.800000   0.010000   0.810000 (  0.801796)
insert (external slug)                         7.870000   0.020000   7.890000 (  7.924459)

And after:

Using ruby 2.0.0 AR 4.0.1 with sqlite3 (in-memory)
Rehearsal --------------------------------------------------------------------------------
find (without FriendlyId)                      0.260000   0.000000   0.260000 (  0.258390)
find (in-table slug)                           0.810000   0.000000   0.810000 (  0.814801)
find (in-table slug; using finders module)     0.540000   0.000000   0.540000 (  0.544505)
find (external slug)                           1.520000   0.040000   1.560000 (  1.554933)
insert (without FriendlyId)                    0.810000   0.000000   0.810000 (  0.820789)
insert (in-table-slug)                         2.820000   0.010000   2.830000 (  2.839953)
insert (in-table-slug; using finders module)   0.840000   0.010000   0.850000 (  0.838540)
insert (external slug)                         8.930000   0.030000   8.960000 (  9.006793)
---------------------------------------------------------------------- total: 16.620000sec

                                                   user     system      total        real
find (without FriendlyId)                      0.260000   0.000000   0.260000 (  0.261005)
find (in-table slug)                           0.790000   0.000000   0.790000 (  0.794214)
find (in-table slug; using finders module)     0.390000   0.000000   0.390000 (  0.397870)
find (external slug)                           1.590000   0.040000   1.630000 (  1.694329)
insert (without FriendlyId)                    0.770000   0.000000   0.770000 (  0.800720)
insert (in-table-slug)                         2.880000   0.010000   2.890000 (  2.912652)
insert (in-table-slug; using finders module)   0.810000   0.000000   0.810000 (  0.811232)
insert (external slug)                         9.300000   0.030000   9.330000 (  9.369289)

So for trivial-sized tables it does appear to be a bit worse (mainly "insert (external slug)", about 7.9s vs 9.4s real), but I'm willing to take that if the performance is better in something more like real life. Let me look this over a bit more and I'll merge it soon. Thanks!

@norman
Owner

norman commented Dec 10, 2013

In case anybody is interested, I benchmarked this on Postgres with a larger data set. The default benchmark uses only 100 records, so I bumped that up to 5000. Here are the results:

Without patch:

                                                   user     system      total        real
find (without FriendlyId)                      0.320000   0.030000   0.350000 (  0.453029)
find (in-table slug)                           0.960000   0.050000   1.010000 (  1.313463)
find (in-table slug; using finders module)     0.620000   0.050000   0.670000 (  2.481130)
find (external slug)                           1.800000   0.070000   1.870000 (  2.626265)
insert (without FriendlyId)                    0.850000   0.100000   0.950000 (  1.172414)
insert (in-table-slug)                         3.340000   0.180000   3.520000 (  4.279130)
insert (in-table-slug; using finders module)   0.930000   0.100000   1.030000 (  1.298404)
insert (external slug)                         9.300000   0.460000   9.760000 ( 18.516446)

With patch:

                                                   user     system      total        real
find (without FriendlyId)                      0.320000   0.030000   0.350000 (  0.446131)
find (in-table slug)                           0.900000   0.040000   0.940000 (  1.220324)
find (in-table slug; using finders module)     0.440000   0.040000   0.480000 (  1.442221)
find (external slug)                           1.640000   0.050000   1.690000 (  2.270874)
insert (without FriendlyId)                    0.860000   0.090000   0.950000 (  1.203990)
insert (in-table-slug)                         3.180000   0.160000   3.340000 (  3.965620)
insert (in-table-slug; using finders module)   0.850000   0.080000   0.930000 (  1.170974)
insert (external slug)                        10.060000   0.450000  10.510000 ( 13.183820)

So the performance boost is moderate to significant in almost every row (be sure to look at the "real" column; "insert (without FriendlyId)" is essentially unchanged). Merging this now and will do a new release today. Thanks again @mhodgson!
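To make the comparison of the "real" columns concrete, the following sketch computes the percentage change per row. The timings are copied from the two Postgres tables above; the `REAL_SECONDS` constant and the `percent_change` helper are names introduced here for illustration only.

```ruby
# "real" (wall-clock) seconds copied from the two Postgres tables above,
# as [without patch, with patch].
REAL_SECONDS = {
  "find (without FriendlyId)"                    => [0.453029, 0.446131],
  "find (in-table slug)"                         => [1.313463, 1.220324],
  "find (in-table slug; using finders module)"   => [2.481130, 1.442221],
  "find (external slug)"                         => [2.626265, 2.270874],
  "insert (without FriendlyId)"                  => [1.172414, 1.203990],
  "insert (in-table-slug)"                       => [4.279130, 3.965620],
  "insert (in-table-slug; using finders module)" => [1.298404, 1.170974],
  "insert (external slug)"                       => [18.516446, 13.183820],
}.freeze

# Percentage change in wall-clock time; negative means the patch is faster.
def percent_change(before, after)
  ((after - before) / before * 100).round(1)
end

REAL_SECONDS.each do |label, (before, after)|
  puts format("%-46s %+6.1f%%", label, percent_change(before, after))
end
```

On these numbers the largest gains are in "find (in-table slug; using finders module)" and "insert (external slug)", while "insert (without FriendlyId)", which does not exercise FriendlyId, moves by only a few percent in the other direction. These are single runs, so small differences should be read as noise rather than as a measured effect.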

norman added a commit that referenced this pull request Dec 10, 2013:
Performance improvement for finding if a slug is already used or not

norman merged commit 826d72d into norman:master Dec 10, 2013

@mhodgson
Contributor Author

@norman happy to help! Thanks for friendly_id!

@parndt
Collaborator

parndt commented Dec 10, 2013

What a superb pull request. Nice work @mhodgson 👍
