Performance improvement for finding if a slug is already used or not #497

mhodgson · 2013-12-09T22:33:07Z

This commit drastically improves the performance of the lookup that checks whether a slug is already in use, on non-trivially sized tables. The original query took the following form (for a table named 'titles' with a slugged field called 'permalink'):

SELECT 1 AS one FROM "titles" 
  INNER JOIN "friendly_id_slugs" 
    ON "friendly_id_slugs"."sluggable_id" = "titles"."id" 
    AND "friendly_id_slugs"."sluggable_type" = 'Title' 
  WHERE (
    "titles"."permalink" = 'biology' OR ("friendly_id_slugs"."sluggable_type" = 'Title' AND "friendly_id_slugs"."slug" = 'biology')
  ) LIMIT 1;

Running EXPLAIN on this query shows that it does TWO sequential scans, one on the 'titles' table and one on the 'friendly_id_slugs' table:

                                                                                 QUERY PLAN                                                                                 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=39606.33..67567.51 rows=1 width=0)
   ->  Hash Join  (cost=39606.33..123489.87 rows=3 width=0)
         Hash Cond: (friendly_id_slugs.sluggable_id = titles.id)
         Join Filter: ((titles.permalink = 'biology'::text) OR (((friendly_id_slugs.sluggable_type)::text = 'Title'::text) AND (friendly_id_slugs.slug = 'biology'::text)))
         ->  Seq Scan on friendly_id_slugs  (cost=0.00..31217.93 rows=859256 width=76)
               Filter: ((sluggable_type)::text = 'Title'::text)
         ->  Hash  (cost=27845.48..27845.48 rows=449348 width=87)
               ->  Seq Scan on titles  (cost=0.00..27845.48 rows=449348 width=87)

This is really bad. In our case, the friendly_id_slugs table has 900k records and the titles table has 500k records, and this query takes over 4 SECONDS.

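This query should be able to use the indexes on both the friendly_id_slugs table and the titles table (which has an index on permalink). Unfortunately, the Postgres query planner uses neither. The OR condition references columns from both tables, so Postgres applies it as a Join Filter only after hash-joining the output of the two sequential scans, as the plan above shows.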

The attached commit splits this method into two simpler queries, one per branch of the OR, each of which can take advantage of an index. On the example above, this amounts to a performance improvement of over 1000x (3ms vs 4028ms).
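Conceptually, the split replaces one join-plus-OR with two independent existence checks, the second of which only needs to run when the first comes up empty. The sketch below models that in plain Ruby under loud assumptions: it is not friendly_id's code, the class and method names are hypothetical, and each Set stands in for a database index (one probe instead of a scan). In ActiveRecord the same shape would presumably be two `exists?` calls joined by `||`; the patch's actual method names are not shown in this thread.

```ruby
require "set"

# Toy in-memory model of the split lookup. NOT friendly_id's code: the class,
# method names and data are hypothetical, and each Set stands in for a
# database index, so a lookup is one probe rather than a sequential scan.
class SlugLookup
  attr_reader :queries_run

  def initialize(permalinks, slug_history)
    @permalinks   = Set.new(permalinks)   # models the index on titles.permalink
    @slug_history = Set.new(slug_history) # models an index on (sluggable_type, slug)
    @queries_run  = 0
  end

  # Stands in for:
  #   SELECT 1 AS one FROM "titles" WHERE "titles"."permalink" = ? LIMIT 1
  def in_table?(slug)
    @queries_run += 1
    @permalinks.include?(slug)
  end

  # Stands in for:
  #   SELECT 1 AS one FROM "friendly_id_slugs"
  #   WHERE "friendly_id_slugs"."sluggable_type" = ? AND "friendly_id_slugs"."slug" = ? LIMIT 1
  def in_history?(type, slug)
    @queries_run += 1
    @slug_history.include?([type, slug])
  end

  # The OR from the original WHERE clause becomes Ruby's short-circuiting ||:
  # the history check only runs when the in-table check finds nothing.
  def taken?(type, slug)
    in_table?(slug) || in_history?(type, slug)
  end
end

lookup = SlugLookup.new(%w[biology chemistry], [%w[Title old-biology]])
p lookup.taken?("Title", "biology")     # => true  (1 query)
p lookup.taken?("Title", "old-biology") # => true  (2 queries)
p lookup.taken?("Title", "physics")     # => false (2 queries)
p lookup.queries_run                    # => 5
```

The query counter makes the trade-off concrete: in this sketch a slug found in the table costs one query, while a slug that only appears in the history, or is not used at all, costs two.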

It is possible that this refactor will slightly decrease performance in apps with very small amounts of data, since the lookup can now take two queries instead of one, but anyone with a significant amount of data should see serious performance improvements.

coveralls · 2013-12-09T22:34:57Z

Coverage remained the same when pulling 31b573f on GoBoundless:master into 0732bdf on norman:master.

coveralls · 2013-12-09T22:50:06Z

Coverage remained the same when pulling caaa0e2 on GoBoundless:master into 0732bdf on norman:master.

norman · 2013-12-09T23:09:16Z

I just ran the benchmarks. Before:

Using ruby 2.0.0 AR 4.0.1 with sqlite3 (in-memory)
Rehearsal --------------------------------------------------------------------------------
find (without FriendlyId)                      0.300000   0.010000   0.310000 (  0.302228)
find (in-table slug)                           0.790000   0.000000   0.790000 (  0.792201)
find (in-table slug; using finders module)     0.570000   0.000000   0.570000 (  0.566349)
find (external slug)                           1.480000   0.030000   1.510000 (  1.523051)
insert (without FriendlyId)                    0.760000   0.010000   0.770000 (  0.770953)
insert (in-table-slug)                         2.810000   0.010000   2.820000 (  2.840422)
insert (in-table-slug; using finders module)   0.830000   0.010000   0.840000 (  0.845272)
insert (external slug)                         7.710000   0.030000   7.740000 (  7.771161)
---------------------------------------------------------------------- total: 15.350000sec

                                                   user     system      total        real
find (without FriendlyId)                      0.260000   0.000000   0.260000 (  0.259117)
find (in-table slug)                           0.800000   0.000000   0.800000 (  0.804663)
find (in-table slug; using finders module)     0.400000   0.000000   0.400000 (  0.398897)
find (external slug)                           1.500000   0.040000   1.540000 (  1.543959)
insert (without FriendlyId)                    0.760000   0.000000   0.760000 (  0.767646)
insert (in-table-slug)                         2.820000   0.010000   2.830000 (  2.855931)
insert (in-table-slug; using finders module)   0.800000   0.010000   0.810000 (  0.801796)
insert (external slug)                         7.870000   0.020000   7.890000 (  7.924459)

And after:

Using ruby 2.0.0 AR 4.0.1 with sqlite3 (in-memory)
Rehearsal --------------------------------------------------------------------------------
find (without FriendlyId)                      0.260000   0.000000   0.260000 (  0.258390)
find (in-table slug)                           0.810000   0.000000   0.810000 (  0.814801)
find (in-table slug; using finders module)     0.540000   0.000000   0.540000 (  0.544505)
find (external slug)                           1.520000   0.040000   1.560000 (  1.554933)
insert (without FriendlyId)                    0.810000   0.000000   0.810000 (  0.820789)
insert (in-table-slug)                         2.820000   0.010000   2.830000 (  2.839953)
insert (in-table-slug; using finders module)   0.840000   0.010000   0.850000 (  0.838540)
insert (external slug)                         8.930000   0.030000   8.960000 (  9.006793)
---------------------------------------------------------------------- total: 16.620000sec

                                                   user     system      total        real
find (without FriendlyId)                      0.260000   0.000000   0.260000 (  0.261005)
find (in-table slug)                           0.790000   0.000000   0.790000 (  0.794214)
find (in-table slug; using finders module)     0.390000   0.000000   0.390000 (  0.397870)
find (external slug)                           1.590000   0.040000   1.630000 (  1.694329)
insert (without FriendlyId)                    0.770000   0.000000   0.770000 (  0.800720)
insert (in-table-slug)                         2.880000   0.010000   2.890000 (  2.912652)
insert (in-table-slug; using finders module)   0.810000   0.000000   0.810000 (  0.811232)
insert (external slug)                         9.300000   0.030000   9.330000 (  9.369289)

So for trivial-sized tables it does appear to be a bit worse, but I'm willing to take that if the performance is better in something more like real life. Let me look this over a bit more and I'll merge it soon. Thanks!
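The before/after tables above are in the layout printed by Ruby's standard-library `Benchmark.bmbm`: a "Rehearsal" pass to warm up, followed by the measured pass whose user/system/total/real figures are the ones to compare. The output format suggests friendly_id's benchmark script uses bmbm, but the script itself isn't shown here. The sketch below only illustrates that harness under assumptions. The workload is hypothetical and in-memory: an Array scan stands in for a sequential scan and a Set stands in for an index. `N` is a made-up parameter you can raise to watch the gap between the two reports grow with table size.

```ruby
require "benchmark"
require "set"

# Hypothetical workload (not friendly_id's bench script): N slugs checked
# either by scanning an Array (like a Seq Scan) or by probing a Set (like an index).
N = 5_000
slugs   = Array.new(N) { |i| "title-#{i}" }
index   = Set.new(slugs)
targets = slugs.sample(200, random: Random.new(42))

# bmbm runs every report twice: first a "Rehearsal" pass (printed with a
# total line), then the measured pass. It returns the measured
# Benchmark::Tms objects, one per report, in order.
results = Benchmark.bmbm do |x|
  x.report("lookup (scan)")  { targets.each { |s| slugs.include?(s) } }
  x.report("lookup (index)") { targets.each { |s| index.include?(s) } }
end

results.each { |tms| puts format("%.6f s real", tms.real) }
```

With tiny values of `N` the two reports are close, which mirrors the point above that the difference only shows up once tables are non-trivially sized.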

norman · 2013-12-10T13:53:56Z

In case anybody is interested, I benchmarked this on Postgres with a larger data set: the default benchmark uses only 100 records, so I bumped that up to 5000. Here are the results:

Without patch:

                                                   user     system      total        real
find (without FriendlyId)                      0.320000   0.030000   0.350000 (  0.453029)
find (in-table slug)                           0.960000   0.050000   1.010000 (  1.313463)
find (in-table slug; using finders module)     0.620000   0.050000   0.670000 (  2.481130)
find (external slug)                           1.800000   0.070000   1.870000 (  2.626265)
insert (without FriendlyId)                    0.850000   0.100000   0.950000 (  1.172414)
insert (in-table-slug)                         3.340000   0.180000   3.520000 (  4.279130)
insert (in-table-slug; using finders module)   0.930000   0.100000   1.030000 (  1.298404)
insert (external slug)                         9.300000   0.460000   9.760000 ( 18.516446)

With patch:

                                                   user     system      total        real
find (without FriendlyId)                      0.320000   0.030000   0.350000 (  0.446131)
find (in-table slug)                           0.900000   0.040000   0.940000 (  1.220324)
find (in-table slug; using finders module)     0.440000   0.040000   0.480000 (  1.442221)
find (external slug)                           1.640000   0.050000   1.690000 (  2.270874)
insert (without FriendlyId)                    0.860000   0.090000   0.950000 (  1.203990)
insert (in-table-slug)                         3.180000   0.160000   3.340000 (  3.965620)
insert (in-table-slug; using finders module)   0.850000   0.080000   0.930000 (  1.170974)
insert (external slug)                        10.060000   0.450000  10.510000 ( 13.183820)

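So the performance boost is moderate to significant in every FriendlyId case (be sure to look at the "real" column: it is wall-clock time, so it includes time spent waiting on Postgres, while user and system only measure the Ruby process). Merging this now and will do a new release today. Thanks again @mhodgson!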
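To put numbers on "look at the real column", a small Ruby sketch can compute the wall-clock change per row. The figures are copied verbatim from the two Postgres tables above; `REAL_SECONDS` and `pct_change` are illustrative names, not part of friendly_id.

```ruby
# "real" (wall-clock) seconds copied from the two Postgres tables above:
# label => [without patch, with patch]
REAL_SECONDS = {
  "find (without FriendlyId)" => [0.453029, 0.446131],
  "find (in-table slug)" => [1.313463, 1.220324],
  "find (in-table slug; using finders module)" => [2.481130, 1.442221],
  "find (external slug)" => [2.626265, 2.270874],
  "insert (without FriendlyId)" => [1.172414, 1.203990],
  "insert (in-table-slug)" => [4.279130, 3.965620],
  "insert (in-table-slug; using finders module)" => [1.298404, 1.170974],
  "insert (external slug)" => [18.516446, 13.183820]
}.freeze

# Percent change in wall-clock time; negative means the patch made it faster.
def pct_change(before, after)
  (after - before) / before * 100.0
end

REAL_SECONDS.each do |label, (before, after)|
  printf("%-46s %9.3f -> %9.3f  %+6.1f%%\n", label, before, after, pct_change(before, after))
end
```

The two "without FriendlyId" rows don't exercise the patched code, so their small movements in either direction give a rough sense of run-to-run noise. Every FriendlyId row comes out faster by a wider margin than those baseline rows move.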

mhodgson · 2013-12-10T14:30:00Z

@norman happy to help! Thanks for friendly_id!

parndt · 2013-12-10T18:59:46Z

What a superb pull request. Nice work @mhodgson 👍

Commits:

31b573f Performance improvement for finding if a slug is already used or not

caaa0e2 Apparently this join is used other places as well

norman added a commit that referenced this pull request on Dec 10, 2013:

826d72d Merge pull request #497 from GoBoundless/master (Performance improvement for finding if a slug is already used or not)

norman merged commit 826d72d into norman:master on Dec 10, 2013.
