Batch link expansion queries #3106

Merged on Feb 3, 2025 (2 commits)

richardTowers (Contributor) commented Jan 29, 2025

Previously LinkedToEditionsSource would make one query per edition (getting linked editions, given a list of link types).

We can do better than this by making it take a pair of edition and link type, and then execute a query like:

SELECT *
FROM "links"
INNER JOIN "link_sets" ON "link_sets"."id" = "links"."link_set_id"
INNER JOIN "documents" "target_documents" ON "target_documents"."content_id" = "links"."target_content_id"
INNER JOIN "editions" ON "editions"."content_store" = 'live' AND "editions"."document_id" = "target_documents"."id"
WHERE (
  ("link_sets"."content_id", "links"."link_type")
  IN
  (
    ('614abfd2-a26b-4df9-af6c-179e0fe1d99d','person'),
    ('2314c82e-2860-4cbe-9e5c-5320f64d0ac4','person'),
    ('a1052635-d603-40a1-b3f2-fb751b8ed9f9','person'),
    ...
  )
)

This allows the dataloader to batch up all the editions and requested link types at each level of depth before sending one big query, which dramatically reduces the number of SQL queries needed.
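As a rough illustration of the batching described above, here is a minimal pure-Ruby sketch. Everything in it is hypothetical: `ROWS` stands in for the joined links and editions tables, `query_rows` stands in for the single tuple-inclusion SQL query, and `fetch_batched` is an invented name, not the method in `LinkedToEditionsSource`. The point is only that many (content ID, link type) pairs go out in one call and the results come back in the order the pairs were requested.

```ruby
require "set"

# Hypothetical in-memory stand-in for the joined links/editions rows.
Row = Struct.new(:source_content_id, :link_type, :target_title, keyword_init: true)

ROWS = [
  Row.new(source_content_id: "a", link_type: "person", target_title: "Alice"),
  Row.new(source_content_id: "a", link_type: "role", target_title: "Minister"),
  Row.new(source_content_id: "b", link_type: "person", target_title: "Bob"),
  Row.new(source_content_id: "b", link_type: "person", target_title: "Bea"),
].freeze

# Records every "query" so the number of round trips can be inspected.
QUERY_LOG = []

# Stands in for the single tuple-inclusion query: one call, many pairs.
def query_rows(pairs)
  QUERY_LOG << pairs
  wanted = pairs.to_set
  ROWS.select { |row| wanted.include?([row.source_content_id, row.link_type]) }
end

# Takes every (content_id, link_type) pair requested at one level of depth and
# returns one array of targets per pair, in the same order as the pairs.
def fetch_batched(pairs)
  grouped = query_rows(pairs.uniq).group_by { |row| [row.source_content_id, row.link_type] }
  pairs.map { |pair| grouped.fetch(pair, []).map(&:target_title) }
end

results = fetch_batched([["a", "person"], ["b", "person"], ["c", "person"]])
```

A pair with no matching links (here `["c", "person"]`) yields an empty array rather than being dropped, so each result still lines up with the pair that requested it.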

For the /government/prime-minister page on @brucebolt's machine, this reduces ActiveRecord time from ~100ms to ~20ms.

ActiveRecord doesn't seem to support the kind of tuple-inclusion query I'm making here. It does if all the columns are on the same table, but we need columns in different tables, so I've had to drop down to using Arel.
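For readers unfamiliar with tuple inclusion, the shape of the condition can be sketched in plain Ruby. This is not the Arel code used in this PR. It is a hypothetical helper (`tuple_in_condition` is an invented name) that builds a parameterised row-value `IN` fragment plus a flat list of bind values, assuming the column names are trusted, already-quoted identifiers.

```ruby
# Builds a parameterised row-value IN condition for any set of columns.
# `columns` must be trusted, pre-quoted SQL identifiers; `pairs` hold the
# untrusted values, which are returned separately as bind parameters.
def tuple_in_condition(columns, pairs)
  raise ArgumentError, "pairs must not be empty" if pairs.empty?

  row_placeholder = "(#{(['?'] * columns.size).join(', ')})"
  placeholders = ([row_placeholder] * pairs.size).join(", ")
  sql = "(#{columns.join(', ')}) IN (#{placeholders})"
  [sql, pairs.flatten(1)]
end

sql, binds = tuple_in_condition(
  ['"link_sets"."content_id"', '"links"."link_type"'],
  [
    ["614abfd2-a26b-4df9-af6c-179e0fe1d99d", "person"],
    ["2314c82e-2860-4cbe-9e5c-5320f64d0ac4", "person"],
  ],
)
```

In a Rails app a fragment like this could plausibly be passed to a relation as `where(sql, *binds)`. That usage is untested here and is only an alternative to the Arel approach taken in the PR.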

Trello card

Base automatically changed from add-graphql-optimisation-index to main January 30, 2025 09:57
@brucebolt brucebolt force-pushed the towers/batch-graphql-edition-queries branch 3 times, most recently from 3de40b0 to 179143e Compare January 30, 2025 14:17
@brucebolt brucebolt changed the base branch from main to reduce-graphql-queries January 30, 2025 14:49
@brucebolt brucebolt force-pushed the towers/batch-graphql-edition-queries branch from 179143e to fc6d38f Compare January 30, 2025 14:49
brucebolt (Member) commented Jan 30, 2025

I tried combining the link set and edition links into a single query, but ended up making 19 SQL queries instead of 15! I don't think combining the two would improve performance significantly, so I have left them as separate queries.

Base automatically changed from reduce-graphql-queries to main January 30, 2025 15:03
@brucebolt brucebolt marked this pull request as ready for review January 30, 2025 15:04
@brucebolt brucebolt force-pushed the towers/batch-graphql-edition-queries branch from fc6d38f to de1628b Compare January 30, 2025 15:05
@brucebolt brucebolt self-assigned this Jan 30, 2025
yndajas (Member) left a comment

Haven't had time to finish looking at this, but no issues so far!

spec/graphql/sources/linked_to_editions_source_spec.rb (outdated)
app/graphql/sources/linked_to_editions_source.rb (outdated)
brucebolt and others added 2 commits February 3, 2025 10:14
We are filtering the links by the content store of the target document's
edition, but no test was included for this.

Adding a test prior to making changes that could affect this filtering.

Previously LinkedToEditionsSource would make one query per edition
(getting linked editions, given a list of link types).

This change makes one query per parent edition, by getting the
dataloader to batch queries across multiple editions and link types.

For the prime minister page, the number of database queries reduces from
93 to 15, and the ActiveRecord execution time decreases from ~100ms to
~20ms, when run locally.

Co-authored-by: Richard Towers <[email protected]>
@brucebolt brucebolt force-pushed the towers/batch-graphql-edition-queries branch from de1628b to cff1b0c Compare February 3, 2025 10:16
@@ -10,7 +10,7 @@
 create(:link, link_set: link_set, target_content_id: target_edition_3.content_id, link_type: "test_link")

 GraphQL::Dataloader.with_dataloading do |dataloader|
-request = dataloader.with(described_class, parent_object: source_edition).request("test_link")
+request = dataloader.with(described_class, content_store: source_edition.content_store).request([source_edition, "test_link"])
Member

question: are .request in this spec and .load in BaseObject both ending up at the fetch method in LinkedToEditionsSource? 🤯

Member

It appears to be, but I'm not sure why. The same is done in the other dataloader tests too.
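One possible explanation, as far as I understand graphql-ruby's dataloader: `request` queues a key and returns a handle, `load` queues a key and resolves it straight away, and both drain the pending keys through the source's `fetch`. The toy model below only illustrates that flow. `ToySource` and `ToyRequest` are invented names and this is not the gem's implementation.

```ruby
# Toy model only: NOT graphql-ruby's implementation.
class ToySource
  attr_reader :fetch_calls

  def initialize
    @pending = []
    @results = {}
    @fetch_calls = []
  end

  # Queue a key without resolving it and return a handle for later.
  def request(key)
    @pending << key unless @results.key?(key) || @pending.include?(key)
    ToyRequest.new(self, key)
  end

  # Queue a key and resolve it now, along with everything else pending.
  def load(key)
    request(key)
    sync unless @results.key?(key)
    @results.fetch(key)
  end

  private

  # Drain all pending keys through a single call to fetch.
  def sync
    keys = @pending.dup
    @pending.clear
    @fetch_calls << keys
    keys.zip(fetch(keys)).each { |k, v| @results[k] = v }
  end

  # Stand-in for the batched database query.
  def fetch(keys)
    keys.map { |edition, link_type| "#{link_type} links for #{edition}" }
  end
end

# Handle returned by ToySource#request; loading it defers to the source.
class ToyRequest
  def initialize(source, key)
    @source = source
    @key = key
  end

  def load
    @source.load(@key)
  end
end

source = ToySource.new
first = source.request(["edition-1", "test_link"])
second = source.request(["edition-2", "test_link"])
first_value = first.load
second_value = second.load
```

In this model both requests are satisfied by a single `fetch` call, which is the behaviour the spec relies on when it calls `.request` and then loads the result.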

@brucebolt brucebolt merged commit bc1bbdd into main Feb 3, 2025
12 checks passed
@brucebolt brucebolt deleted the towers/batch-graphql-edition-queries branch February 3, 2025 11:23