Optimise `details` handling for GraphQL #3134

brucebolt · 2025-02-11T17:00:13Z

This makes a number of performance improvements to the processing of details for GraphQL queries. Profiling showed that DetailsPresenter contains some of the most frequently invoked and slowest methods within our own code when serving GraphQL responses.

The change in response time is only marginal for the document types we have already migrated to GraphQL, but has potential for larger improvements when we migrate documents containing a large amount of data in details, particularly those with extensive nesting.

Change 1: Only parse details that are requested

At the moment, we are putting the entire contents of the details field through the DetailsPresenter, which is sub-optimal as we are parsing values that clients are not requesting.

By using lookaheads, we can determine which fields from details have been requested and only parse those.

In a previous prototype, we had only transformed the govspeak in the body field. However, some schemas permit mixed govspeak/HTML content in other details fields, or even nested within other fields. We therefore need to recursively parse all items within details, as is already done using the DetailsPresenter, to ensure no raw govspeak is presented to the client.

Note: the code here could've been simpler (i.e. not duplicate the object, just slice the details hash) but the ContentEmbedPresenter requires a full Edition object.
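As a rough illustration of the idea only (this is not the code in the PR), the filtering can be thought of as reducing details to the requested keys before any presenting happens. In graphql-ruby the requested names would come from the lookahead, for example `lookahead.selections.map(&:name)`; here they are passed in directly as `requested_fields`. `CountingPresenter` and `present_requested_details` are hypothetical names, and the sketch slices a plain hash for brevity, whereas the actual change duplicates the edition because the ContentEmbedPresenter needs a full Edition object.

```ruby
# Hypothetical stand-in for the expensive presenter work (e.g. govspeak rendering).
# It records which keys it processed so the saving is visible.
class CountingPresenter
  attr_reader :processed_keys

  def initialize
    @processed_keys = []
  end

  def present(details)
    details.each_with_object({}) do |(key, value), presented|
      @processed_keys << key
      presented[key] = value.is_a?(String) ? value.strip : value
    end
  end
end

# Keep only the details keys the client selected, then present those.
# `requested_fields` is assumed to be a list of symbols taken from the query.
def present_requested_details(details, requested_fields, presenter)
  presenter.present(details.slice(*requested_fields))
end

details = {
  body: " Some body text ",
  change_history: [{ note: "First published" }],
  attachments: [],
}

presenter = CountingPresenter.new
result = present_requested_details(details, %i[body], presenter)
```

With only `body` requested, the presenter never touches `change_history` or `attachments`.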

Change 2: Do not loop through details unless embed links exist

At the moment, we are looping through each item in details to look for any embedded content references. This is sub-optimal as we are searching through content for a tag that we know won't be there, if there is nothing to embed in the document.

We should skip doing this if there is nothing that could possibly be embedded, which we know by looking in the document's links.
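A minimal sketch of that guard, under stated assumptions: the `{{embed:...}}` tag format, the `:embed` link type and the method names below are illustrative and are not taken from this PR.

```ruby
# Assumed shape of an embedded content reference inside a string value.
EMBED_PATTERN = /\{\{embed:[^}]+\}\}/

# Hypothetical: `links` is a hash of link type => array of content IDs,
# and a non-empty `:embed` entry means the document embeds other content.
def embed_links?(links)
  Array(links[:embed]).any?
end

# Returns details untouched unless embed links exist, so documents with
# nothing to embed never pay for the scan over every value.
def present_embeds(details, links)
  return details unless embed_links?(links)

  details.transform_values do |value|
    value.is_a?(String) ? value.gsub(EMBED_PATTERN, "[embedded content]") : value
  end
end
```

The early return hands back the same hash object, so the no-embed path does no per-item work at all.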

Change 3: Reduce lines of code executed in `DetailsPresenter`

This presenter currently performs some of the same code in different methods that determine the type of content in the field, and looks for content types even when we know there won't be any present.

This refactors the presenter to execute as little code as possible in order to determine the correct outcome for the given content type.

Change 4: Remove useless memoization

Nothing that calls the details method re-computes the response, so the memoization is pointless.

Removing it here, to avoid the unnecessary compute needed to check whether the instance variable has already been set.
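To illustrate the reasoning (a toy example with a hypothetical `UnmemoizedPresenter` class, not the presenter in this PR): when each instance has `details` called on it only once, the work runs once whether or not the result is cached, so an `@details ||= ...` guard adds a check without saving anything.

```ruby
# Hypothetical presenter that counts how often the expensive work runs.
class UnmemoizedPresenter
  attr_reader :computations

  def initialize(raw)
    @raw = raw
    @computations = 0
  end

  # No `@details ||=` guard: the work is done on each call.
  def details
    @computations += 1
    @raw.transform_values(&:to_s)
  end
end

presenter = UnmemoizedPresenter.new({ body: "text", position: 1 })
output = presenter.details # the only call made on this instance
```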

Trello card

In the `EditionType` specs, we are currently not requesting any fields from `details`. This works at the moment, since the `EditionType` processes the entire content of the `details` hash before filtering it down to the fields selected by the client. In a future commit, we will only process those the client has selected. This updates the tests to specify which fields should be obtained from `details`. The resolution context has been removed from the `includes` line, since that version of `run_graphql_field` does not support `lookahead` as an argument [1].
1: https://github.com/rmosolgo/graphql-ruby/blob/ec47e903e619923715de225c36fafeb28620a91e/lib/graphql/testing/helpers.rb#L125

At the moment, we are putting the entire contents of the `details` field through the `DetailsPresenter`, which is sub-optimal as we are parsing values that clients are not requesting. By using lookaheads, we can determine which fields from `details` have been requested and only parse those. In a previous prototype, we had only transformed the govspeak in the `body` field. However, the schemas can permit mixed govspeak/HTML content in any details field. We therefore need to parse all items within `details`, as is already done using the `DetailsPresenter`. This also allows removal of the optimisations for `change_history` added in 2caf00e, since we can now filter the details to only include that field when requested. Note: the code here could've been simpler (i.e. not duplicate the object, just slice the details hash) but the `ContentEmbedPresenter` requires a full `Edition` object.

At the moment, we are looping through each item in `details` to look for any embedded content references. This is sub-optimal as we are searching through content for a tag that we know won't be there, if there is nothing to embed in the document. We should skip doing this if there is nothing that could possibly be embedded, which we know by looking in the document's links.

There is no schema that permits this format for specifying that the content is govspeak (all schemas require this in an array), so the test can be removed. This behaviour is confirmed by ADR-003 [1]. Removing this test (and support for multi-part content that is not wrapped in an array) allows us to simplify the `DetailsPresenter`, as we will only ever be dealing with strings or arrays of hashes.
1: https://github.com/alphagov/publishing-api/blob/main/docs/arch/adr-003-representation-for-multiple-content-types.md

This presenter currently performs some of the same code in different methods that determine the type of content in the field, and looks for content types even when we know there won't be any present. This refactors the presenter to execute as little code as possible in order to determine the correct outcome for the given content type. This performance improvement is particularly important for GraphQL, as we will be converting govspeak to HTML at render-time.
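A hedged sketch of what a single-pass dispatch over "strings or arrays of hashes" could look like. It is not the refactored `DetailsPresenter`: the method names, the stub renderer, the `content_type`/`content` keys and the choice to append an HTML part are all assumptions made for illustration.

```ruby
# Hypothetical renderer standing in for the real govspeak-to-HTML conversion.
def render_govspeak(text)
  "<p>#{text}</p>"
end

# True when every item is a hash carrying a content type (assumed shape
# for multi-part content).
def multipart_content?(array)
  array.any? && array.all? { |item| item.is_a?(Hash) && item.key?(:content_type) }
end

# Leave the parts alone when HTML is already supplied; otherwise append
# an HTML rendering of the govspeak part.
def with_html(parts)
  return parts if parts.any? { |part| part[:content_type] == "text/html" }

  govspeak = parts.find { |part| part[:content_type] == "text/govspeak" }
  return parts unless govspeak

  parts + [{ content_type: "text/html", content: render_govspeak(govspeak[:content]) }]
end

# Check the class of each value once and return early for anything that
# cannot contain govspeak, recursing into nested hashes and arrays.
def present_value(value)
  case value
  when Hash
    value.transform_values { |nested| present_value(nested) }
  when Array
    multipart_content?(value) ? with_html(value) : value.map { |item| present_value(item) }
  else
    value # plain strings and scalars need no processing
  end
end
```

Plain strings fall straight through to the `else` branch, so the content-type checks only run for arrays.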

Nothing that calls the `details` method re-computes the response, so the memoization is pointless. Removing it here, to avoid the unnecessary compute needed to check whether the instance variable has already been set.

brucebolt force-pushed the optimise-details-graphql branch from 1d713b1 to e939446 Compare February 11, 2025 17:09

brucebolt changed the title ~~Only parse details fields that are requested~~ Optimise details handling for GraphQL Feb 11, 2025

brucebolt force-pushed the optimise-details-graphql branch from 802c2ef to 5490a31 Compare February 12, 2025 08:40

brucebolt force-pushed the optimise-details-graphql branch from 5490a31 to 779ff12 Compare February 12, 2025 08:42

brucebolt force-pushed the optimise-details-graphql branch 6 times, most recently from e9af6f9 to a50bbc9 Compare February 12, 2025 14:23

brucebolt added 3 commits February 12, 2025 15:21

brucebolt force-pushed the optimise-details-graphql branch from a50bbc9 to 216ad20 Compare February 12, 2025 15:21

Remove useless memoization

8b840b7

Nothing that calls the `details` method re-computes the response, so the memoization is pointless. Removing it here, to avoid the unnecessary compute needed to check whether the instance variable has already been set.

brucebolt force-pushed the optimise-details-graphql branch from b2f757d to 8b840b7 Compare February 12, 2025 16:35
