Optimise details handling for GraphQL #3134

Draft
wants to merge 6 commits into
base: main

Conversation

brucebolt
Member

@brucebolt brucebolt commented Feb 11, 2025

This makes a number of performance improvements to processing of details for GraphQL queries. When we did profiling, we found DetailsPresenter is home to some of the most invoked and slowest methods within our own code when serving GraphQL responses.

The change in response time is only marginal for the document types we have already migrated to GraphQL, but has potential for larger improvements when we migrate documents containing a large amount of data in details, particularly those with extensive nesting.

Change 1: Only parse details that are requested

At the moment, we are putting the entire contents of the details field through the DetailsPresenter, which is sub-optimal as we are parsing values that clients are not requesting.

By using lookaheads, we can determine which fields from details have been requested and only parse those.

In a previous prototype, we had only transformed the govspeak in the body field. However, some schemas permit mixed govspeak/HTML content in other details fields, or even nested within other fields. We therefore need to recursively parse all items within details, as is already done using the DetailsPresenter, to ensure no raw govspeak is presented to the client.

Note: the code here could've been simpler (i.e. not duplicate the object, just slice the details hash) but the ContentEmbedPresenter requires a full Edition object.
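As a rough illustration of the idea (not the code in this PR), the sketch below filters a details hash down to the requested keys before presenting it. `FakeDetailsPresenter` and `present_requested_details` are hypothetical names, and the upcasing stands in for govspeak conversion. It uses the simpler "slice the hash" approach that the note above says is not possible in the real code, because `ContentEmbedPresenter` needs a full `Edition`. In graphql-ruby the requested names would typically come from something like `lookahead.selections.map(&:name)`.

```ruby
# Hypothetical stand-in for DetailsPresenter: recursively "parses" strings
# by upcasing them, so that presented values are easy to spot.
class FakeDetailsPresenter
  def initialize(details)
    @details = details
  end

  def details
    transform(@details)
  end

  private

  def transform(value)
    case value
    when Hash then value.transform_values { |item| transform(item) }
    when Array then value.map { |item| transform(item) }
    when String then value.upcase
    else value
    end
  end
end

# `requested_fields` is assumed to be derived from the resolver's lookahead.
def present_requested_details(details, requested_fields)
  selected = details.slice(*requested_fields)
  FakeDetailsPresenter.new(selected).details
end

details = {
  body: "some govspeak",
  change_history: [{ note: "first published" }],
  attachments: [{ title: "report" }],
}

# Only :body is presented; the other fields are never touched.
result = present_requested_details(details, %i[body])
```

The saving grows with the size of the fields that are skipped, which matches the expectation above that heavily nested details benefit most.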

Change 2: Do not loop through details unless embed links exist

At the moment, we are looping through each item in details to look for any embedded content references. This is sub-optimal as we are searching through content for a tag that we know won't be there, if there is nothing to embed in the document.

We should skip doing this if there is nothing that could possibly be embedded, which we know by looking in the document's links.
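A minimal sketch of this guard, under some assumptions: the link type is taken to be `:embed`, the `{{embed:...}}` pattern is only an illustrative tag syntax, and `render_embeds`, `substitute` and `replacements` are hypothetical names rather than the presenter's real interface.

```ruby
# Illustrative pattern for an embed reference; the real syntax may differ.
EMBED_REFERENCE = /\{\{embed:[^}]+\}\}/

# Assumes embedded content is listed under an :embed link type.
def embed_links?(links)
  Array(links[:embed]).any?
end

# Recursively replaces embed references in strings nested within details.
def substitute(value, replacements)
  case value
  when Hash then value.transform_values { |item| substitute(item, replacements) }
  when Array then value.map { |item| substitute(item, replacements) }
  when String then value.gsub(EMBED_REFERENCE) { |tag| replacements.fetch(tag, tag) }
  else value
  end
end

def render_embeds(details, links, replacements)
  # Nothing to embed, so return details untouched without walking it.
  return details unless embed_links?(links)

  substitute(details, replacements)
end
```

With no embed links the original hash is returned as-is, so the cost of the recursive walk is only paid by documents that can actually contain embedded content.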

Change 3: Reduce lines of code executed in DetailsPresenter

This presenter currently repeats some of the same checks across the different methods that determine the type of content in a field, and looks for content types even when we know none will be present.

Therefore refactoring this presenter to execute as little code as possible in order to determine the correct outcome for the content type given.
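To make the shape of the problem concrete, here is a hedged sketch of a single-pass dispatch. It assumes the multi-part representation is an array of hashes with `:content_type` and `:content` keys, and that an existing HTML part is preferred over converting govspeak. `present_value`, `multipart?`, `present_multipart` and the `render_govspeak` callable are hypothetical names, not the presenter's actual methods.

```ruby
# True when every element looks like a { content_type:, content: } part.
def multipart?(parts)
  parts.any? && parts.all? { |part| part.is_a?(Hash) && part.key?(:content_type) }
end

# Prefer an existing HTML part; otherwise convert the govspeak part.
def present_multipart(parts, render_govspeak)
  html = parts.find { |part| part[:content_type] == "text/html" }
  return html[:content] if html

  govspeak = parts.find { |part| part[:content_type] == "text/govspeak" }
  govspeak ? render_govspeak.call(govspeak[:content]) : parts
end

# Inspects each value's class once and branches straight to the outcome.
def present_value(value, render_govspeak)
  case value
  when Array
    return present_multipart(value, render_govspeak) if multipart?(value)

    value.map { |item| present_value(item, render_govspeak) }
  when Hash
    value.transform_values { |item| present_value(item, render_govspeak) }
  else
    value
  end
end

# Stand-in for a real govspeak renderer.
render = ->(text) { "<p>#{text}</p>" }
presented = present_value({ body: [{ content_type: "text/govspeak", content: "hi" }] }, render)
```

Plain strings fall through to the `else` branch immediately, so the common case does no content-type lookups at all.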

Change 4: Remove useless memoization

None of the call sites of the details method re-compute the response, therefore the memoization is pointless.

Removing it here, to avoid the unnecessary compute needed to check whether the instance variable has already been set.
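For illustration only, the two forms look roughly like this. `ExamplePresenter` and `transform` are hypothetical, with `transform` standing in for the real parsing work.

```ruby
class ExamplePresenter
  def initialize(raw)
    @raw = raw
  end

  # Before: memoized, so every call first checks whether @details is set.
  def memoized_details
    @details ||= transform(@raw)
  end

  # After: computed directly, which is enough when each instance is only
  # asked for its details once.
  def details
    transform(@raw)
  end

  private

  # Stand-in for the real parsing of details.
  def transform(raw)
    raw.transform_values(&:to_s)
  end
end

presenter = ExamplePresenter.new({ count: 1 })
```

Both methods return the same value on a single call; the memoized form only pays off if the same instance is asked repeatedly.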

Trello card

@brucebolt brucebolt force-pushed the optimise-details-graphql branch from 1d713b1 to e939446 Compare February 11, 2025 17:09
@brucebolt brucebolt changed the title Only parse details fields that are requested Optimise details handling for GraphQL Feb 11, 2025
@brucebolt brucebolt force-pushed the optimise-details-graphql branch from 802c2ef to 5490a31 Compare February 12, 2025 08:40
In the `EditionType` specs, we are currently not requesting any fields
from `details`. This works at the moment, since the `EditionType`
processes the entire content of the `details` hash before filtering out
those selected by the client.

In a future commit, we will only process those the client has selected.
Therefore updating these tests to specify which fields should be
obtained from `details`.

The resolution context has been removed from the `includes` line, since
that version of `run_graphql_field` does not support `lookahead` as an
argument [1].

1: https://github.com/rmosolgo/graphql-ruby/blob/ec47e903e619923715de225c36fafeb28620a91e/lib/graphql/testing/helpers.rb#L125
@brucebolt brucebolt force-pushed the optimise-details-graphql branch from 5490a31 to 779ff12 Compare February 12, 2025 08:42
At the moment, we are putting the entire contents of the `details` field
through the `DetailsPresenter`, which is sub-optimal as we are parsing
values that clients are not requesting.

By using lookaheads, we can determine which fields from `details` have
been requested and only parse those.

In a previous prototype, we had only transformed the govspeak in the
`body` field. However the schemas can permit mixed govspeak/HTML content
in any details field. We therefore need to parse all items within
`details`, as is already done using the `DetailsPresenter`.

This also allows removal of the optimisations for `change_history` added
in 2caf00e since we can now filter the
details to only include that field when requested.

Note: the code here could've been simpler (i.e. not duplicate the
object, just slice the details hash) but the `ContentEmbedPresenter`
requires a full `Edition` object.
@brucebolt brucebolt force-pushed the optimise-details-graphql branch 6 times, most recently from e9af6f9 to a50bbc9 Compare February 12, 2025 14:23
At the moment, we are looping through each item in `details` to look for
any embedded content references. This is sub-optimal as we are searching
through content for a tag that we know won't be there, if there is
nothing to embed in the document.

We should skip doing this if there is nothing that could possibly be
embedded, which we know by looking in the document's links.
No schema permits this format for specifying that the content is
govspeak (all schemas require it in an array), so the test can be
removed.

This behaviour is confirmed by ADR-003 [1].

Removing this test (and support for multi-part content that is not
wrapped in an array) allows us to simplify the `DetailsPresenter`, as
we will only ever be dealing with strings or arrays of hashes.

1: https://github.com/alphagov/publishing-api/blob/main/docs/arch/adr-003-representation-for-multiple-content-types.md
This presenter currently repeats some of the same checks across the
different methods that determine the type of content in a field, and
looks for content types even when we know none will be present.

Therefore refactoring this presenter to execute as little code as
possible in order to determine the correct outcome for the content type
given.

This performance improvement is particularly important for GraphQL, as
we will be converting govspeak to HTML at render-time.
@brucebolt brucebolt force-pushed the optimise-details-graphql branch from a50bbc9 to 216ad20 Compare February 12, 2025 15:21
None of the call sites of the `details` method re-compute the response,
therefore the memoization is pointless.

Removing it here, to avoid the unnecessary compute needed to check
whether the instance variable has already been set.
@brucebolt brucebolt force-pushed the optimise-details-graphql branch from b2f757d to 8b840b7 Compare February 12, 2025 16:35