Optimise `details` handling for GraphQL #3134
Draft
brucebolt wants to merge 6 commits into main from optimise-details-graphql
1d713b1 to e939446
802c2ef to 5490a31
In the `EditionType` specs, we are currently not requesting any fields from `details`. This works at the moment because `EditionType` processes the entire content of the `details` hash before filtering it down to the fields selected by the client. In a future commit, we will only process the fields the client has selected, so these tests are updated to specify which fields should be obtained from `details`.
The resolution context has been removed from the `includes` line, since that version of `run_graphql_field` does not support `lookahead` as an argument [1].
[1]: https://github.com/rmosolgo/graphql-ruby/blob/ec47e903e619923715de225c36fafeb28620a91e/lib/graphql/testing/helpers.rb#L125
5490a31 to 779ff12
At the moment, we are putting the entire contents of the `details` field through the `DetailsPresenter`, which is sub-optimal as we are parsing values that clients are not requesting. By using lookaheads, we can determine which fields from `details` have been requested and only parse those.
In a previous prototype, we had only transformed the govspeak in the `body` field. However, the schemas can permit mixed govspeak/HTML content in any details field. We therefore need to parse all items within `details`, as is already done using the `DetailsPresenter`.
This also allows removal of the optimisations for `change_history` added in 2caf00e, since we can now filter the details to only include that field when requested.
Note: the code here could've been simpler (i.e. not duplicate the object, just slice the details hash) but the `ContentEmbedPresenter` requires a full `Edition` object.
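
To make the idea concrete, here is a minimal plain-Ruby sketch of filtering `details` down to the requested fields before presenting them. It is not the code in this PR: `SketchEdition`, `CountingPresenter` and `present_requested_details` are hypothetical names, and the list of requested field names is passed in directly, where the real resolver would presumably derive it from the graphql-ruby lookahead (roughly `lookahead.selections.map(&:name)`, stated here as an assumption).

```ruby
# Hypothetical stand-in for an Edition record; the real model has many more attributes.
SketchEdition = Struct.new(:title, :details, keyword_init: true)

# Hypothetical presenter that counts how many values it has to process.
class CountingPresenter
  attr_reader :processed

  def initialize
    @processed = 0
  end

  def present(details)
    details.transform_values do |value|
      @processed += 1
      value.to_s.upcase # placeholder for the real govspeak-to-HTML work
    end
  end
end

# `requested` is assumed to be the list of field names selected under `details`.
def present_requested_details(edition, requested, presenter)
  # Duplicate the object rather than only slicing the hash, mirroring the
  # note above that a full edition-like object is needed downstream.
  trimmed = edition.dup
  trimmed.details = edition.details.slice(*requested)
  presenter.present(trimmed.details)
end

edition = SketchEdition.new(
  title: "Example",
  details: { body: "hello", change_history: "many entries", summary: "short" },
)
presenter = CountingPresenter.new
result = present_requested_details(edition, [:body], presenter)
```

In this sketch only one of the three values reaches the presenter, and the original `edition.details` hash is left untouched because the duplicate is given a new, sliced hash.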
e9af6f9 to a50bbc9
At the moment, we are looping through each item in `details` to look for any embedded content references. This is sub-optimal: if there is nothing to embed in the document, we are searching through content for a tag that we know won't be there. We should skip doing this if there is nothing that could possibly be embedded, which we know by looking in the document's links.
There is no schema that permits this format for specifying that the content is govspeak (all schemas require this in an array), so the test can be removed. This behaviour is confirmed by ADR-003 [1]. Removing this test (and support for multi-part content that is not included in an array) allows us to simplify the `DetailsPresenter`, as we will only ever be dealing with strings or arrays of hashes.
[1]: https://github.com/alphagov/publishing-api/blob/main/docs/arch/adr-003-representation-for-multiple-content-types.md
This presenter currently runs some of the same code in different methods that determine the type of content in the field, and looks for content types even when we know there won't be any present. This commit therefore refactors the presenter to execute as little code as possible in order to determine the correct outcome for the given content type. This performance improvement is particularly important for GraphQL, as we will be converting govspeak to HTML at render-time.
a50bbc9 to 216ad20
No caller of the `details` method re-computes the response, so the memoization is pointless. Removing it here avoids the unnecessary compute needed to check whether the instance variable has already been set.
b2f757d to 8b840b7
This makes a number of performance improvements to the processing of `details` for GraphQL queries. When we did profiling, we found `DetailsPresenter` is home to some of the most invoked and slowest methods within our own code when serving GraphQL responses.
The change in response time is only marginal for the document types we have already migrated to GraphQL, but has potential for larger improvements when we migrate documents containing a large amount of data in `details`, particularly those with extensive nesting.
Change 1: Only parse details that are requested
At the moment, we are putting the entire contents of the `details` field through the `DetailsPresenter`, which is sub-optimal as we are parsing values that clients are not requesting.
By using lookaheads, we can determine which fields from `details` have been requested and only parse those.
In a previous prototype, we had only transformed the govspeak in the `body` field. However, some schemas permit mixed govspeak/HTML content in other details fields, or even nested within other fields. We therefore need to recursively parse all items within `details`, as is already done using the `DetailsPresenter`, to ensure no raw govspeak is presented to the client.
Note: the code here could've been simpler (i.e. not duplicate the object, just slice the details hash) but the `ContentEmbedPresenter` requires a full `Edition` object.
Change 2: Do not loop through details unless embed links exist
At the moment, we are looping through each item in `details` to look for any embedded content references. This is sub-optimal as we are searching through content for a tag that we know won't be there, if there is nothing to embed in the document.
We should skip doing this if there is nothing that could possibly be embedded, which we know by looking in the document's links.
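
The guard described above can be sketched as follows. This is an illustration under stated assumptions rather than the PR's implementation: the `{{embed:...}}` tag pattern, the `:embed` links key, and the method names are all hypothetical.

```ruby
# Hypothetical embed tag format, used for illustration only.
EMBED_TAG = /\{\{embed:[^}]+\}\}/

# Recursively collect embed tags from strings, arrays and hashes.
def scan_for_embeds(value)
  case value
  when String then value.scan(EMBED_TAG)
  when Array  then value.flat_map { |item| scan_for_embeds(item) }
  when Hash   then value.values.flat_map { |item| scan_for_embeds(item) }
  else []
  end
end

# `links` is assumed to be a hash of link types to target IDs,
# with embedded content listed under a hypothetical `:embed` key.
def embed_references(details, links)
  # With no embed links there is nothing a tag could resolve to,
  # so skip walking `details` altogether.
  return [] if Array(links[:embed]).empty?

  scan_for_embeds(details)
end

details = {
  body: [{ content_type: "text/govspeak", content: "See {{embed:contact:123}}" }],
}
with_links = embed_references(details, { embed: ["123"] })
without_links = embed_references(details, {})
```

The saving comes from the early return: the recursive walk, which is proportional to the size and nesting of `details`, is replaced by a single check on the links.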
Change 3: Reduce lines of code executed in `DetailsPresenter`
This presenter currently runs some of the same code in different methods that determine the type of content in the field, and looks for content types even when we know there won't be any present.
This change therefore refactors the presenter to execute as little code as possible in order to determine the correct outcome for the given content type.
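
As a rough illustration of deciding the content type once per value, the sketch below branches a single time on the shape of each value and only looks for content types in arrays of hashes. The method names, the preference for an existing `text/html` part, and the `govspeak_to_html` placeholder are assumptions made for this example, not the behaviour of the real `DetailsPresenter`.

```ruby
# Hypothetical converter; the real presenter would render govspeak to HTML.
def govspeak_to_html(text)
  "<p>#{text}</p>"
end

# Multi-part content is assumed to be an array of hashes carrying :content_type.
def multipart?(value)
  value.all? { |part| part.is_a?(Hash) && part.key?(:content_type) }
end

def present_value(value)
  case value
  when Array
    # Plain arrays are walked item by item; no content-type lookup is needed.
    return value.map { |item| present_value(item) } unless multipart?(value)

    html = value.find { |part| part[:content_type] == "text/html" }
    return html[:content] if html

    govspeak = value.find { |part| part[:content_type] == "text/govspeak" }
    govspeak ? govspeak_to_html(govspeak[:content]) : value
  when Hash
    value.transform_values { |item| present_value(item) }
  else
    # Strings and other scalars cannot contain multi-part content, so return them untouched.
    value
  end
end

presented = present_value(
  body: [{ content_type: "text/govspeak", content: "x" }],
  tags: %w[a b],
)
```

Each value passes through exactly one `case` dispatch, so scalars never reach the content-type checks at all.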
Change 4: Remove useless memoization
No caller of the `details` method re-computes the response, so the memoization is pointless.
Removing it here avoids the unnecessary compute needed to check whether the instance variable has already been set.
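
The following sketch shows why memoization adds nothing when a method is called once per instance. Both classes are hypothetical and exist only to count how often the underlying computation runs.

```ruby
# Hypothetical presenter that memoizes its result in an instance variable.
class MemoizedDetails
  attr_reader :computations

  def initialize(raw)
    @raw = raw
    @computations = 0
  end

  def details
    @details ||= compute
  end

  private

  def compute
    @computations += 1
    @raw.transform_values(&:to_s)
  end
end

# The same presenter without memoization.
class PlainDetails
  attr_reader :computations

  def initialize(raw)
    @raw = raw
    @computations = 0
  end

  def details
    @computations += 1
    @raw.transform_values(&:to_s)
  end
end

raw = { body: "text", count: 3 }
memoized = MemoizedDetails.new(raw)
plain = PlainDetails.new(raw)

# Each presenter is asked for its details exactly once, as in the PR's call sites.
memoized_result = memoized.details
plain_result = plain.details
```

With a single call per instance, both versions compute the value once and return the same hash, so the memoized version only adds the instance-variable check.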
Trello card