Cache DuplicateNameChecker in the OutputContext #2253
Some tests seem to be failing due to duplicate names. Maybe these are situations where you don't want to share the same property checker? My assumption is that this may be due to nested properties (navigation or complex): the same property name may exist on the parent entity as well as in a nested object, so a different property checker should be used. Maybe we can make the property checker hierarchical: instead of creating a new checker every time we enter a scope, or having a single checker for the entire response, we can create a new checker per scope but cache it, so that the next time we enter the scope for the same nested property we reuse the same checker. @joaocpaiva had also suggested an object pool as another option.
18448b8 to f97e47c
Seems like Kennedy was just missing a Reset(). To reuse a collection, it is important to call Clear() every time we enter a new scope that needs to start from an empty state. Reusing a single collection per operation should go a long way toward reducing allocations for this stack.
I updated the code and called |
@@ -394,7 +398,7 @@ await this.WriteInstanceAnnotationNameAsync(propertyName, annotationName)
     await this.valueSerializer.WriteResourceValueAsync(resourceValue,
         expectedType,
         treatLikeOpenProperty,
-        this.valueSerializer.CreateDuplicatePropertyNameChecker()).ConfigureAwait(false);
+        this.valueSerializer.JsonLightOutputContext.DuplicatePropertyNameChecker).ConfigureAwait(false);
At a different level, could this be a "concurrency" problem?
I mean when we write a payload with multiple "levels", for example Top Resource, Child Resource, GrandChild Resource...
Is the PropertyNameChecker reused at each "level"? If so, isn't it a big problem if the Top Resource has the same property name as the GrandChild Resource?
@joaocpaiva @KenitoInc I still think using a shared Consider this example:
{
"Id": 1,
"Foo": "Bar",
"Nested": {
"Fizz": "Buzz",
"Foo": "Bar"
},
"Id": 2
}
If we use the same duplicate checker without resetting, then it will incorrectly flag the top-level However, if we call Since the tests are all passing, maybe we need more tests to cover these scenarios, or maybe my assumptions about how the duplicate checker is used are false.
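One reading of the concern above, as a minimal sketch: a single checker shared across nesting levels either reports names that merely repeat in a nested object (no reset) or forgets the parent's names and misses a real duplicate (reset on entering the nested scope). `SimpleNameChecker` and `SharedCheckerDemo` are hypothetical stand-ins built on `HashSet<string>`, not the library's `DuplicatePropertyNameChecker`, and the flat name lists only mimic the order in which the example payload would be written.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the real checker: remembers the names seen so far.
public sealed class SimpleNameChecker
{
    private readonly HashSet<string> seen = new HashSet<string>(StringComparer.Ordinal);

    // Returns false when the name has already been recorded.
    public bool TryAdd(string name) => seen.Add(name);

    public void Reset() => seen.Clear();
}

public static class SharedCheckerDemo
{
    // Walks the example payload with ONE shared checker and returns
    // the names it reports as duplicates.
    public static List<string> Run(bool resetOnNestedScope)
    {
        var checker = new SimpleNameChecker();
        var flagged = new List<string>();

        // Top-level properties written before the nested object.
        foreach (var name in new[] { "Id", "Foo", "Nested" })
        {
            if (!checker.TryAdd(name)) flagged.Add(name);
        }

        // Entering the "Nested" object.
        if (resetOnNestedScope) checker.Reset();
        foreach (var name in new[] { "Fizz", "Foo" })
        {
            if (!checker.TryAdd(name)) flagged.Add("Nested/" + name);
        }

        // Back at the top level: the second "Id" is the only real duplicate.
        if (!checker.TryAdd("Id")) flagged.Add("Id");

        return flagged;
    }
}
```

Under these assumptions, `SharedCheckerDemo.Run(false)` reports `Nested/Foo` (a false positive) along with `Id`, while `SharedCheckerDemo.Run(true)` reports nothing and so misses the duplicated top-level `Id`.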
Makes sense @habbes @KenitoInc. We should make sure there is a test for that use case. Yet another alternative would be to check all top-level properties before processing the nested properties, so we could reset it safely at the end of every level?
Checking top-level properties before writing nested properties isn't really an option: the writer supports streaming values, so the service doesn't have to keep the entire response object in memory. In reply to: 978052709
@joaocpaiva @mikepizzo if checking all top-level properties before nested properties is not an option, maybe we could still make significant gains from creating only one duplicate checker per nesting level if we have a response with a lot of entities with nested properties. For example, assuming we're writing a response with 10 entities like:
[
{
"Prop1": "value",
"Nested1": {
"N1Prop": "value"
},
"Prop2": "value",
"Nested2": {
"N2Prop": "value",
"N2Nested": { ... }
}
}
...
]
In the current implementation, I think we'll allocate 1 checker for the collection + (1 for But we can safely create one instance of Also, after processing an entity in the response, each dictionary would have grown to fit the largest number of properties at its nesting level, which means (I think) the dictionaries will probably not be resized after the first entity is written. I assume this is the same improvement we'd get if we used an object pool?
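The "one checker per nesting level" idea can be sketched as follows. This is only an illustration under stated assumptions: `PerDepthNameCheckers`, `EnterScope`, `LeaveScope`, `Check`, and `Allocations` are hypothetical names, and a `HashSet<string>` stands in for the real checker's `propertyState` dictionary. Each depth gets its own set, created the first time that depth is reached and cleared (not reallocated) each time a scope at that depth is entered again, so a parent's names survive while its nested objects are written.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical per-depth cache of name sets, one per nesting level.
public sealed class PerDepthNameCheckers
{
    private readonly List<HashSet<string>> levels = new List<HashSet<string>>();
    private int depth = -1;

    // Number of sets created so far (one per distinct depth reached).
    public int Allocations => levels.Count;

    public void EnterScope()
    {
        depth++;
        if (depth == levels.Count)
        {
            // First time at this depth: allocate a set for it.
            levels.Add(new HashSet<string>(StringComparer.Ordinal));
        }
        else
        {
            // Depth seen before: reuse its set, starting from an empty state.
            levels[depth].Clear();
        }
    }

    // Leaving a scope keeps the parent's set untouched.
    public void LeaveScope() => depth--;

    public void Check(string name)
    {
        if (!levels[depth].Add(name))
        {
            throw new InvalidOperationException(
                $"Duplicate property '{name}' at depth {depth}.");
        }
    }
}
```

With this sketch, writing 10 entities shaped like the example (entity at depth 0, `Nested1` and `Nested2` at depth 1, `N2Nested` at depth 2) would create three sets in total rather than a new one per scope, and a repeated name at the top level is still caught after a nested object has been written. Whether this matches the gains of an object pool in the real serializer would need to be measured.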
Current code seems to change semantics of duplicate name checking. Perhaps there's a better way to improve perf without caching/resetting.
@@ -120,7 +120,10 @@ public void ValidatePropertyOpenForAssociationLink(string propertyName)
         /// </summary>
         public void Reset()
         {
             propertyState.Clear();
             if (propertyState.Count > 0)
I'm not sure if there was another reason for adding this if statement, but Dictionary<TKey, TValue>.Clear already does an identical count check as an optimization.
Is there any way to write a test to ensure that the caching is working correctly?
Created a new PR #2328 with a different implementation.
Issues
This pull request fixes #2180 .
Description
When serializing a response, we call [Serializer].CreateDuplicatePropertyNameChecker() multiple times, creating a new instance of the DuplicatePropertyNameChecker each time. This causes a lot of allocations from initializing and resizing the internal propertyState dictionary.
In this PR, we create an instance of the DuplicatePropertyNameChecker when we create the OutputContext. This allows us to reuse the DuplicatePropertyNameChecker throughout the response serialization process.
Checklist (Uncheck if it is not completed)
Additional work necessary
If documentation update is needed, please add "Docs Needed" label to the issue and provide details about the required document change in the issue.