Fix lazy cloning and apply the optimization in more cases #2921

Merged: 4 commits, Jan 6, 2023

Conversation

jackkoenig
Contributor

Follow on to #2919. Fixes a couple of minor bugs and enhances the implementation from #2611.

This fixes 2 bugs:

  • Lazy clone checking for external references must include Records.
  • Lazy clone checking for Records must be recursive.

The simple fix for the second bug would greatly reduce the usefulness of lazy cloning, so this PR also changes the implementation: it requires more storage per Record but identifies more cases where cloning is not necessary.

Also see the description for the PR changing the lazy cloning algorithm:

Change mustClone calculation to use _minId

Previously it used a calculation of whether a given Record (or Bundle)
contained an "external reference", defined as any element with an _id
less than the _id of the Record. This does not work for certain cases
because it is not recursive and a child Record could itself contain an
external reference. Changing the calculation to be recursive negates
much of the benefit because many cases that should not need to clone end
up cloning.
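
A minimal sketch of the failure mode described above, using a toy model rather than Chisel's actual classes. `Node`, `Leaf`, `Rec`, and `hasExternalRefShallow` are hypothetical names introduced for illustration; only the rule itself (an element whose id is less than the Record's id counts as an external reference) comes from the description.

```scala
// Toy model of ids on hardware types; these are NOT Chisel's real classes.
sealed trait Node { def id: Long }
final case class Leaf(id: Long) extends Node
final case class Rec(id: Long, elements: Seq[Node]) extends Node

object ShallowCheck {
  // Old-style check as described: inspect only the direct elements.
  def hasExternalRefShallow(r: Rec): Boolean =
    r.elements.exists(_.id < r.id)

  // x is created first (id 1). outer is created next (id 2), and while it is
  // being constructed it creates inner (id 3), which reuses the existing x.
  val x: Leaf = Leaf(1)
  val inner: Rec = Rec(3, Seq(x))
  val outer: Rec = Rec(2, Seq(inner))
}
```

In this model `hasExternalRefShallow(inner)` is true, but `hasExternalRefShallow(outer)` is false because the only direct element of `outer` is `inner`, whose id is larger. The reference to `x` inside the child goes unnoticed.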

The new algorithm instead uses a notion of "minimum id" for a Data. For
most Data, this is just _id, but for Records, it is the minimum id of
any of its children (recursively). This does replace the previous 1-byte
memoized field with an 8-byte field, but this algorithm works better and
should result in more savings from lazy cloning.
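
The "minimum id" idea can be sketched on a toy model as follows. This is not the code from the PR: `ToyData`, `ToyElement`, `ToyRecord`, `minId`, and `mustClone` here are illustrative stand-ins. Two assumptions are not stated in the description: the Record's own id is folded into the minimum, and the clone decision compares the minimum against an id counter value captured before the type expression was evaluated. The sketch also recomputes the value on every call instead of memoizing it in a field.

```scala
// Toy model; these are NOT Chisel's real Data/Record classes.
sealed trait ToyData { def id: Long }
final case class ToyElement(id: Long) extends ToyData
final case class ToyRecord(id: Long, elements: Seq[ToyData]) extends ToyData

object MinIdSketch {
  // For a plain element the minimum id is its own id. For a record it is the
  // smallest id found among its children, recursively (assumption: the
  // record's own id is included as the starting value).
  def minId(d: ToyData): Long = d match {
    case ToyElement(id) => id
    case ToyRecord(id, elems) =>
      elems.foldLeft(id)((acc, e) => math.min(acc, minId(e)))
  }

  // Assumption: idBefore is the last id handed out before the type was built.
  // Anything at or below it already existed, so the type must be cloned.
  def mustClone(d: ToyData, idBefore: Long): Boolean = minId(d) <= idBefore

  val shared: ToyElement = ToyElement(1) // created before the type expression
  val idBefore: Long = 1

  // Everything created fresh inside the type expression: ids 2, 3, 4.
  val fresh: ToyRecord = ToyRecord(2, Seq(ToyRecord(3, Seq(ToyElement(4)))))
  // Nested record that reuses the pre-existing element.
  val reused: ToyRecord = ToyRecord(2, Seq(ToyRecord(3, Seq(shared))))
}
```

Under these assumptions `fresh` has a minimum id of 2 and does not need cloning, while `reused` has a minimum id of 1 and does, even though the shared element sits one level down.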

Contributor Checklist

  • Did you add Scaladoc to every public function/method?
  • Did you add at least one test demonstrating the PR?
  • Did you delete any extraneous printlns/debugging code?
  • Did you specify the type of improvement?
  • [NA] Did you add appropriate documentation in docs/src?
  • Did you state the API impact?
  • Did you specify the code generation impact?
  • Did you request a desired merge strategy?
  • Did you add text to be included in the Release Notes for this change?

Type of Improvement

  • bug fix
  • performance improvement

API Impact

No impact

Backend Code Generation Impact

No impact

Desired Merge Strategy

  • Rebase: You will rebase the PR onto master and it will be merged with a merge commit.

Release Notes

Fix bugs where some Records and nested Bundles may not be cloned when they should be, resulting in confusing errors downstream. Also increase the number of cases where cloning can be avoided.

Reviewer Checklist (only modified by reviewer)

  • Did you add the appropriate labels?
  • Did you mark the proper milestone (Bug fix: 3.4.x, [small] API extension: 3.5.x, API modification or big change: 3.6.0)?
  • Did you review?
  • Did you check whether all relevant Contributor checkboxes have been checked?
  • Did you do one of the following when ready to merge:
    • Squash: You or the contributor enable auto-merge (squash), clean up the commit message, and label with Please Merge.
    • Merge: Ensure that contributor has cleaned up their commit history, then merge with Create a merge commit.

jackkoenig added this to the 3.6.0 milestone Jan 6, 2023
jackkoenig changed the title from "Fix lazy clone" to "Fix lazy cloning and apply the optimization in more cases" Jan 6, 2023
jackkoenig force-pushed the fix-lazy-clone branch 2 times, most recently from e0409ca to 25df0fb, January 6, 2023 06:08
Contributor

@mwachs5 mwachs5 left a comment


Looks good, should we have equivalent "not cloning" cases for each?

jackkoenig merged commit 8bac0d7 into master Jan 6, 2023
jackkoenig deleted the fix-lazy-clone branch January 6, 2023 19:09