[WIP] Fix Merge Into Partial Updates with Global Index #13083

Draft: wants to merge 2 commits into base: master

@codope (Member) commented Apr 3, 2025

Change Logs

The main appeal of partial updates is that users don't have to specify all fields in the merge into command. However, with a global index and partition path updates, the merge into command fails because HoodieIndexUtils::mergeIncomingWithExistingRecordWithExpressionPayload expects a full record. This PR attempts to fix the behavior by doing a partial merge and then building the full record, i.e. getting the merged record and filling in the missing fields from the existing record. The PR still uses the record merger API in both the above method and HoodieMergedReadHandle#doMergedRead. Ideally, we would use the file group reader in HoodieMergedReadHandle instead of the record merger. That is a larger refactoring, so for now maybe we can just error out with a message that merge into partial updates are not supported with a global index.
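
To illustrate the "fill in the missing fields from the existing record" step, here is a minimal sketch. It is not the code in this PR and does not use Hudi or Avro types. Records are modeled as plain maps, and the class name `PartialMergeSketch` and the helper `fillMissingFields` are hypothetical names chosen for illustration. It assumes that a field absent from the partial merged record means "not specified in the merge into command", while a field present with a null value is an explicit update to null.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PartialMergeSketch {

  /**
   * Hypothetical helper (not a Hudi API): builds a full record from a partial
   * merged record by copying every field the partial record does not carry
   * from the existing record. Field order follows the table schema.
   */
  static Map<String, Object> fillMissingFields(
      List<String> schemaFields,
      Map<String, Object> partialMerged,
      Map<String, Object> existing) {
    Map<String, Object> full = new LinkedHashMap<>();
    for (String field : schemaFields) {
      if (partialMerged.containsKey(field)) {
        // Field was part of the update, keep the merged value (may be null).
        full.put(field, partialMerged.get(field));
      } else {
        // Field was not specified, fall back to the stored value.
        full.put(field, existing.get(field));
      }
    }
    return full;
  }

  public static void main(String[] args) {
    List<String> schema = Arrays.asList("id", "name", "price", "partition");

    Map<String, Object> existing = new HashMap<>();
    existing.put("id", 1);
    existing.put("name", "a1");
    existing.put("price", 10.0);
    existing.put("partition", "2024");

    // Only the key, the price, and the new partition path were specified.
    Map<String, Object> partial = new HashMap<>();
    partial.put("id", 1);
    partial.put("price", 12.5);
    partial.put("partition", "2025");

    System.out.println(fillMissingFields(schema, partial, existing));
  }
}
```

In the real code path the records are Avro `IndexedRecord`s with distinct partial and full schemas, so the equivalent step would need to map field positions between the two schemas rather than look up keys in a map.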

Impact

Fix Merge Into Partial Updates with Global Index

Risk level (write none, low, medium or high below)

low

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".

  • The config description must be updated if new configs are added or the default value of the configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
    ticket number here and follow the instruction to make
    changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@github-actions github-actions bot added the size:M PR with lines of changes in (100, 300] label Apr 3, 2025
@@ -311,15 +313,74 @@ private static <R> Option<HoodieRecord<R>> mergeIncomingWithExistingRecordWithEx
return Option.of(result);
}

// At this point, result.getData() contains a partial record update.
IndexedRecord existingRecord = existing.toIndexedRecord(existingSchema, config.getProps())
Contributor

Let's hold off on any changes to the non-conventional merging logic, i.e. paths that go through the record merger rather than the file group reader. We can simplify that logic when we unify the reader path.

3 participants