Add ignoreMerger for partial upsert #7907

Merged
merged 1 commit into from
Jan 15, 2022

deemoliu
Contributor

Description

Add an IGNORE merger for partial upsert.

We recently received an interesting partial upsert use case from an industry user.

The user has two events, as follows, where t is the timestamp column and t1 < t2:

{t1, a1, b1, c1, d1}
{t2, a2, nil, nil, nil}

The user specified OVERWRITE for field "a"; fields "b", "c", and "d" are empty in the second event.
She expected the merge result to be {a2, b1, c1, d1}.
However, the merge result was {a2, nil, nil, nil}, which is the same as a full upsert.

The cause is that she did not specify mergers for fields "b", "c", and "d", so these fields use the default behavior, "overwrite regardless of null".

{
  "upsertConfig": {
    "mode": "PARTIAL",
    "partialUpsertStrategies":{
      "a": "OVERWRITE"
    }
  }
}

Her issue can be fixed with the following config, since the OVERWRITE merger's behavior is "overwrite unless null".

{
  "upsertConfig": {
    "mode": "PARTIAL",
    "partialUpsertStrategies":{
      "a": "OVERWRITE",
      "b": "OVERWRITE",
      "c": "OVERWRITE",
      "d": "OVERWRITE"
    }
  }
}
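
The difference between the two configs above can be reproduced with a small sketch. This is not Pinot's implementation: the function name `merge_records`, the `Record` alias, and the use of Python `None` for nil are assumptions made for illustration. It models only the two behaviors named in this description, namely that a column with an OVERWRITE strategy keeps the previous value when the new value is null, and that a column without a strategy takes the new value regardless of null.

```python
from typing import Any, Dict, Optional

Record = Dict[str, Optional[Any]]


def merge_records(previous: Record, new: Record, strategies: Dict[str, str]) -> Record:
    """Hypothetical model of the merge behavior described in this PR."""
    merged: Record = {}
    for column, new_value in new.items():
        strategy = strategies.get(column)
        if strategy is None:
            # No merger configured: "overwrite regardless of null".
            merged[column] = new_value
        elif strategy == "OVERWRITE":
            # "Overwrite unless null": fall back to the previous value on null.
            merged[column] = new_value if new_value is not None else previous.get(column)
        else:
            raise ValueError(f"strategy not modeled in this sketch: {strategy}")
    return merged


previous = {"t": 1, "a": "a1", "b": "b1", "c": "c1", "d": "d1"}
new = {"t": 2, "a": "a2", "b": None, "c": None, "d": None}

# First config: only "a" has a strategy, so "b", "c", "d" become null.
only_a = merge_records(previous, new, {"a": "OVERWRITE"})

# Second config: every field uses OVERWRITE, so nulls fall back to t1's values.
all_columns = merge_records(previous, new, {c: "OVERWRITE" for c in "abcd"})
```

In this sketch, `only_a` matches the unexpected {a2, nil, nil, nil} result, while `all_columns` matches the expected {a2, b1, c1, d1}.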

Even though the OVERWRITE merger works here (because it ignores null values), the resulting behavior is to ignore rather than to overwrite, which is confusing. In this PR, I created an IGNORE merger to address this, so that partial upsert has a cleaner interface.

{
  "upsertConfig": {
    "mode": "PARTIAL",
    "partialUpsertStrategies":{
      "a": "OVERWRITE",
      "b": "IGNORE",
      "c": "IGNORE",
      "d": "IGNORE"
    }
  }
}
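
As a rough illustration of how the config above is meant to behave, here is a minimal sketch with one merger function per strategy. The IGNORE semantics used here (keep the previous value and drop the incoming one) are an assumption inferred from the expected result {a2, b1, c1, d1} in this description; the PR's actual IgnoreMerger code is not shown in this conversation. The names `MERGERS` and `apply_strategies` are hypothetical.

```python
from typing import Any, Callable, Dict, Optional

Merger = Callable[[Optional[Any], Optional[Any]], Optional[Any]]


def overwrite(previous: Optional[Any], new: Optional[Any]) -> Optional[Any]:
    # "Overwrite unless null", as described for the OVERWRITE merger.
    return new if new is not None else previous


def ignore(previous: Optional[Any], new: Optional[Any]) -> Optional[Any]:
    # Assumed semantics: ignore the incoming value and keep the previous one.
    return previous


# Hypothetical lookup table from strategy name to merger function.
MERGERS: Dict[str, Merger] = {"OVERWRITE": overwrite, "IGNORE": ignore}


def apply_strategies(previous: dict, new: dict, strategies: Dict[str, str]) -> dict:
    # Columns without a strategy take the incoming value as is.
    merged = dict(new)
    for column, name in strategies.items():
        merged[column] = MERGERS[name](previous.get(column), new.get(column))
    return merged


strategies = {"a": "OVERWRITE", "b": "IGNORE", "c": "IGNORE", "d": "IGNORE"}
previous = {"t": 1, "a": "a1", "b": "b1", "c": "c1", "d": "d1"}
new = {"t": 2, "a": "a2", "b": None, "c": None, "d": None}

result = apply_strategies(previous, new, strategies)
```

Under these assumptions, `result` holds a2 for "a" and the t1 values for "b", "c", and "d".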

Upgrade Notes

Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion)

  • Yes (Please label as backward-incompat, and complete the section below on Release Notes)

Does this PR fix a zero-downtime upgrade introduced earlier?

  • Yes (Please label this as backward-incompat, and complete the section below on Release Notes)

Does this PR otherwise need attention when creating release notes? Things to consider:

  • New configuration options
  • Deprecation of configurations
  • Signature changes to public methods/interfaces
  • New plugins added or old plugins removed
  • Yes (Please label this PR as release-notes and complete the section on Release Notes)

Release Notes

Documentation

@codecov-commenter

Codecov Report

Merging #7907 (8583ab3) into master (aa2da07) will decrease coverage by 56.97%.
The diff coverage is 0.00%.

@@              Coverage Diff              @@
##             master    #7907       +/-   ##
=============================================
- Coverage     71.32%   14.35%   -56.98%     
+ Complexity     4092       80     -4012     
=============================================
  Files          1589     1545       -44     
  Lines         82139    80273     -1866     
  Branches      12270    12067      -203     
=============================================
- Hits          58589    11525    -47064     
- Misses        19578    67892    +48314     
+ Partials       3972      856     -3116     
Flag Coverage Δ
integration1 ?
integration2 ?
unittests1 ?
unittests2 14.35% <0.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...not/segment/local/upsert/PartialUpsertHandler.java 0.00% <ø> (-33.34%) ⬇️
...inot/segment/local/upsert/merger/IgnoreMerger.java 0.00% <0.00%> (ø)
...ocal/upsert/merger/PartialUpsertMergerFactory.java 0.00% <0.00%> (-50.00%) ⬇️
...rg/apache/pinot/spi/config/table/UpsertConfig.java 0.00% <0.00%> (-89.48%) ⬇️
...ain/java/org/apache/pinot/core/data/table/Key.java 0.00% <0.00%> (-100.00%) ⬇️
.../java/org/apache/pinot/spi/utils/BooleanUtils.java 0.00% <0.00%> (-100.00%) ⬇️
.../java/org/apache/pinot/core/data/table/Record.java 0.00% <0.00%> (-100.00%) ⬇️
.../java/org/apache/pinot/core/util/GroupByUtils.java 0.00% <0.00%> (-100.00%) ⬇️
...ava/org/apache/pinot/spi/config/table/FSTType.java 0.00% <0.00%> (-100.00%) ⬇️
...ava/org/apache/pinot/spi/data/MetricFieldSpec.java 0.00% <0.00%> (-100.00%) ⬇️
... and 1262 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update aa2da07...8583ab3. Read the comment docs.

@yupeng9
Contributor

yupeng9 commented Dec 15, 2021

About naming: how about adding this behavior to Overwrite, which seems to be the main use of upsert?
The current behavior could be better described as OverwriteForce or OverwriteAlways to indicate overwrite even with null.

What do you think, @deemoliu @Jackie-Jiang?

Contributor

@Jackie-Jiang Jackie-Jiang left a comment

We don't really need an IgnoreMerger; we can simply not put a merger in the PartialUpsertHandler._column2Mergers map when the strategy is IGNORE. That also saves unnecessary overhead.
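
The suggestion above can be sketched roughly as follows. This is a Python stand-in, not the Java handler: `build_column_mergers` and `merge_in_place` are hypothetical names, and the dictionary only loosely mirrors the `_column2Mergers` map. The point it illustrates is that a column whose strategy is IGNORE never gets an entry, so the merge loop does no work for it. In this sketch, a column left out of the map keeps whatever the incoming record holds.

```python
from typing import Any, Callable, Dict, Optional

Merger = Callable[[Optional[Any], Optional[Any]], Optional[Any]]


def overwrite(previous: Optional[Any], new: Optional[Any]) -> Optional[Any]:
    # "Overwrite unless null".
    return new if new is not None else previous


# Hypothetical registry; IGNORE deliberately has no merger object.
MERGERS: Dict[str, Merger] = {"OVERWRITE": overwrite}


def build_column_mergers(strategies: Dict[str, str]) -> Dict[str, Merger]:
    # Skip IGNORE columns so no merger is stored (or later invoked) for them.
    return {
        column: MERGERS[name]
        for column, name in strategies.items()
        if name != "IGNORE"
    }


def merge_in_place(previous: dict, new: dict, column_mergers: Dict[str, Merger]) -> int:
    """Apply each registered merger to `new`; return the number of merger calls."""
    calls = 0
    for column, merger in column_mergers.items():
        new[column] = merger(previous.get(column), new.get(column))
        calls += 1
    return calls


strategies = {"a": "OVERWRITE", "b": "IGNORE", "c": "IGNORE", "d": "IGNORE"}
column_mergers = build_column_mergers(strategies)

previous = {"t": 1, "a": "a1", "b": "b1", "c": "c1", "d": "d1"}
new = {"t": 2, "a": "a2", "b": None, "c": None, "d": None}
calls = merge_in_place(previous, new, column_mergers)
```

Only column "a" is visited here, which is the overhead saving the comment refers to.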

@Jackie-Jiang
Contributor

@yupeng9 I feel IGNORE is more intuitive, which simply ignores the previous value (full upsert semantic)

@deemoliu deemoliu force-pushed the qiaochu/ignore-merger branch from 8583ab3 to b8af001 Compare January 14, 2022 23:48
@Jackie-Jiang Jackie-Jiang merged commit 7ec47c4 into apache:master Jan 15, 2022
klsince pushed a commit to klsince/pinot that referenced this pull request Jan 19, 2022
4 participants