
Add support for gracefully handling the errors while transformations #9377

Merged: 3 commits merged into apache:master on Sep 22, 2022

Conversation

@KKcorps (Contributor) commented Sep 11, 2022

Add support for honouring continueOnError during:

  • complexTypeTransform
  • filterTransform
  • recordReader

Also adds support for keeping track of such records using incompleteRecordCount.
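
To illustrate the tracking side of this, here is a minimal sketch of how the incomplete-record flag set by a failed transform could be counted while a segment is built. Apart from GenericRow.INCOMPLETE_RECORD_KEY, which this PR relies on, the reader, pipeline, and indexRow names are illustrative assumptions, not the exact code in this PR.

    // Hedged sketch: count rows that hit a transform error but were kept because
    // continueOnError is enabled, so incompleteRecordCount can be reported at the end.
    long incompleteRecordCount = 0;
    while (recordReader.hasNext()) {                                        // hypothetical reader loop
      GenericRow row = transformPipeline.processRow(recordReader.next());   // hypothetical pipeline call
      if (Boolean.TRUE.equals(row.getValue(GenericRow.INCOMPLETE_RECORD_KEY))) {
        incompleteRecordCount++;   // row failed a transform but ingestion continued
      }
      indexRow(row);               // hypothetical: keep building the segment with the row
    }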

@KKcorps force-pushed the continueOnError_patch branch from e1fe791 to 2574c9f on September 11, 2022 11:33
@codecov-commenter commented Sep 11, 2022

Codecov Report

Merging #9377 (ebe6dad) into master (65e97d2) will decrease coverage by 1.36%.
The diff coverage is 53.03%.

@@             Coverage Diff              @@
##             master    #9377      +/-   ##
============================================
- Coverage     69.76%   68.40%   -1.37%     
+ Complexity     4787     4705      -82     
============================================
  Files          1885     1885              
  Lines        100293   100464     +171     
  Branches      15256    15292      +36     
============================================
- Hits          69966    68718    -1248     
- Misses        25381    26793    +1412     
- Partials       4946     4953       +7     
Flag          Coverage Δ
integration1  26.04% <0.00%> (-0.05%) ⬇️
integration2  ?
unittests1    66.89% <53.03%> (-0.06%) ⬇️
unittests2    15.35% <0.00%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
.../org/apache/pinot/spi/data/readers/GenericRow.java 66.66% <ø> (ø)
...gment/local/segment/creator/TransformPipeline.java 83.72% <42.85%> (-7.95%) ⬇️
...ent/local/recordtransformer/FilterTransformer.java 59.09% <43.75%> (-40.91%) ⬇️
...ocal/recordtransformer/ComplexTypeTransformer.java 58.71% <47.05%> (-1.87%) ⬇️
...t/creator/impl/SegmentIndexCreationDriverImpl.java 80.10% <52.63%> (-3.62%) ⬇️
...t/local/recordtransformer/DataTypeTransformer.java 91.34% <100.00%> (+0.08%) ⬆️
...local/recordtransformer/ExpressionTransformer.java 89.06% <100.00%> (+0.35%) ⬆️
...t/core/plan/StreamingInstanceResponsePlanNode.java 0.00% <0.00%> (-100.00%) ⬇️
...ore/operator/streaming/StreamingResponseUtils.java 0.00% <0.00%> (-100.00%) ⬇️
...server/starter/helix/SegmentReloadStatusValue.java 0.00% <0.00%> (-100.00%) ⬇️
... and 178 more


Review comment on ComplexTypeTransformer.java, anchored at the try block around flattenMap(...) and unnestCollection(...):
@xiangfu0 (Contributor) commented Sep 11, 2022

cc: @Jackie-Jiang I remember you mentioned that try-catch can impact performance? Should we do an if-else check first?

Code-wise I think this is simpler:

if (_continueOnError) {
  try {
    flattenMap(record, new ArrayList<>(record.getFieldToValueMap().keySet()));
    for (String collection : _fieldsToUnnest) {
      unnestCollection(record, collection);
    }
  } catch (Exception e) {
    record.putValue(GenericRow.INCOMPLETE_RECORD_KEY, true);
    LOGGER.debug("Caught exception while transforming complex types for record: {}", record, e);
  }
} else {
  flattenMap(record, new ArrayList<>(record.getFieldToValueMap().keySet()));
  for (String collection : _fieldsToUnnest) {
    unnestCollection(record, collection);
  }
}

@KKcorps (Contributor, Author) replied:

I can do that. It just looked like the code was being duplicated, and hence I didn't do it.

A contributor replied:

@Jackie-Jiang AFAIK, the compiler doesn't optimize the code inside a try block, but other than that there's no performance overhead to just using a try block. Let me know if you have experienced otherwise.

@Jackie-Jiang (Contributor) replied:

The current way should be fine. I don't remember seeing extra overhead on try-catch
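
For readers without the diff open, here is a minimal sketch of what "the current way" under discussion presumably looks like: a single try block whose catch consults _continueOnError, so the transform logic is written only once. This is an illustrative reconstruction, not the exact code in the PR.

    // Hedged sketch of the single-try-block approach: the happy path is written once,
    // and the catch decides whether to fail fast or flag the row and keep going.
    try {
      flattenMap(record, new ArrayList<>(record.getFieldToValueMap().keySet()));
      for (String collection : _fieldsToUnnest) {
        unnestCollection(record, collection);
      }
    } catch (Exception e) {
      if (!_continueOnError) {
        throw new RuntimeException("Caught exception while transforming complex types for record: " + record, e);
      }
      record.putValue(GenericRow.INCOMPLETE_RECORD_KEY, true);
      LOGGER.debug("Caught exception while transforming complex types for record: {}", record, e);
    }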

A contributor replied:

Thanks @Jackie-Jiang for confirming! @KKcorps can you also update the change in this PR (https://github.com/apache/pinot/pull/9376/files) to make the code consistent?

@xiangfu0 (Contributor) left a review comment:

Otherwise, lgtm.

@sajjad-moradi (Contributor) left a review comment:

This PR only handles the segment generation part. We need to take care of realtime ingestion as well.


@Jackie-Jiang (Contributor) left a review comment:

This is great!


@KKcorps force-pushed the continueOnError_patch branch from f7b3844 to ebe6dad on September 14, 2022 14:49
@KKcorps (Contributor, Author) commented Sep 14, 2022

> This PR only handles the segment generation part. We need to take care of realtime ingestion as well.

Agreed. The reason I haven't tackled it in this PR is that it isn't clear what a user can do to fix the missing data if we ignore errors during realtime ingestion.

@sajjad-moradi (Contributor) replied:

> This PR only handles the segment generation part. We need to take care of realtime ingestion as well.
>
> Agreed. The reason I haven't tackled it in this PR is that it isn't clear what a user can do to fix the missing data if we ignore errors during realtime ingestion.

IMO it should be up to the users. If they set continueOnError to true, they have agreed that this is the behavior they want, so realtime ingestion continues. If they don't set it, or set it to false, ingestion stops and they'll take care of the issue right away.
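
To make those proposed semantics concrete, here is a minimal sketch of how a realtime consumer could apply them. This is purely illustrative and not code from this PR; the helper name, the AtomicLong counter (java.util.concurrent.atomic.AtomicLong), and the exception type are assumptions.

    // Hedged sketch: with continueOnError=true a failed row is counted and skipped;
    // otherwise the error propagates so ingestion stops and the issue is fixed first.
    static boolean shouldSkipRow(GenericRow row, boolean continueOnError, AtomicLong incompleteRowCount) {
      if (Boolean.TRUE.equals(row.getValue(GenericRow.INCOMPLETE_RECORD_KEY))) {
        if (!continueOnError) {
          throw new IllegalStateException("Failed to transform row: " + row);
        }
        incompleteRowCount.incrementAndGet();   // could be surfaced as a realtime server metric
        return true;                            // caller drops the row and keeps consuming
      }
      return false;
    }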

@KKcorps merged commit bdf632c into apache:master on Sep 22, 2022
61yao pushed a commit to 61yao/pinot that referenced this pull request Oct 3, 2022
Add support for gracefully handling the errors while transformations (apache#9377)

* Add support for gracefully handling the errors while transformations

* Maintain consistent checks across transformers

* Add incompleteRowCount metric to realtime servers

Co-authored-by: Kartik Khare <[email protected]>