Add prefixesToRename config for renaming fields upon ingestion #8273
Codecov Report
@@ Coverage Diff @@
## master #8273 +/- ##
============================================
- Coverage 71.09% 70.78% -0.31%
+ Complexity 4314 4265 -49
============================================
Files 1626 1636 +10
Lines 84881 85844 +963
Branches 12788 12931 +143
============================================
+ Hits 60343 60764 +421
- Misses 20394 20886 +492
- Partials 4144 4194 +50
Flags with carried forward coverage won't be shown.
...l/src/main/java/org/apache/pinot/segment/local/recordtransformer/ComplexTypeTransformer.java
I feel the config of … Also, how do we handle collisions between multiple nested columns that happen to have the same name after the prefix is dropped?
Yeah, I like that. How does something like this look? @yupeng9
To avoid collisions between columns with the same name, perhaps we can add a validation check when the table config is added? I believe similar validation checks already exist for comparing column names in the configs against the schema. cc @icefury71
Also, since the general idea is to allow renaming columns from the source data upon ingestion, we could later expand on this with different approaches. One approach would be to specify the exact full column names we want to rename, along with what to rename them to, similar to the capability we already have in transform configs. Another approach would be to apply the config in batch (e.g. batch removing/renaming common prefixes, or batch converting snake_case to camelCase), so that users don't have to tediously list every column as in the first approach. For this PR, I was thinking of just starting with …
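The config-time validation discussed above could look roughly like the following: apply the configured renames to the known field names and reject the config if two distinct source fields end up with the same name. This is a minimal sketch, assuming the validator sees the full list of source field names and that the config is a prefix-to-replacement map where the first matching prefix wins; `RenameCollisionCheck`, `rename`, and `validate` are hypothetical names, not Pinot's `TableConfigUtils` API.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RenameCollisionCheck {

  // Hypothetical helper: replaces the first matching prefix; "" as replacement drops it.
  static String rename(String field, Map<String, String> prefixesToRename) {
    for (Map.Entry<String, String> e : prefixesToRename.entrySet()) {
      if (field.startsWith(e.getKey())) {
        return e.getValue() + field.substring(e.getKey().length());
      }
    }
    return field; // no prefix matched, keep the original name
  }

  // Throws if two distinct source fields map to the same name after renaming.
  static void validate(List<String> fields, Map<String, String> prefixesToRename) {
    Map<String, String> renamedToSource = new HashMap<>();
    for (String field : fields) {
      String renamed = rename(field, prefixesToRename);
      String previous = renamedToSource.putIfAbsent(renamed, field);
      if (previous != null && !previous.equals(field)) {
        throw new IllegalStateException(
            "Fields '" + previous + "' and '" + field + "' both map to '" + renamed + "'");
      }
    }
  }

  public static void main(String[] args) {
    Map<String, String> prefixes = new LinkedHashMap<>();
    prefixes.put("data.", "");
    validate(List.of("data.userId", "data.country"), prefixes); // no collision
    try {
      validate(List.of("data.userId", "userId"), prefixes);
    } catch (IllegalStateException e) {
      // prints: Fields 'data.userId' and 'userId' both map to 'userId'
      System.out.println(e.getMessage());
    }
  }
}
```

Running the check once when the table config is added, rather than per record, keeps the ingestion path free of this cost.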
I think it could be even simpler:
Yes, validation works, since this is at the schema level.
Thanks @yupeng9, I've updated it based on your suggestion and added validation.
...l/src/main/java/org/apache/pinot/segment/local/recordtransformer/ComplexTypeTransformer.java
LGTM. Thanks for adding this option
In the release notes section, could you please add the actual config key so that users know how to configure it by reading the release notes?
...l/src/main/java/org/apache/pinot/segment/local/recordtransformer/ComplexTypeTransformer.java
if (_prefixesToRename.isEmpty()) {
  return;
}
List<String> fields = new ArrayList<>(record.getFieldToValueMap().keySet());
Avoid creating this extra list
Ah actually, we need this list to avoid ConcurrentModificationException
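The reason the copied key list is needed can be shown with a plain `HashMap` standing in for the record's field-to-value map (an assumption: Pinot's `GenericRow` is not used here). Removing and re-inserting entries while iterating the live `keySet()` makes its fail-fast iterator throw `ConcurrentModificationException`, so this sketch iterates over a snapshot of the keys instead. `RenameWithSnapshot` and `renamePrefixes` are hypothetical names, and throwing on a collision is an illustrative choice, not necessarily what the PR does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class RenameWithSnapshot {

  // Hypothetical sketch: fieldToValue stands in for a record's field-to-value map,
  // and prefixesToRename maps a prefix to its replacement ("" drops the prefix).
  static void renamePrefixes(Map<String, Object> fieldToValue, Map<String, String> prefixesToRename) {
    if (prefixesToRename.isEmpty()) {
      return;
    }
    // Copy the keys first: mutating the map while iterating its live keySet()
    // would make the iterator throw ConcurrentModificationException.
    List<String> fields = new ArrayList<>(fieldToValue.keySet());
    for (String field : fields) {
      for (Map.Entry<String, String> entry : prefixesToRename.entrySet()) {
        String prefix = entry.getKey();
        if (!field.startsWith(prefix)) {
          continue;
        }
        String newName = entry.getValue() + field.substring(prefix.length());
        if (!newName.equals(field)) {
          // Refuse to silently overwrite a field that already has the new name.
          if (fieldToValue.containsKey(newName)) {
            throw new IllegalStateException(
                "Renaming '" + field + "' collides with existing field '" + newName + "'");
          }
          Object value = fieldToValue.remove(field);
          fieldToValue.put(newName, value);
        }
        break; // first matching prefix wins
      }
    }
  }

  public static void main(String[] args) {
    Map<String, Object> row = new HashMap<>();
    row.put("data.userId", 42);
    row.put("data.country", "US");
    row.put("ts", 1000L);
    Map<String, String> prefixes = new HashMap<>();
    prefixes.put("data.", "");
    renamePrefixes(row, prefixes);
    // TreeMap sorts the keys so the printed order is deterministic.
    System.out.println(new TreeMap<>(row)); // {country=US, ts=1000, userId=42}
  }
}
```

Copying the keys costs one list allocation per record, which is the trade-off raised in the review comment above.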
...l/src/main/java/org/apache/pinot/segment/local/recordtransformer/ComplexTypeTransformer.java
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/TableConfigUtils.java
pinot-spi/src/main/java/org/apache/pinot/spi/config/table/ingestion/ComplexTypeConfig.java
pinot-common/src/test/java/org/apache/pinot/common/utils/config/TableConfigSerDeTest.java
…g/TableConfigSerDeTest.java Co-authored-by: Xiaotian (Jackie) Jiang
Yep, updated. Thanks for the review, @Jackie-Jiang!
Description
This change allows fields to be renamed upon ingestion to avoid duplication of columns/data. More specifically, the proposed change will enable dropping specified prefixes from fields in the source data, so that the resulting field names do not have those prefixes. This addresses issue #8161.
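A minimal sketch of the renaming rule for a single field name follows. It assumes the config is a map from a prefix to its replacement, where an empty replacement drops the prefix and the first matching prefix wins; whether the merged `prefixesToRename` config has exactly this shape is not confirmed here. `PrefixRenameSketch` and `rename` are hypothetical names, and the `data.`/`meta.` prefixes are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PrefixRenameSketch {

  // Hypothetical helper: returns the field name with the first matching prefix
  // replaced. An empty replacement simply drops the prefix.
  static String rename(String field, Map<String, String> prefixesToRename) {
    for (Map.Entry<String, String> entry : prefixesToRename.entrySet()) {
      String prefix = entry.getKey();
      if (field.startsWith(prefix)) {
        return entry.getValue() + field.substring(prefix.length());
      }
    }
    // No configured prefix matched, so the name is kept as-is.
    return field;
  }

  public static void main(String[] args) {
    // LinkedHashMap keeps insertion order, so prefix matching is deterministic.
    Map<String, String> prefixes = new LinkedHashMap<>();
    prefixes.put("data.", "");    // drop the prefix entirely
    prefixes.put("meta.", "m_");  // replace the prefix with a shorter one
    System.out.println(rename("data.userId", prefixes)); // userId
    System.out.println(rename("meta.source", prefixes)); // m_source
    System.out.println(rename("ts", prefixes));          // ts
  }
}
```

Supporting a replacement string rather than only dropping covers both use cases with one config key.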
More details are in this design doc: https://docs.google.com/document/d/1U_vQC0BiCCEcx49Tsp499V5F075iJ3fW9IrsD8sNdU0/edit?usp=sharing
Upgrade Notes
Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion)
(… backward-incompat, and complete the section below on Release Notes)
Does this PR fix a zero-downtime upgrade introduced earlier?
(… backward-incompat, and complete the section below on Release Notes)
Does this PR otherwise need attention when creating release notes? Things to consider:
(… release-notes and complete the section on Release Notes)
Release Notes
prefixesToRename config to rename columns upon ingestion based on their prefixes.
Documentation