-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add memory optimized dimension table #9802
Conversation
Codecov Report
@@ Coverage Diff @@
## master #9802 +/- ##
=============================================
- Coverage 64.11% 15.80% -48.32%
+ Complexity 5385 175 -5210
=============================================
Files 1903 1923 +20
Lines 102554 103482 +928
Branches 15604 15758 +154
=============================================
- Hits 65753 16354 -49399
- Misses 32017 85936 +53919
+ Partials 4784 1192 -3592
Flags with carried forward coverage won't be shown. Click here to find out more.
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we add a test for it?
pinot-core/src/main/java/org/apache/pinot/core/data/manager/offline/DimensionTable.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/data/manager/offline/DimensionTableDataManager.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/data/manager/offline/LookupRecordLocation.java
Outdated
Show resolved
Hide resolved
.../src/main/java/org/apache/pinot/core/data/manager/offline/MemoryOptimizedDimensionTable.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/data/manager/offline/DimensionTableDataManager.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/data/manager/offline/DimensionTableDataManager.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/data/manager/offline/DimensionTableDataManager.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/data/manager/offline/DimensionTableDataManager.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/data/manager/offline/DimensionTableDataManager.java
Outdated
Show resolved
Hide resolved
.../src/main/java/org/apache/pinot/core/data/manager/offline/MemoryOptimizedDimensionTable.java
Outdated
Show resolved
Hide resolved
af6d8df
to
d157124
Compare
...core/src/main/java/org/apache/pinot/core/data/manager/offline/DimensionTableDataManager.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/data/manager/offline/DimensionTableDataManager.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Please reformat the changes
.../src/main/java/org/apache/pinot/core/data/manager/offline/MemoryOptimizedDimensionTable.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/data/manager/offline/DimensionTableDataManager.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/data/manager/offline/DimensionTableDataManager.java
Outdated
Show resolved
Hide resolved
if (lookupRecordLocation == null) { | ||
return null; | ||
} | ||
return lookupRecordLocation.getRecord(_reuse); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this thread safe? _reuse being shared across .get calls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to clear the _reuse before passing it to .getRecord
.
That's what we do in other places in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get can be called concurrently. Think this should be ThreadLocal<GenericRow> _reuse
instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! I think this is a bug
Currently, we store the whole row in the Dimension table hash map for fast lookups.
However, in some cases, we can trade off the speed for memory efficiency.
In this PR, I am adding a new implementation that only stores the segment reference and docID, and the actual value is read at the time of lookup.
I have also created another design where I am using different DataManager implementations instead of DataTable implementations. It touches a lot of code paths though hence I didn't raise PR for that approach. The code can be seen here
https://github.com/KKcorps/incubator-pinot/pull/23/files
Release Notes
Disable preloading while using Dimension tables to save memory
Documentation
add the following config in tableConfig json to disable preloading