-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allocation free DataBlockCache
lookups
#8140
Allocation free DataBlockCache
lookups
#8140
Conversation
Codecov Report
@@ Coverage Diff @@
## master #8140 +/- ##
=============================================
+ Coverage 30.69% 70.19% +39.49%
- Complexity 0 4302 +4302
=============================================
Files 1613 1624 +11
Lines 83952 84292 +340
Branches 12597 12635 +38
=============================================
+ Hits 25768 59165 +33397
+ Misses 55889 21041 -34848
- Partials 2295 4086 +1791
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
@@ -109,12 +111,11 @@ public int getNumDocs() { | |||
* @return Array of int values | |||
*/ | |||
public int[] getIntValuesForSVColumn(String column) { | |||
ColumnTypePair key = new ColumnTypePair(column, FieldSpec.DataType.INT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the potential problem being fixed here is the per call creation of ColumnTypePair
object and that is leading to some heap/perf overhead ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly, about 8MB of these were allocated in 1s in a benchmark which only allocated ~25MB/s. It's one of the main sources of allocation.
@@ -109,12 +111,11 @@ public int getNumDocs() { | |||
* @return Array of int values | |||
*/ | |||
public int[] getIntValuesForSVColumn(String column) { | |||
ColumnTypePair key = new ColumnTypePair(column, FieldSpec.DataType.INT); | |||
int[] intValues = (int[]) _valuesMap.get(key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously there is a single level indirection to get the corresponding values[] for a given ColumnTypePar
. Now it's a Map<Type,Map<String,Object>>
plus another function call getValues().
So while we avoid the creation of ColumnTypePair object, is it possible that new code will add some perf overhead that will negate any benefit of this PR ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has all been measured, let me put together benchmark results (you can see the combined effect of the set of changes in #8134)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that an EnumMap lookup is an array access by ordinal, and the array is very small, so the cost of indirection here is very low.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
pinot-core/src/main/java/org/apache/pinot/core/common/DataBlockCache.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/common/DataBlockCache.java
Outdated
Show resolved
Hide resolved
I will apply Jackie's suggestions before merging, please wait until I've done that (tomorrow) |
6321642
to
8cfedf9
Compare
8cfedf9
to
1ae6e8a
Compare
This reverts commit 65dcfe7.
ColumnTypePair
show up just behindint[]
/double[]
in allocation profiles of query execution.This changes avoids allocating these by using the
DataType
as a first level lookup into anEnumMap
(a small array behind the scenes) to aSet<String>
which is the set of columns for that data type. This reduces allocation rate and improves performance (does not regress) for a range of queries:master
branch