-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
show table metadata info in aggregate index size form #9733
Conversation
5025f4f
to
07b88b3
Compare
Codecov Report
@@ Coverage Diff @@
## master #9733 +/- ##
============================================
- Coverage 70.08% 67.58% -2.51%
- Complexity 4980 5179 +199
============================================
Files 1951 1450 -501
Lines 104561 76115 -28446
Branches 15836 12115 -3721
============================================
- Hits 73279 51439 -21840
+ Misses 26155 20993 -5162
+ Partials 5127 3683 -1444
Flags with carried forward coverage won't be shown. Click here to find out more.
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
} | ||
|
||
if (Objects.isNull(dataSource.getDictionary())) { | ||
indexStatus.put(DICTIONARY, INDEX_NOT_AVAILABLE); | ||
} else { | ||
indexStatus.put(DICTIONARY, INDEX_AVAILABLE); | ||
Long indexSize = columnMetadata.getIndexSizeMap().get(ColumnIndexType.DICTIONARY); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indexSize == null ? INDEX_AVAILABLE
Not sure I follow. Why put AVAILABLE when size is null ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AVAILABLE as it is configured but not yet generated (or the stats are not reflecting the current size).
which is the current behavior of the REST API
maybe a better string would be AVAILABLE, SIZE UNKNOWN
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have to follow up with this in a separate PR. as this API is also used by the UI and it is currently not rendering the numbers correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm this seems confusing. I am ok with following up separately. Didn't get a chance to review the previous PR so not sure of the full context.
If the API is called after the segments have been loaded / mapped on the server, then all indexes should exist right.
So may be the response could be something like
- INDEX available in all segments with avg / aggregated size
- INDEX available on N segments with avg / aggregated size
- INDEX not available / size unknown on M segments
I am guessing size not available will happen when server has not downloaded + loaded all segments and / or there is some bug ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah I see the confusion. so together with #9712 .
we have 3 APIs touched
/tables/{tableName}/segments/{segmentName}/metadata
(modified in add index key and size map to segment metadata #9712)/segments/{tableName}/metadata
is also affected by the same code change
/tables/{tableName}/metadata
(current PR modified this)
The changes I removed is for:
/tables/{tableName}/segments/{segmentName}/metadata
to also include the index info in not just the column map, but in the index map.
This is because i found out that it is also used by the UI and it breaks the UI, so we will not make this change until the UI issue is figured out.
to answer your question:
- the response will not have any avg stats, b/c the problem occurs on a segment level not the table level.
- this is an additional feature, so wont affect any current behavior, thus we can also decided to not follow up
- the data is already returned, the only question is whether we want to organized them by
- column-->index-->size (done in add index key and size map to segment metadata #9712), or
- index-->column-->size (not done)
974e5cd
to
be08be0
Compare
be08be0
to
07abbaa
Compare
32e8357
to
a7889bc
Compare
|
||
@JsonCreator | ||
public TableMetadataInfo(@JsonProperty("tableName") String tableName, | ||
@JsonProperty("diskSizeInBytes") long sizeInBytes, @JsonProperty("numSegments") long numSegments, | ||
@JsonProperty("numRows") long numRows, @JsonProperty("columnLengthMap") Map<String, Double> columnLengthMap, | ||
@JsonProperty("columnCardinalityMap") Map<String, Double> columnCardinalityMap, | ||
@JsonProperty("maxNumMultiValuesMap") Map<String, Double> maxNumMultiValuesMap) { | ||
@JsonProperty("maxNumMultiValuesMap") Map<String, Double> maxNumMultiValuesMap, | ||
@JsonProperty("columnIndexSizeMap") Map<String, Map<String, Double>> columnIndexSizeMap) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make this Nullable property?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
columnIndexSize map will not be null. it will be empty if there's no index scheme. but unless a pinot table contains all non-dictionary columns, at least it will show the forward index size and dictionary size for some columns
Map<String, Double> columnIndexSizes = columnIndexSizesMap.getOrDefault(column, new HashMap<>()); | ||
Double indexSize = columnIndexSizes.getOrDefault(entry.getKey().getIndexName(), 0d); | ||
indexSize += entry.getValue(); | ||
columnIndexSizes.put(entry.getKey().getIndexName(), indexSize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we can extract the index name once
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@Test | ||
public void getTableMetadata() | ||
throws Exception { | ||
String tableMetadataPath = "/tables/" + TableNameBuilder.REALTIME.tableNameWithType(TABLE_NAME) + "/metadata"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a test for this API for both REALTIME and OFFLINE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wilco
5f1f9f8
to
3e66408
Compare
follow up on #9712 .
This PR show aggregated column index size (avg) on table metadata API Path:
data:image/s3,"s3://crabby-images/e2ef7/e2ef75f139e6a4dc93ff6573d7f56257e45cb3ec" alt="TableLevel"
/tables/{tableName}/metadata