Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

show table metadata info in aggregate index size form #9733

Merged
merged 4 commits into from
Nov 10, 2022

Conversation

walterddr
Copy link
Contributor

@walterddr walterddr commented Nov 4, 2022

follow up on #9712 .

This PR show aggregated column index size (avg) on table metadata API Path: /tables/{tableName}/metadata
TableLevel

@codecov-commenter
Copy link

codecov-commenter commented Nov 4, 2022

Codecov Report

Merging #9733 (d4b8b52) into master (2f640ff) will decrease coverage by 2.50%.
The diff coverage is 100.00%.

@@             Coverage Diff              @@
##             master    #9733      +/-   ##
============================================
- Coverage     70.08%   67.58%   -2.51%     
- Complexity     4980     5179     +199     
============================================
  Files          1951     1450     -501     
  Lines        104561    76115   -28446     
  Branches      15836    12115    -3721     
============================================
- Hits          73279    51439   -21840     
+ Misses        26155    20993    -5162     
+ Partials       5127     3683    -1444     
Flag Coverage Δ
integration1 ?
integration2 ?
unittests1 67.58% <100.00%> (+0.02%) ⬆️
unittests2 ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...ot/common/restlet/resources/TableMetadataInfo.java 100.00% <100.00%> (ø)
...va/org/apache/pinot/core/routing/RoutingTable.java 0.00% <0.00%> (-100.00%) ⬇️
...va/org/apache/pinot/common/config/NettyConfig.java 0.00% <0.00%> (-100.00%) ⬇️
...a/org/apache/pinot/common/metrics/MinionMeter.java 0.00% <0.00%> (-100.00%) ⬇️
...g/apache/pinot/common/metrics/ControllerMeter.java 0.00% <0.00%> (-100.00%) ⬇️
.../apache/pinot/common/metrics/BrokerQueryPhase.java 0.00% <0.00%> (-100.00%) ⬇️
.../apache/pinot/common/metrics/MinionQueryPhase.java 0.00% <0.00%> (-100.00%) ⬇️
...ache/pinot/server/access/AccessControlFactory.java 0.00% <0.00%> (-100.00%) ⬇️
...he/pinot/common/messages/SegmentReloadMessage.java 0.00% <0.00%> (-100.00%) ⬇️
...he/pinot/common/messages/TableDeletionMessage.java 0.00% <0.00%> (-100.00%) ⬇️
... and 766 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@walterddr walterddr marked this pull request as ready for review November 5, 2022 00:02
}

if (Objects.isNull(dataSource.getDictionary())) {
indexStatus.put(DICTIONARY, INDEX_NOT_AVAILABLE);
} else {
indexStatus.put(DICTIONARY, INDEX_AVAILABLE);
Long indexSize = columnMetadata.getIndexSizeMap().get(ColumnIndexType.DICTIONARY);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indexSize == null ? INDEX_AVAILABLE

Not sure I follow. Why put AVAILABLE when size is null ?

Copy link
Contributor Author

@walterddr walterddr Nov 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AVAILABLE as it is configured but not yet generated (or the stats are not reflecting the current size).
which is the current behavior of the REST API

maybe a better string would be AVAILABLE, SIZE UNKNOWN

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have to follow up with this in a separate PR. as this API is also used by the UI and it is currently not rendering the numbers correctly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm this seems confusing. I am ok with following up separately. Didn't get a chance to review the previous PR so not sure of the full context.

If the API is called after the segments have been loaded / mapped on the server, then all indexes should exist right.

So may be the response could be something like

  • INDEX available in all segments with avg / aggregated size
  • INDEX available on N segments with avg / aggregated size
  • INDEX not available / size unknown on M segments

I am guessing size not available will happen when server has not downloaded + loaded all segments and / or there is some bug ?

Copy link
Contributor Author

@walterddr walterddr Nov 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah I see the confusion. so together with #9712 .
we have 3 APIs touched

The changes I removed is for:

  • /tables/{tableName}/segments/{segmentName}/metadata to also include the index info in not just the column map, but in the index map.
    This is because i found out that it is also used by the UI and it breaks the UI, so we will not make this change until the UI issue is figured out.

to answer your question:

  • the response will not have any avg stats, b/c the problem occurs on a segment level not the table level.
  • this is an additional feature, so wont affect any current behavior, thus we can also decided to not follow up
  • the data is already returned, the only question is whether we want to organized them by


@JsonCreator
public TableMetadataInfo(@JsonProperty("tableName") String tableName,
@JsonProperty("diskSizeInBytes") long sizeInBytes, @JsonProperty("numSegments") long numSegments,
@JsonProperty("numRows") long numRows, @JsonProperty("columnLengthMap") Map<String, Double> columnLengthMap,
@JsonProperty("columnCardinalityMap") Map<String, Double> columnCardinalityMap,
@JsonProperty("maxNumMultiValuesMap") Map<String, Double> maxNumMultiValuesMap) {
@JsonProperty("maxNumMultiValuesMap") Map<String, Double> maxNumMultiValuesMap,
@JsonProperty("columnIndexSizeMap") Map<String, Map<String, Double>> columnIndexSizeMap) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this Nullable property?

Copy link
Contributor Author

@walterddr walterddr Nov 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

columnIndexSize map will not be null. it will be empty if there's no index scheme. but unless a pinot table contains all non-dictionary columns, at least it will show the forward index size and dictionary size for some columns

Map<String, Double> columnIndexSizes = columnIndexSizesMap.getOrDefault(column, new HashMap<>());
Double indexSize = columnIndexSizes.getOrDefault(entry.getKey().getIndexName(), 0d);
indexSize += entry.getValue();
columnIndexSizes.put(entry.getKey().getIndexName(), indexSize);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we can extract the index name once

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@Test
public void getTableMetadata()
throws Exception {
String tableMetadataPath = "/tables/" + TableNameBuilder.REALTIME.tableNameWithType(TABLE_NAME) + "/metadata";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test for this API for both REALTIME and OFFLINE

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wilco

@walterddr walterddr merged commit 6ade81f into apache:master Nov 10, 2022
@walterddr walterddr deleted the add_index_agg branch December 6, 2023 16:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants