Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HUDI-9249]Support displaying InsertIntoHoodieTableCommand metrics in Spark Web UI #13068

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

wangyinsheng
Copy link
Contributor

Change Logs

Support displaying InsertIntoHoodieTableCommand metrics in Spark Web UI

implement steps

  • Change the return type of BaseHoodieWriteClient.commit from boolean to CommitStatus(contains a boolean status success and a HoodieCommitMetadata)
  • update metrics of InsertIntoHoodieTableCommand from HoodieCommitMetadata

Before
image

After
image

Impact

Change the return type of BaseHoodieWriteClient.commit from boolean to CommitStatus(contains a boolean status success and a HoodieCommitMetadata)

Risk level (write none, low medium or high below)

low

Documentation Update

None

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@github-actions github-actions bot added the size:L PR with lines of changes in (300, 1000] label Apr 1, 2025
@wangyinsheng wangyinsheng changed the title [HUDI-9249]Support displaying InsertIntoHoodieTableCommand metrics in Spark Web UI WIP[HUDI-9249]Support displaying InsertIntoHoodieTableCommand metrics in Spark Web UI Apr 1, 2025
@wangyinsheng
Copy link
Contributor Author

@hudi-bot run azure

@wangyinsheng wangyinsheng changed the title WIP[HUDI-9249]Support displaying InsertIntoHoodieTableCommand metrics in Spark Web UI [HUDI-9249]Support displaying InsertIntoHoodieTableCommand metrics in Spark Web UI Apr 2, 2025
@wangyinsheng
Copy link
Contributor Author

@hudi-bot run azure

@wangyinsheng wangyinsheng force-pushed the add_mertrics_for_insert branch from 1ac0bae to f42ddb2 Compare April 2, 2025 06:14
@hudi-bot
Copy link

hudi-bot commented Apr 2, 2025

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

}

public abstract boolean commit(String instantTime, O writeStatuses, Option<Map<String, String>> extraMetadata,
public abstract CommitStatus commit(String instantTime, O writeStatuses, Option<Map<String, String>> extraMetadata,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CommitStatus can be eliminated if we just return an option of HoodieCommitMetadata ?, empty means a failure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case of 'EmptyCommit', there is no HoodieCommitMetadata produced, but need return true

public boolean commitStats(String instantTime, List<HoodieWriteStat> stats,
Option<Map<String, String>> extraMetadata,
String commitActionType, Map<String, List<String>> partitionToReplaceFileIds,
Option<BiConsumer<HoodieTableMetaClient, HoodieCommitMetadata>> extraPreCommitFunc) {
// Skip the empty commit if not allowed
if (!config.allowEmptyCommit() && stats.isEmpty()) {
return true;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this case, we can return null?

Copy link
Contributor Author

@wangyinsheng wangyinsheng Apr 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this case, we can return null?

Many callers need to verify success or failure, return null for EmptyCommit case and return option for other cases will change simple boolean check to commitMetadataOpt == null || commitMetadataOpt.ifPresent, This is a bit weird,I Idon't think it is a good way。

boolean success = writeClient.commit(instant, writeResults, Option.of(checkpointCommitMetadata),
tableState.commitAction, partitionToReplacedFileIds);
if (success) {
reset();
this.ckpMetadata.commitInstant(instant);
LOG.info("Commit instant [{}] success!", instant);
} else {

val commitSuccess =
client.commit(tableInstantInfo.instantTime, writeResult.getWriteStatuses,
common.util.Option.of(new java.util.HashMap[String, String](metaMap.asJava)),
tableInstantInfo.commitActionType,
writeResult.getPartitionToReplaceFileIds,
common.util.Option.ofNullable(extraPreCommitFn.orNull))
if (commitSuccess) {
log.info("Commit " + tableInstantInfo.instantTime + " successful!")
}

@wangyinsheng
Copy link
Contributor Author

@danny0405 PTAL

@wangyinsheng
Copy link
Contributor Author

@danny0405 PTAL

@wangyinsheng wangyinsheng requested a review from danny0405 April 11, 2025 07:23
@danny0405
Copy link
Contributor

danny0405 commented Apr 11, 2025

not sure how the metrics work here because the commit has been finished, does it still make sense to show them on the UI ? can we just read the commit metadata from timeline for these metircs instead of touching the core API now ?

@wangyinsheng
Copy link
Contributor Author

now sure how the metrics work here because the commit has been finished, does it still make sense to show them on the UI ? can we just read the commit metadata from timeline for these metircs instead of touching the core API now ?

Reading commit metadata through the call command can indeed retrieve these metrics, However, displaying the corresponding commit metrics directly on the Spark UI can simplify the troubleshooting process to some extent, eliminating the need for additional query actions. Additionally, the data displayed in the Spark UI comes from the spark event log, and the Spark UI of long-past run instances can be restored through the event log, with these commit metrics being displayed accordingly. In contrast, querying commit matedata from a long time ago via the API can be relatively cumbersome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size:L PR with lines of changes in (300, 1000]
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants