Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fixes #71: Feature: Health check and metrics #464

Closed
wants to merge 1 commit into from

Conversation

astubbs
Copy link
Contributor

@astubbs astubbs commented Nov 1, 2022

fixes:

related:

Checklist

  • Documentation (if applicable)
  • Changelog

blocked by:

@astubbs astubbs changed the title Feature/health metrics fixes #71: Feature: Health check and metrics Nov 1, 2022
@astubbs astubbs mentioned this pull request Nov 1, 2022
@astubbs astubbs force-pushed the feature/health-metrics branch from aec7f8a to fa41172 Compare November 2, 2022 13:36
@astubbs astubbs force-pushed the feature/health-metrics branch from fa41172 to 38ed9ad Compare November 2, 2022 13:57
@JorgenRingen
Copy link
Contributor

Nice 👍 Metrics will definitively be useful. Didn't quite get how a health-check should be implemented? parallelConsumer.calculateMetrics().getPollerMetrics().getState()?

Or would it be possible to expose io.confluent.parallelconsumer.internal.AbstractParallelEoSStreamProcessor#state public?

Just a question regarding the class-hierarchy:
The way we've implemented it now is to instantiate through ParallelStreamProcessor.createEosStreamProcessor which returns ParallelStreamProcessor. This requires a somewhat unpleasant cast to be able to retrieve the metrics, failure-cause and perhaps state:

ParallelStreamProcessor<String, String> parallelStreamProcessor = ParallelStreamProcessor.createEosStreamProcessor(null);

// needs to cast to get metrics and failure-cause
ParallelEoSStreamProcessor<String, String> parallelEoSStreamProcessor = (ParallelEoSStreamProcessor<String, String>) parallelStreamProcessor;
PCMetrics metrics = parallelEoSStreamProcessor.calculateMetrics();
Exception failureCause = parallelEoSStreamProcessor.getFailureCause();

// whould be nice to retrieve state directly
// State state = parallelEoSStreamProcessor.getState()

Perhaps this data should be moved up in the hierarchy? If it's under ParallelConsumer interface for example we can use the same api regardless of implementation.

Also, current "requirements" for health-check is pretty simple and used for rolling updates and error-handling:

  • know when message-processing starts so that existing instances can be removed (we check the !isClosedOrFailed now, but could perhaps be checking (running || paused))
  • know when a failure happens so that instance can be terminated (and potentially restarted) (we check failureCause)

@astubbs
Copy link
Contributor Author

astubbs commented Nov 4, 2022

Are you familiar with micrometer? We're probably just going to use that.

@JorgenRingen
Copy link
Contributor

JorgenRingen commented Nov 4, 2022

Yes, we use micrometer for both streams and consumer/producer:

io.micrometer.core.instrument.binder.kafka.KafkaStreamsMetrics
io.micrometer.core.instrument.binder.kafka.KafkaClientMetrics

@astubbs
Copy link
Contributor Author

astubbs commented Dec 2, 2022

Superseded by:

@@ -629,5 +631,30 @@ boolean checkIfWorkIsStale(final WorkContainer<?, ?> workContainer) {
return false;
}

}
public PCMetrics.PCPartitionMetrics getMetrics() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we invert this? Move the construction into the metrics class, instead of the state class? (pass the metrics class the state)

@eddyv
Copy link
Member

eddyv commented Jun 15, 2023

Closing - Stale / superseded by #485

@eddyv eddyv closed this Jun 15, 2023
@rkolesnev rkolesnev mentioned this pull request Jun 28, 2023
3 tasks
@rkolesnev rkolesnev mentioned this pull request Jul 31, 2023
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants