Implementing metrics #125

Closed
10 of 12 tasks
Tracked by #126
devbugging opened this issue Mar 4, 2024 · 5 comments

@devbugging
Contributor

devbugging commented Mar 4, 2024

We need comprehensive metrics to measure the performance and resource usage of our APIs. This will help us understand the performance of different API methods and track various states and errors.

Performance

Performance should be measured with tracing, so that sub-calls are measured as well. Ideally, every network call and every API call appears as a sub-trace. Tracing should be enabled with a flag and off by default.
Each API should also submit a simple metric recording how long the request took to process.
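
A minimal sketch of the two ideas above: a per-request duration that is always recorded, and sub-call spans that are recorded only when a tracing flag is set. This is a language-neutral illustration in plain Python, not the project's implementation. `RequestTimer`, `tracing_enabled`, and the method name `eth_getBalance` are assumptions made for the example.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class RequestTimer:
    """Records per-method request durations; spans only when tracing is on."""

    def __init__(self, tracing_enabled=False, clock=time.perf_counter):
        # Hypothetical flag: tracing stays off unless explicitly enabled.
        self.tracing_enabled = tracing_enabled
        self._clock = clock
        self.durations = defaultdict(list)  # method name -> durations in seconds
        self.spans = []  # (parent, name, seconds), filled only when tracing

    @contextmanager
    def request(self, method):
        start = self._clock()
        try:
            yield self
        finally:
            # Recorded even if the handler raises, so failed requests are timed too.
            self.durations[method].append(self._clock() - start)

    @contextmanager
    def span(self, parent, name):
        if not self.tracing_enabled:
            # Tracing disabled: run the body without touching the clock.
            yield
            return
        start = self._clock()
        try:
            yield
        finally:
            self.spans.append((parent, name, self._clock() - start))


if __name__ == "__main__":
    timer = RequestTimer(tracing_enabled=True)
    with timer.request("eth_getBalance"):
        with timer.span("eth_getBalance", "network_call"):
            time.sleep(0.01)  # stand-in for a real network call
    print(dict(timer.durations), timer.spans)
```

In a real deployment the durations would presumably feed a histogram rather than an in-memory list; the list is used here only to keep the sketch self-contained.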

Be careful to also include WebSocket request/response metrics.

State

Ingestion

We should use Prometheus and OpenTelemetry to collect the traces and metrics.

@m-Peter
Collaborator

m-Peter commented Mar 14, 2024

JSON-RPC endpoints served over WebSocket, such as subscriptions and entity filtering, should also get dedicated metrics, e.g. the number of active connections.
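
One way to picture an active-connections metric is a gauge that goes up when a WebSocket connection opens and down when it closes, even if the handler fails. The sketch below is plain Python with hypothetical names (`ConnectionGauge`, `track`). It is not tied to any particular WebSocket or metrics library.

```python
import threading
from contextlib import contextmanager


class ConnectionGauge:
    """Thread-safe count of currently open and total accepted connections."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = 0
        self._total = 0

    @contextmanager
    def track(self):
        # Increment on connect.
        with self._lock:
            self._active += 1
            self._total += 1
        try:
            yield
        finally:
            # Decrement on disconnect, including abnormal termination.
            with self._lock:
                self._active -= 1

    @property
    def active(self):
        with self._lock:
            return self._active

    @property
    def total(self):
        with self._lock:
            return self._total


if __name__ == "__main__":
    gauge = ConnectionGauge()
    with gauge.track():
        print("active while connected:", gauge.active)
    print("active after disconnect:", gauge.active, "total:", gauge.total)
```

Wrapping the whole connection handler in `track()` keeps the decrement in a `finally` block, so dropped connections do not leave the gauge stuck above zero.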

@franklywatson
Contributor

@m-Peter also suggested tracking DB size over time, as well as DB query time.
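
As a rough illustration of both measurements, the sketch below assumes the database lives in a directory on disk and that a query is an ordinary callable. The thread confirms neither assumption. `directory_size_bytes` and `timed_query` are hypothetical helper names.

```python
import os
import time


def directory_size_bytes(path):
    """Sum the sizes of regular files under path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.islink(file_path):
                continue
            try:
                total += os.path.getsize(file_path)
            except OSError:
                # The file may have been removed (e.g. by compaction) mid-walk.
                continue
    return total


def timed_query(query, *args, clock=time.perf_counter):
    """Run query(*args) and return (result, elapsed_seconds)."""
    start = clock()
    result = query(*args)
    return result, clock() - start


if __name__ == "__main__":
    print("size of current directory:", directory_size_bytes("."), "bytes")
    result, elapsed = timed_query(sorted, [3, 1, 2])
    print(result, f"{elapsed:.6f}s")
```

Sampling the size periodically and exporting it as a gauge would give the "over time" view; the elapsed time would more naturally go into a histogram.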

@devbugging
Contributor Author

Add metrics for index health. Trace index health depends on trace downloads succeeding: if one fails, the index becomes unhealthy. Transaction index health depends on how far the latest ingested event lags behind the latest height on the network: if it is too far behind, the index is unhealthy.
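
The two health rules above can be sketched as a small pure function. The threshold `max_blocks_behind` is an assumption, since the comment does not say how far behind counts as "too far". All names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexHealth:
    trace_index_healthy: bool
    transaction_index_healthy: bool
    blocks_behind: int


def evaluate_index_health(
    failed_trace_downloads,
    latest_indexed_height,
    latest_network_height,
    max_blocks_behind,
):
    """Apply the two rules: any failed trace download, or too much lag, is unhealthy."""
    # Clamp at zero in case the local view is briefly ahead of the reported height.
    blocks_behind = max(0, latest_network_height - latest_indexed_height)
    return IndexHealth(
        trace_index_healthy=failed_trace_downloads == 0,
        transaction_index_healthy=blocks_behind <= max_blocks_behind,  # assumed threshold
        blocks_behind=blocks_behind,
    )


if __name__ == "__main__":
    print(evaluate_index_health(0, 100, 105, max_blocks_behind=10))
    print(evaluate_index_health(1, 100, 150, max_blocks_behind=10))
```

Exposing `blocks_behind` alongside the two booleans lets a dashboard show the lag itself, not only whether it crossed the threshold.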

@onflow onflow deleted a comment from m-Peter Jun 14, 2024
@illia-malachyn illia-malachyn moved this to 🧊 Backlog in 🌊 Flow 4D Jul 17, 2024
@illia-malachyn illia-malachyn moved this from 🧊 Backlog to 🔖 Ready for Pickup in 🌊 Flow 4D Jul 17, 2024
@devbugging
Contributor Author

Another high-priority metric: #384

@j1010001
Member

The first set of metrics is implemented and a Grafana dashboard has been created: https://flowfoundation.grafana.net/d/PkvVJj4Mz/mainnet-general?from=now-24h&to=now&timezone=America%2FVancouver

@github-project-automation github-project-automation bot moved this from 🔖 Ready for Pickup to ✅ Done in 🌊 Flow 4D Aug 29, 2024
4 participants