metrics: cache metrics should be state table id + actor id #9020

Closed
st1page opened this issue Apr 6, 2023 · 0 comments · Fixed by #9039

st1page commented Apr 6, 2023

Currently, our cache metrics use the actor_id as the key. But one fragment can have multiple executors of the same type, so their metrics end up under the same key. So we need to use the executor id here.
https://github.com/risingwavelabs/risingwave/blob/main/grafana/risingwave-dev-dashboard.dashboard.py

                panels.timeseries_count(
                    "Join Cached Rows",
                    "Multiple rows with distinct primary keys may have the same join key. This metric counts the "
                    "number of rows in the executor cache.",
                    [
                        panels.target(f"{metric('stream_join_cached_rows')}",
                                      "{{actor_id}} {{side}}"),
                    ],
                ),
                panels.timeseries_bytes(
                    "Join Cached Estimated Size",
                    "Multiple rows with distinct primary keys may have the same join key. This metric counts the "
                    "size of rows in the executor cache.",
                    [
                        panels.target(f"{metric('stream_join_cached_estimated_size')}",
                                      "{{actor_id}} {{side}}"),
                    ],
                ),
                panels.timeseries_actor_ops(
                    "Aggregation Executor Cache Statistics For Each Key/State",
                    "Lookup miss count counts the number of aggregation key cache misses per second. "
                    "Lookup total count counts the number of rows processed per second. "
                    "By dividing these two metrics, one can derive the cache miss rate per second.",
                    [
                        panels.target(
                            f"rate({metric('stream_agg_lookup_miss_count')}[$__rate_interval])",
                            "cache miss {{actor_id}}",
                        ),
                        panels.target(
                            f"rate({metric('stream_agg_lookup_total_count')}[$__rate_interval])",
                            "total lookups {{actor_id}}",
                        ),
                    ],
                ),
                panels.timeseries_actor_ops(
                    "Aggregation Executor Cache Statistics For Each StreamChunk",
                    "",
                    [
                        panels.target(
                            f"rate({metric('stream_agg_chunk_lookup_miss_count')}[$__rate_interval])",
                            "chunk-level cache miss {{actor_id}}",
                        ),
                        panels.target(
                            f"rate({metric('stream_agg_chunk_lookup_total_count')}[$__rate_interval])",
                            "chunk-level total lookups {{actor_id}}",
                        ),
                    ],
                ),
                panels.timeseries_count(
                    "Aggregation Cached Keys",
                    "The number of keys cached in each hash aggregation executor's executor cache.",
                    [
                        panels.target(f"{metric('stream_agg_cached_keys')}",
                                      "{{actor_id}}"),
                    ],
                ),
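
To illustrate the collision, here is a minimal, self-contained sketch. The ids and values are made up, `group_series` is a hypothetical helper rather than part of the dashboard code, and `table_id` is only an assumed label name. It groups samples into series the way a label set would identify them: keyed by actor id alone, two executors in the same actor fall into one series, while keying by state table id plus actor id keeps them apart.

```python
from collections import defaultdict

# Hypothetical samples: (actor_id, table_id, cached_keys).
# Actor 7 runs two executors of the same type, each with its own state table.
samples = [
    (7, 1001, 120),
    (7, 1002, 45),
    (8, 1001, 130),
]


def group_series(samples, label_names):
    """Group sample values by the chosen labels (illustrative helper)."""
    series = defaultdict(list)
    for actor_id, table_id, value in samples:
        labels = {"actor_id": actor_id, "table_id": table_id}
        key = tuple(labels[name] for name in label_names)
        series[key].append(value)
    return dict(series)


# Keyed by actor id only: both executors of actor 7 share one series.
by_actor = group_series(samples, ["actor_id"])

# Keyed by state table id + actor id: every executor gets its own series.
by_table_and_actor = group_series(samples, ["table_id", "actor_id"])

print(by_actor)
print(by_table_and_actor)
```

In the dashboard this would roughly correspond to adding the extra label to the legend format next to `{{actor_id}}`, assuming the metric itself exports that label.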
@github-actions github-actions bot added this to the release-0.19 milestone Apr 6, 2023
@wcy-fdu wcy-fdu self-assigned this Apr 6, 2023
@wcy-fdu wcy-fdu changed the title metrics: cache metrics should be indexed by executor id instead of actor id metrics: cache metrics should be state table id + actor id Apr 6, 2023