metrics: cache metrics should be state table id + actor id #9020

st1page · 2023-04-06T06:07:51Z

Currently, our cache metrics use actor_id as the key. But one fragment can have multiple executors of the same type, and those executors all report under the same actor_id, so their metrics cannot be told apart. We need to use the executor id here instead.
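Below is a minimal sketch of the collision. It assumes a simplified in-memory model of a labeled gauge. The `LabeledGauge` class, the table ids, and the row counts are hypothetical, and this is not RisingWave's Rust metrics code. When the only labels are actor_id and side, two executors of the same type in one actor write to the same series. Adding a table id label, as the updated issue title proposes, keeps them apart.

```python
class LabeledGauge:
    """Hypothetical stand-in for a Prometheus gauge vector: one value per label tuple."""

    def __init__(self, label_names):
        self.label_names = tuple(label_names)
        self.series = {}

    def set(self, labels, value):
        # The series key uses only the declared label names, so two executors
        # that share every one of those label values land on the same series.
        key = tuple(labels[name] for name in self.label_names)
        self.series[key] = value  # a gauge overwrites: last writer wins


# Two hypothetical join executors running in the same actor (actor 1),
# each backed by its own state table (hypothetical ids 10 and 20).
executors = [
    {"actor_id": 1, "table_id": 10, "side": "left", "cached_rows": 100},
    {"actor_id": 1, "table_id": 20, "side": "left", "cached_rows": 7},
]

by_actor = LabeledGauge(["actor_id", "side"])
by_table_and_actor = LabeledGauge(["table_id", "actor_id", "side"])

for executor in executors:
    by_actor.set(executor, executor["cached_rows"])
    by_table_and_actor.set(executor, executor["cached_rows"])

print(by_actor.series)            # {(1, 'left'): 7}  (table 10's value is lost)
print(by_table_and_actor.series)  # {(10, 1, 'left'): 100, (20, 1, 'left'): 7}
```

With a real Prometheus gauge, the later `set` likewise overwrites the earlier one. With a counter, the two executors' increments would instead be summed into a single series, which is just as misleading.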
https://github.com/risingwavelabs/risingwave/blob/main/grafana/risingwave-dev-dashboard.dashboard.py

```python
# Excerpt: cache panels whose legends identify series by actor_id only.
panels.timeseries_count(
    "Join Cached Rows",
    "Multiple rows with distinct primary keys may have the same join key. This metric counts the "
    "number of rows in the executor cache.",
    [
        panels.target(f"{metric('stream_join_cached_rows')}",
                      "{{actor_id}} {{side}}"),
    ],
),
panels.timeseries_bytes(
    "Join Cached Estimated Size",
    "Multiple rows with distinct primary keys may have the same join key. This metric counts the "
    "size of rows in the executor cache.",
    [
        panels.target(f"{metric('stream_join_cached_estimated_size')}",
                      "{{actor_id}} {{side}}"),
    ],
),
panels.timeseries_actor_ops(
    "Aggregation Executor Cache Statistics For Each Key/State",
    # Trailing spaces keep the concatenated sentences from running together.
    "Lookup miss count counts the number of aggregation key cache misses per second. "
    "Lookup total count counts the number of rows processed per second. "
    "By dividing these two metrics, one can derive the cache miss rate.",
    [
        panels.target(
            f"rate({metric('stream_agg_lookup_miss_count')}[$__rate_interval])",
            "cache miss {{actor_id}}",
        ),
        panels.target(
            f"rate({metric('stream_agg_lookup_total_count')}[$__rate_interval])",
            "total lookups {{actor_id}}",
        ),
    ],
),
panels.timeseries_actor_ops(
    "Aggregation Executor Cache Statistics For Each StreamChunk",
    "",
    [
        panels.target(
            f"rate({metric('stream_agg_chunk_lookup_miss_count')}[$__rate_interval])",
            "chunk-level cache miss {{actor_id}}",
        ),
        panels.target(
            f"rate({metric('stream_agg_chunk_lookup_total_count')}[$__rate_interval])",
            "chunk-level total lookups {{actor_id}}",
        ),
    ],
),
panels.timeseries_count(
    "Aggregation Cached Keys",
    "The number of keys cached in each hash aggregation executor's executor cache.",
    [
        panels.target(f"{metric('stream_agg_cached_keys')}",
                      "{{actor_id}}"),
    ],
),
```
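The aggregation panel description derives the cache miss rate by dividing the miss counter by the total counter. Once both counters carry a table id label, that division can be done per (table_id, actor_id) pair. The sketch below builds such a PromQL expression and a matching Grafana legend. Several details are assumptions: the label name `table_id`, the helper names `cache_miss_rate_query` and `legend`, and the `sum ... by` grouping. Raw metric names stand in for the dashboard's `metric()` helper, which is not reproduced here.

```python
def cache_miss_rate_query(miss_metric: str, total_metric: str,
                          labels: tuple[str, ...] = ("table_id", "actor_id")) -> str:
    """Build a PromQL expression for the cache miss rate per label combination.

    Both counters are turned into per-second rates and summed by the same labels,
    so the division pairs each miss series with its matching total series.
    """
    by = ", ".join(labels)
    miss = f"sum(rate({miss_metric}[$__rate_interval])) by ({by})"
    total = f"sum(rate({total_metric}[$__rate_interval])) by ({by})"
    return f"{miss} / {total}"


def legend(labels: tuple[str, ...] = ("table_id", "actor_id")) -> str:
    """Grafana legend template, e.g. "{{table_id}} {{actor_id}}"."""
    # Plain string concatenation (not an f-string) keeps the double braces literal.
    return " ".join("{{" + name + "}}" for name in labels)


print(cache_miss_rate_query("stream_agg_lookup_miss_count",
                            "stream_agg_lookup_total_count"))
print(legend())
```

Grouping both sides by the same label set makes the binary `/` match series one-to-one even if the raw series carry extra labels (such as the scrape instance). Series with zero lookups in the window divide by zero and show up as NaN rather than 0.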


github-actions bot added this to the release-0.19 milestone Apr 6, 2023

wcy-fdu self-assigned this Apr 6, 2023

wcy-fdu changed the title from "metrics: cache metrics should be indexed by executor id instead of actor id" to "metrics: cache metrics should be state table id + actor id" Apr 6, 2023

This was referenced Apr 6, 2023

feat(streaming): monitor materialize cache miss rate when handling pk conflict #8946 (Merged)

refactor(metrics): fix cache metrics to be table id + actor id #9039 (Merged)

wcy-fdu closed this as completed in #9039 Apr 7, 2023
