
revert #7957 to validate the performance regression of q7 and try to improve if true #8510

Open · Tracked by #7289
lmatz opened this issue Mar 13, 2023 · 9 comments

lmatz (Contributor) commented Mar 13, 2023

#7957

[screenshot: SCR-20230313-paf]

Meanwhile, q7-rewrite, which uses group top-n, does not change much in performance. I didn't see other queries affected.

@github-actions github-actions bot added this to the release-0.1.18 milestone Mar 13, 2023
lmatz (Contributor, Author) commented Mar 13, 2023


Throughput doubled after reverting the PR:

https://github.com/risingwavelabs/risingwave/tree/lz/revert_7957

lmatz (Contributor, Author) commented Mar 14, 2023

It probably still depends on the real workload whether this improves or worsens performance.
Then how about making it a per-query option?
Query hints, or session variables (the latter are hard to control if there are multiple joins in a single query).

Or is may_exist just not good enough?
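
For what it's worth, here is a rough sketch of what such a per-query knob could look like: a session-level default plus an optional per-join hint override. None of these names (`CacheRefillPolicy`, `streaming_join_cache_refill`, `JoinHint`) exist in RisingWave; this only illustrates the resolution order, with a hint scoping better than a session variable when a single query has several joins.

```rust
// Hypothetical sketch only: none of these names are real RisingWave identifiers.
// It shows a session-level default plus a per-join hint override for whether
// the join cache is refilled on write.

#[derive(Clone, Copy, Debug, PartialEq)]
enum CacheRefillPolicy {
    /// Check storage / refill the join cache when a side is written to.
    RefillOnWrite,
    /// Only populate the cache on probe (read) misses.
    RefillOnReadOnly,
}

/// Session-level default, imagined as something like
/// `SET streaming_join_cache_refill = ...` (hypothetical variable).
struct SessionConfig {
    join_cache_refill: CacheRefillPolicy,
}

/// Optional per-join override, imagined as a query hint attached to one
/// specific join.
struct JoinHint {
    cache_refill: Option<CacheRefillPolicy>,
}

fn effective_policy(session: &SessionConfig, hint: &JoinHint) -> CacheRefillPolicy {
    // Hint wins over the session default; fall back otherwise.
    hint.cache_refill.unwrap_or(session.join_cache_refill)
}

fn main() {
    let session = SessionConfig {
        join_cache_refill: CacheRefillPolicy::RefillOnWrite,
    };
    let q7_join = JoinHint {
        cache_refill: Some(CacheRefillPolicy::RefillOnReadOnly),
    };
    let other_join = JoinHint { cache_refill: None };

    // The hinted join opts out of refill-on-write; the rest keep the default.
    assert_eq!(
        effective_policy(&session, &q7_join),
        CacheRefillPolicy::RefillOnReadOnly
    );
    assert_eq!(
        effective_policy(&session, &other_join),
        CacheRefillPolicy::RefillOnWrite
    );
    println!("per-join override resolves before the session default");
}
```
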

fuyufjh (Member) commented Mar 15, 2023

Interesting 🤔 cc. @hzxa21 Any ideas about the reason?

hzxa21 (Collaborator) commented Mar 15, 2023

TL;DR: I think that in the perf benchmark the locality between the two streams is worse than expected, so refilling the cache on write in the join executor evicts old entries that are still needed, leading to a higher operator cache miss rate (a toy model of this effect is sketched after the findings below). cc @KeXiangWang, this is different from what we have seen before. Any thoughts?

Findings after looking at grafana (02/23 build: without may_exist and join cache refill on write; 02/24 build: with them):

  1. Aggregation Cached Keys is way smaller in 02/24

  2. Join Cached Entries is also way smaller in 02/24

  3. The join cache miss rate is higher in 02/24: the miss rate when inserting entries into the left table is high, and the missed keys are most likely existing keys (the may_exist true rate is high).

  4. Read durations of the join left and right tables are way higher in 02/24.
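
To make the suspected mechanism concrete, here is a minimal, self-contained toy model (plain Rust with a tiny LRU; stream shapes and cache size are made up, none of this is RisingWave code): when the dense side of the join refills the cache on every write, it keeps evicting the entries the sparse side still needs, so the sparse side's miss rate climbs.

```rust
// Toy model, not RisingWave code: a tiny LRU join cache shared by a dense
// "price" write stream and a sparse "max price" lookup stream.
use std::collections::VecDeque;

/// Minimal LRU cache over u64 keys (front = least recently used).
struct LruCache {
    cap: usize,
    keys: VecDeque<u64>,
}

impl LruCache {
    fn new(cap: usize) -> Self {
        Self { cap, keys: VecDeque::new() }
    }

    /// Returns true on a hit; on a miss, optionally inserts (refills) the key.
    fn access(&mut self, key: u64, refill_on_miss: bool) -> bool {
        if let Some(pos) = self.keys.iter().position(|&k| k == key) {
            let k = self.keys.remove(pos).unwrap();
            self.keys.push_back(k); // move to most-recently-used position
            return true;
        }
        if refill_on_miss {
            if self.keys.len() == self.cap {
                self.keys.pop_front(); // evict the least recently used entry
            }
            self.keys.push_back(key);
        }
        false
    }
}

fn main() {
    for refill_on_write in [true, false] {
        let mut cache = LruCache::new(1_000);
        let (mut lookups, mut misses) = (0u64, 0u64);

        for i in 0..100_000u64 {
            // Dense side: every event is a write. With refill-on-write enabled,
            // the write path also populates the cache with its (mostly fresh) key.
            cache.access(i % 10_000, refill_on_write);

            // Sparse side: one lookup per 100 writes, cycling over 50 hot keys
            // (offset so they never collide with the dense key space).
            if i % 100 == 0 {
                lookups += 1;
                let hot_key = 1_000_000 + (i / 100) % 50;
                if !cache.access(hot_key, true) {
                    misses += 1;
                }
            }
        }

        println!(
            "refill_on_write = {refill_on_write}: sparse-side miss rate {:.1}%",
            100.0 * misses as f64 / lookups as f64
        );
    }
}
```

In this toy setup the sparse side misses almost every lookup with refill-on-write enabled and only the first touch of each hot key without it, which is the direction findings 3 and 4 point in.
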

KeXiangWang (Contributor) commented:

I agree with Patrick. Nexmark Q7 joins the price of bids with the max price of bids within an interval. With refilling, every bid will be cached in the join cache. But as you can imagine, the maxprice stream's events are sparse and only read a few price keys. So I believe the throughput decreases for two reasons:

  1. Unnecessary calls to may_exist when inserting a price event introduce overhead (a toy sketch of this is below).
  2. Price events occupy too much cache, causing cache misses and thus higher latency.

Like @lmatz mentioned, we probably have to make this decision depending on the task and data patterns 😕.
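
To illustrate reason 1, here is a toy model (not the real Hummock API) of why a `may_exist`-style pre-check before every insert is mostly overhead once nearly all inserted keys already exist: the filter almost never answers "no", so each write still pays the point read on top of the extra check.

```rust
// Toy model, not the real Hummock API: a `may_exist`-style pre-check saves a
// storage read only when it answers "no". In a workload where nearly every
// inserted key already exists, it filters out almost nothing.
use std::collections::HashSet;

struct ToyStorage {
    keys: HashSet<u64>,
}

impl ToyStorage {
    /// Cheap, approximate membership check (stand-in for a bloom filter).
    fn may_exist(&self, key: u64) -> bool {
        self.keys.contains(&key) // no false positives in this toy model
    }

    /// "Expensive" point lookup that the join cache refill would issue.
    fn get(&self, key: u64) -> Option<u64> {
        self.keys.get(&key).copied()
    }
}

fn main() {
    // Price-like workload: the inserted keys overwhelmingly exist already.
    let storage = ToyStorage { keys: (0..100_000).collect() };
    let (mut checks, mut reads_saved, mut reads_done) = (0u64, 0u64, 0u64);

    for key in 0..100_000u64 {
        checks += 1;
        if storage.may_exist(key) {
            // Filter said "maybe": the read before refilling still happens.
            reads_done += 1;
            let _ = storage.get(key);
        } else {
            // Only this branch saves a read; it almost never fires here,
            // matching the high may_exist-true rate seen on 02/24.
            reads_saved += 1;
        }
    }

    println!("may_exist checks: {checks}, reads saved: {reads_saved}, reads done: {reads_done}");
}
```
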

@lmatz lmatz modified the milestones: release-0.18, release-0.19 Mar 22, 2023
@lmatz lmatz removed this from the release-0.19 milestone Apr 4, 2023
github-actions bot commented Jun 4, 2023

This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned.

lmatz (Contributor, Author) commented Jun 5, 2023

Later, we found that the 8-kafka-partition nexmark data is generated by 16 threads, which can mess up the locality.
Now I suppose this factor may affect the effectiveness of refilling (a toy illustration is sketched below).

I will try to re-enable it later and make it a configurable choice.
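
A toy illustration of the locality point above (the interleaving scheme and numbers are made up; this is not the nexmark generator): interleaving one ordered event stream across 16 producer threads makes consecutive arrivals almost never fall in the same time window, so entries refilled on write are rarely reused before being evicted.

```rust
// Toy illustration, not the nexmark generator: interleave one ordered event
// stream across 16 "producer threads" and count how often two consecutive
// arrivals fall into different time windows.

/// Number of adjacent arrival pairs whose timestamps land in different windows.
fn window_switches(arrivals: &[u64], window: u64) -> usize {
    arrivals
        .windows(2)
        .filter(|pair| pair[0] / window != pair[1] / window)
        .count()
}

fn main() {
    let events: Vec<u64> = (0..16_000).collect(); // event timestamps, in order
    let threads = 16;
    let chunk = events.len() / threads;

    // Single producer: arrival order equals timestamp order.
    let in_order = events.clone();

    // 16 producers, each owning a contiguous chunk, interleaved round-robin
    // on arrival (a crude stand-in for concurrent writes to the same topic).
    let mut interleaved = Vec::with_capacity(events.len());
    for i in 0..chunk {
        for t in 0..threads {
            interleaved.push(events[t * chunk + i]);
        }
    }

    let w = 100; // window size in timestamp units
    println!("window switches, 1 producer:   {}", window_switches(&in_order, w));
    println!("window switches, 16 producers: {}", window_switches(&interleaved, w));
    // The interleaved order switches windows on almost every step, so a cache
    // entry refilled for one window is rarely touched again before eviction.
}
```
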

@lmatz lmatz self-assigned this Jun 5, 2023
github-actions bot commented Jul 3, 2024

This issue has been open for 60 days with no activity.

If you think it is still relevant today, and needs to be done in the near future, you can comment to update the status, or just manually remove the no-issue-activity label.

You can also confidently close this issue as not planned to keep our backlog clean.
Don't worry if you think the issue is still valuable to continue in the future.
It's searchable and can be reopened when it's time. 😄
