[Enhancement] Support fast filter message in tiered storage #8197

Closed
1 task done
lizhimins opened this issue May 24, 2024 · 0 comments · Fixed by #8198

Comments

@lizhimins
Member

lizhimins commented May 24, 2024

Before Creating the Enhancement Request

  • I have confirmed that this should be classified as an enhancement rather than a bug/feature.

Summary

The tiered storage module supports server-side message filtering.
Filtering can happen at one of three stages:

  1. After the consume queue (cq) entries are retrieved, but before the CommitLog data is fetched.
  2. While reading data from the cache, where, as with local storage, some data is prefetched ahead.
  3. After the data has been fetched, as a post-retrieval step.

Unlike local storage, filtering at stage 1 in tiered storage fragments the IO used to retrieve data, which is too inefficient. The filtering logic currently runs at stage 3. However, if most messages do not match the tags in the consumer's request, many pulls return few or no messages, which leads to excessive round trips between the client and server. Moving the filtering logic to stage 2 can therefore increase the consumption speed of a single queue and reduce the number of RPCs between the client and server, in particular empty pulls.
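To illustrate why stage 2 should need fewer round trips than stage 3, here is a minimal simulation sketch. It is not RocketMQ code: the class `TieredFilterSketch`, its methods, and the `scanLimit` parameter (an assumed cap on how many cached entries the server scans per request) are all hypothetical, and the queue is modeled as a plain list of tags.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation; none of these names come from RocketMQ.
final class TieredFilterSketch {

    /** Outcome of draining a queue: offsets delivered to the consumer and pull RPCs issued. */
    static final class Result {
        final List<Integer> deliveredOffsets;
        final int rpcCount;

        Result(List<Integer> deliveredOffsets, int rpcCount) {
            this.deliveredOffsets = deliveredOffsets;
            this.rpcCount = rpcCount;
        }
    }

    /** Builds a queue of tags where every matchEvery-th message is tagged "A" and the rest "B". */
    static List<String> sampleQueue(int size, int matchEvery) {
        List<String> tags = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            tags.add(i % matchEvery == 0 ? "A" : "B");
        }
        return tags;
    }

    /** Stage 3 style: each RPC fetches a fixed batch, then filters it; the reply may be empty. */
    static Result filterAfterFetch(List<String> queue, String tag, int batchSize) {
        List<Integer> delivered = new ArrayList<>();
        int rpcCount = 0;
        int offset = 0;
        while (offset < queue.size()) {
            rpcCount++;
            int end = Math.min(offset + batchSize, queue.size());
            for (int i = offset; i < end; i++) {
                if (queue.get(i).equals(tag)) {
                    delivered.add(i);
                }
            }
            offset = end; // the client advances and pulls again, even after an empty reply
        }
        return new Result(delivered, rpcCount);
    }

    /**
     * Stage 2 style: within one RPC the server keeps scanning cached (prefetched) entries
     * until it has batchSize matches or has scanned scanLimit entries (assumed cap).
     */
    static Result filterWhileReadingCache(List<String> queue, String tag, int batchSize, int scanLimit) {
        List<Integer> delivered = new ArrayList<>();
        int rpcCount = 0;
        int offset = 0;
        while (offset < queue.size()) {
            rpcCount++;
            int scanned = 0;
            int matched = 0;
            while (offset < queue.size() && scanned < scanLimit && matched < batchSize) {
                if (queue.get(offset).equals(tag)) {
                    delivered.add(offset);
                    matched++;
                }
                offset++;
                scanned++;
            }
        }
        return new Result(delivered, rpcCount);
    }

    public static void main(String[] args) {
        List<String> queue = sampleQueue(100, 10);
        Result stage3 = filterAfterFetch(queue, "A", 4);
        Result stage2 = filterWhileReadingCache(queue, "A", 4, 32);
        System.out.println("stage 3: rpcs=" + stage3.rpcCount + ", delivered=" + stage3.deliveredOffsets.size());
        System.out.println("stage 2: rpcs=" + stage2.rpcCount + ", delivered=" + stage2.deliveredOffsets.size());
    }
}
```

With this sample queue (100 messages, one in ten matching) and the assumed batch size of 4 and scan limit of 32, both strategies deliver the same 10 messages, but the stage 3 loop issues 25 pulls while the stage 2 loop issues 4. The real gain depends on cache behavior and request limits that this sketch does not model.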


Motivation

Speed up message consumption in tiered storage.

Describe the Solution You'd Like

Moving the filtering logic to stage 2 can effectively increase the consumption speed of a single queue and reduce the number of RPCs between the client and server.

Describe Alternatives You've Considered

Moving the filtering logic to stage 2 can effectively increase the consumption speed of a single queue and reduce the number of RPCs between the client and server.

Additional Context

No response

lizhimins changed the title from "[Enhancement] Support fast filter message by tag in tiered storage" to "[Enhancement] Support fast filter message in tiered storage" on May 24, 2024
lizhimins added a commit to lizhimins/rocketmq that referenced this issue May 24, 2024
lizhimins added a commit to lizhimins/rocketmq that referenced this issue May 24, 2024
lizhimins added a commit to lizhimins/rocketmq that referenced this issue Jun 6, 2024