[Draft] [improve] [pip] PIP-314 Add metrics for redelivery_messages #21488
base: master
Conversation
@asafm Could you please review this PIP when you have time?
- Unit: `Counter`

**pulsar_broker_memory_usage_of_redelivery_messages_bytes**
- Description: the memory usage of all `redelivery_messages` in the broker.
How do you calculate the memory? message count * X?
There are three collections under the `MessageRedeliveryController` instance, which account for almost all of the memory used by redelivery messages:
- `ConcurrentBitmapSortedLongPairSet messagesToRedeliver`: we can call `getSizeInBytes` to get the memory usage of this collection.
- `ConcurrentLongLongPairHashMap hashesToBeBlocked`: `capacity` (not `size`) reflects the actual memory usage, so we calculate the memory usage as `capacity * 4 * 8`.
- `ConcurrentLongLongHashMap hashesRefCount`: `capacity` (not `size`) reflects the actual memory usage, so we calculate the memory usage as `capacity * 2 * 8`.
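A rough sketch of how such an estimate could be computed, assuming the accessors mentioned above (`getSizeInBytes()` on the bitmap set, `capacity()` on the two maps) are available on these collections; the method name `estimateRedeliveryMemoryBytes` is hypothetical:

```java
// Hypothetical helper inside MessageRedeliveryController; a sketch of the
// estimation described above, not the actual implementation.
public long estimateRedeliveryMemoryBytes() {
    // The bitmap set can report its backing memory directly.
    long bitmapBytes = messagesToRedeliver.getSizeInBytes();

    // Each slot of the LongLongPair map holds 4 longs (8 bytes each), and
    // memory is driven by capacity rather than the number of entries.
    long blockedHashesBytes = hashesToBeBlocked.capacity() * 4 * 8;

    // Each slot of the LongLong map holds 2 longs (key and value).
    long refCountBytes = hashesRefCount.capacity() * 2 * 8;

    return bitmapBytes + blockedHashesBytes + refCountBytes;
}
```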
@asafm Do you think it makes sense to add topic-level metrics, but only for the top X topics?
For example, if we want to know the top 10 topics with the largest redelivery messages in the Redelivery Controller, then we can expose metrics for only those 10 topics, not all topics. This can also be applied to other topic-level metrics: backlogs, producers, consumers, storage_size, backlog_size, latency, etc. We can make X configurable (dynamic configuration), so that we can reduce the number of topic-level metrics.
Let's take a similar problem in MySQL or Elasticsearch. You have so many queries entering the system, and you want to know which ones are the slowest. You can enable the slow-query log, for example, and it will log the slowest queries (above a certain threshold).
We can decide to have a histogram for the buffer size, with 0 buckets (basically count and max), at the namespace level. Users can be alerted when some unknown topic in a namespace has a max redelivery buffer size greater than a certain threshold (a minimal sketch of such a histogram follows this list). If you want to query which topic that was, we can choose between 2 mechanisms:
- Expose it via an HTTP endpoint. Maybe through something we don't have yet (like namespaceStats)?
- Log it - this means the user needs to define a threshold for logging, and we only log the subscription / topic name once the threshold is crossed - once.
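For illustration only, a minimal sketch of the "count and max only" histogram idea using the OpenTelemetry Java SDK; the SDK choice, the metric name, and the attribute key are assumptions here, not something this PIP defines:

```java
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.metrics.LongHistogram;
import io.opentelemetry.sdk.metrics.Aggregation;
import io.opentelemetry.sdk.metrics.InstrumentSelector;
import io.opentelemetry.sdk.metrics.SdkMeterProvider;
import io.opentelemetry.sdk.metrics.View;
import java.util.Collections;

public class RedeliveryHistogramSketch {
    public static void main(String[] args) {
        // A view with no bucket boundaries keeps only count/sum/min/max,
        // which is all that is needed for a namespace-level alert.
        SdkMeterProvider meterProvider = SdkMeterProvider.builder()
                .registerView(
                        InstrumentSelector.builder()
                                .setName("pulsar.broker.redelivery.buffer.size") // hypothetical name
                                .build(),
                        View.builder()
                                .setAggregation(
                                        Aggregation.explicitBucketHistogram(Collections.emptyList()))
                                .build())
                .build();

        LongHistogram bufferSize = meterProvider.get("pulsar-broker")
                .histogramBuilder("pulsar.broker.redelivery.buffer.size")
                .ofLongs()
                .setUnit("By")
                .build();

        // Record per namespace only; topic-level attributes are intentionally omitted.
        bufferSize.record(42_000,
                Attributes.builder().put("pulsar.namespace", "public/default").build());
    }
}
```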
@codelipenghui Your suggestion is to also define a threshold, but expose the topics/subscriptions via Prometheus metrics and not logs, right?
Yes; Pulsar transactions also have a slow-transactions endpoint to query all the slow transactions.
I think the major question is about the integration with alert systems.
Let's take an example: backlogs.
If we only have a broker-level backlog metric, we can set a backlog limit for each broker. But that limit is not easy to set because it depends on the topics. Set it to 100k, for example: with 10k topics, each topic may only have a backlog of 10, which shouldn't be a problem.
But if we have topic-level metrics only for the top 100 topics with the largest backlogs, we can set the backlog threshold to 10k, so that we can detect at most 100 topics with backlog issues.
A REST API can also integrate with alert systems, but it's hard to check historical data.
The log solution will work for this case; you can get historical data, but it's not easy to see the trend of the backlogs. We would have to set up another dashboard (e.g. Kibana) based on the logs, because the trend is also important when troubleshooting problems: if the bottleneck is the consumer side, we should see things get better after consumers scale up.
Sorry, I think I provided a wrong example before. Latency is not a good case; Counter and Gauge should be good cases.
I will continue to think about the essential differences between different solutions.
Ok, I have given it some thought. Here's what I think:
I'm going to separate the solution into long and short term.
Long term
With each feature added, you're bound to have more metrics you'll need to add, which might be at topic or subscription level (granularity).
Stating that you can't because we have too many doesn't make a lot of sense, monitoring-wise.
Working around it by introducing the alert logic into Pulsar, by printing or exporting the topK or everything greater than Y (a threshold), also doesn't seem plausible since:
a. You'll have to do this for every metric - so you're going to need a framework for this.
b. I think it's a bit of a convoluted architecture to introduce alerting logic inside Pulsar.
The core problem is that we export too many metrics, right?
I think the solution lies within PIP-264.
Allowing you to filter, i.e. decide which metrics you export, minimizes the number of metrics.
Allowing the metric to be emitted at group granularity enables you not to export it at topic level, and to open those metrics only when needed, based on an alert.
In summary, I think PIP-264 solves this problem.
Short term
Still, for now, we need a way to get alerted when topic A crosses threshold X of a certain metric.
If you have too many topics, you are probably exporting metrics at namespace granularity, hence you can set up the alert on the namespace metric.
Yes, it has the disadvantage of specifying a single threshold for all - but you'll have this problem also if you embed the threshold logic inside the broker/proxy. Yeah, I know you'll say - topK solves it. The problem with topK is that it's very confusing to the user: a metric for a certain topic dipping in and out - just imagine the graph - super confusing. If you emit the topK as a log, I think it's ok, but it will be super noisy compared to a threshold, since you always have a topK, so you always print it. With a threshold, you can at least print the topic only once the threshold is crossed.
Ok, now that you know a certain namespace is in trouble, you need to know which topics. For that I was thinking of several ideas, which are not finalized, so I'm just bringing up ideas:
- We can include this metric in the topic stats endpoint. If we can emit topic stats per namespace, we can run the HTTP request, get it, and even improve it by specifying in the queryParams which fields (JSONPath) you're actually interested in, to make the result small. When the alert fires, you can trigger a script which queries it, finds the topics that crossed the threshold, and then creates matching alerts in PD for each one (a sketch of such a script follows this list).

Previous ideas discussed:
- Embedding the threshold logic and printing to the log whenever the threshold is crossed.
  - This means we need to do this as a framework and not reinvent the wheel for each metric.
  - I'm not a big fan, again, because this is alerting logic and should not be a Pulsar feature.
- Emitting this metric only for the topK topics.
  - The topK can change very often, hence the graph will look sparse and cut ==> very confusing for the user. Again, this is alerting logic, so it also doesn't make a lot of sense to embed it inside Pulsar.
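A rough sketch of the script idea from the first bullet, using the Pulsar Admin Java client; the service URL, the threshold value, and the use of `unackedMessages` as a stand-in for the redelivery buffer size are all assumptions for illustration:

```java
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.TopicStats;

public class RedeliveryThresholdCheck {
    public static void main(String[] args) throws Exception {
        // Illustrative values; in practice these would come from the alert payload.
        long threshold = 10_000;
        String namespace = "public/default";

        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")
                .build()) {
            for (String topic : admin.topics().getList(namespace)) {
                TopicStats stats = admin.topics().getStats(topic);
                stats.getSubscriptions().forEach((subName, subStats) -> {
                    // unackedMessages is used here as a proxy for the redelivery
                    // buffer size discussed in this PIP.
                    if (subStats.getUnackedMessages() > threshold) {
                        System.out.printf("ALERT: %s / %s unacked=%d%n",
                                topic, subName, subStats.getUnackedMessages());
                    }
                });
            }
        }
    }
}
```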
# Goals
- Add metrics
  - Broker level:
    - Add a metric `pulsar_broker_max_subscription_redelivery_messages_total` to indicate the largest `{redelivery_messages}` count among the subscriptions in the broker, used to push an alert if it is too large. Nit: the Broker will print a log containing the name of the subscription which has the maximum count of redelivery messages. This will help find the problematic subscription.
It doesn't make sense for a broker-level metric, right? Because we will not have topic-level metrics for the redelivery messages.
And I'm confused here.
Will we introduce 2 metrics
- pulsar_broker_max_subscription_redelivery_messages_total
- pulsar_broker_memory_usage_of_redelivery_messages_bytes
How can they work together?
`pulsar_broker_memory_usage_of_redelivery_messages_bytes` is not the goal of this proposal; I just wanted to include it in passing for troubleshooting memory usage problems.
I stopped in the middle, since @codelipenghui made an important remark affecting this PIP as a whole. Once resolved, I'll continue.
## Delivery of messages in the normal case

To simplify the description of the mechanism, let's take the policy [Auto Split Hash Range](https://pulsar.apache.org/docs/3.0.x/concepts-messaging/#auto-split-hash-range) as an example:
This describes Key Shared. Is redelivery of messages only relevant for Key Shared subscriptions? If so, I would state that in the background.
This describes Key Shared. Is redelivery of messages only relevant for Key Shared subscriptions? If so, I would state that in the background.
Yes. In `Shared` mode, messages can be delivered to any consumer after a new read, so the Broker will deliver messages to other consumers when a consumer is stuck; the messages will not be pushed into `redelivery_messages`.
If so, I would state that in the background.
Added
# Motivation

For the example above, if `C1` is stuck or consuming slowly, the Broker will push the entries that should be delivered to `C1` into a memory collection `redelivery_messages` and continue reading the next entries; the collection `redelivery_messages` then becomes larger and larger and takes up a lot of memory. When sending messages, the Broker will also determine the key of the entries in the collection `redelivery_messages`, affecting performance.
When you write "C1 is stuck", do you mean the internal queue of C1 is full? When that happens, messages continue to be read from the topic, but messages whose keys belong to C1 will be placed in a buffer called Redelivery Messages?
Is this buffer a single one for all "stuck" consumers, or does each consumer have its own?
When you write "C1 is stuck", do you mean the internal queue of C1 is full?
After a client-side consumer's incoming queue is full, the Broker will stop delivering messages to it and just push these messages into the `redelivery_messages` queue.
When that happens, messages continue to be read from the topic, but messages whose keys belong to C1 will be placed in a buffer called Redelivery Messages?
Yes.
Is this buffer a single one for all "stuck" consumers, or does each consumer have its own?
If the stuck consumer is still online, the buffer is for that stuck consumer. Once it is offline, the messages should be delivered to another consumer.
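A heavily simplified, hypothetical sketch of that behaviour (the real dispatcher logic in the broker is far more involved; the class and method names here are made up for illustration):

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class DispatchSketch {
    /** Pending messages for a consumer whose receive queue is full. */
    private final Queue<Long> redeliveryMessages = new ArrayDeque<>();

    void dispatch(long messageId, Consumer consumer) {
        if (consumer.availablePermits() <= 0) {
            // The consumer's incoming queue is full: hold the message in the
            // redelivery buffer instead of sending it.
            redeliveryMessages.add(messageId);
        } else {
            consumer.send(messageId);
        }
    }

    interface Consumer {
        int availablePermits();
        void send(long messageId);
    }
}
```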
What do you mean by "determine" the key? Also, why would doing that ruin performance?
What do you mean by "determine" the key? Also, why would doing that ruin performance?
After reading new messages, the Broker has to filter out the messages whose keys are stuck in `redelivery_messages`, to avoid breaking the consumption order. For example:
- Client-side: `C1` is stuck now; there are 1000 messages in the client memory, and the keys of these messages are `[k1, k2, k3]`.
- Broker-side: the Broker reads a batch of messages whose keys are `[k1, k10]`; it will filter out the messages whose key is `k1` and only send the other messages to the client.

The Broker uses a `Map` data structure to manage the keys; the more keys that stuck consumers occupy, the larger and less efficient the map becomes.
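A simplified, hypothetical sketch of this filtering (the broker actually tracks key hashes in the collections listed earlier, not plain strings, and the names below are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class KeyFilterSketch {
    record Message(String key, byte[] payload) {}

    /**
     * Keep only the messages whose key is not currently stuck in the
     * redelivery buffer, so per-key ordering is preserved.
     */
    static List<Message> filterDeliverable(List<Message> batch, Set<String> blockedKeys) {
        List<Message> deliverable = new ArrayList<>();
        for (Message msg : batch) {
            if (blockedKeys.contains(msg.key())) {
                // e.g. k1 is already held back for the stuck consumer C1;
                // skipping it here avoids breaking the consumption order.
                continue;
            }
            deliverable.add(msg);
        }
        return deliverable;
    }
}
```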
# Goals
- Add metrics
  - Broker level:
    - Add a metric `pulsar_broker_max_subscription_redelivery_messages_total` to indicate the largest `{redelivery_messages}` count among the subscriptions in the broker, used to push an alert if it is too large. Nit: the Broker will print a log containing the name of the subscription which has the maximum count of redelivery messages. This will help find the problematic subscription.
This name is super confusing: the max of a (single) subscription's redelivery messages, then total?
I'll add suggestions soon.
This name is super confusing: the max of a (single) subscription's redelivery messages, then total?
I'll add suggestions soon.
Agreed. The original type of this metric is `Counter`, so I added the suffix suggested by https://prometheus.io/docs/practices/naming/.
Waiting for your suggestions.
Motivation & Modifications
Start a PIP: Add metrics for redelivery_messages
Documentation
doc
doc-required
doc-not-needed
doc-complete
Matching PR in forked repository
PR in forked repository: x