-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Escape prometheus core metric label values #9656
Escape prometheus core metric label values #9656
Conversation
hm, the new test case consistently fails for me locally, but seems like it passed in the CI. Maybe I should start from a clean repo. |
@gomoripeti when in doubt, run bazel clean
killall -9 erl; killall -9 epmd; killall -9 beam.smp You may have stray nodes running in the background from the earlier (likely failing at first) runs. |
@tvhong-amazon can you please have a look, does this escape label values the way you'd expect? Thank you. |
407f8af
to
3c87f94
Compare
I forgot to add the new test group to I would like to get help on the test case. Although |
Thank you for the quick turn around on this! The escaped labels look correct to me. However, I see that there are other
I think we should also escape Username as it can include special characters. As for the other functions and variables, are we positive that they don't need to be escaped? |
@tvhong-amazon agreed, since usernames can be generated in RabbitMQ-aaS environments. Connection names would be another obvious candidate. |
thank you for the review and feedback. I thoroughly reviewed the label function again
So to sum up, I think all cases, where necessary, are escaped |
@gomoripeti Thank you for double checking. What you point out here is quite interesting:
If this is the case, would it make sense to change all other methods to return a key-value list instead of a binary? That way, we don't need to duplicate the label escaping logic in RabbitMQ. |
Pre-rendering was an intentional performance optimisation in #3587 I just see this note from the PR which was a wrong assumption
But it is true if the escaping function would be exported by the prometheus lib, then we could avoid some code duplication. (I just wanted to quickly submit a fix, but that would be the proper way) |
@@ -448,6 +450,25 @@ label(A) when is_atom(A) -> | |||
is_protocol(P) -> | |||
lists:member(P, [amqp091, amqp10, mqtt, http]). | |||
|
|||
%% Escape functions taken from prometheus_text_format |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason you copied the functions instead of just calling the ones in prometheus_text_format? I think they are exported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unfortunately only exported when compiled for tests https://github.com/deadtrickster/prometheus.erl/blob/master/src/formats/prometheus_text_format.erl#L27
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are welcome to contribute to prometheus.erl
, it is maintained by a RabbitMQ core team member
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was poking around, decided to create relevant PR in prometheus.erl
to move that export out of the TEST def -- PR here in deadtrickster/prometheus.erl#158
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you! will update this PR when that is merged
The reasoning behind removing escaping was that via AMQP 0.9.1 it's not possible to create queue/exchange/... that needs escaping, and this was one of the worst performance offenders. So removing it seemed reasonable. You'll need to check that performance (and memory usage) is still acceptable with escaping restored - I suggest measuring the time it takes to generate stats on a broker with 10000 of exchanges/queues/channels each. |
thanks @binarin for the historic context. I assumed the intermediate record representation was the main concern. |
I did some benchmarks of escaping. The baseline is RabbitMQ 3.12.6 unchanged. The "escape" version is using the unchanged I first did a quick microbenchmark of directly calling Then I made the following benchmark: Created 1 connection with 10K channels. Each creating 1 exchange, 1 queue, subscribed to the queue and published 1 message to the exchange. Queue and exchange names were in the form Object counts: Used below command to query prometheus metrics and measure duration:
Number of different labels in the output:
Measured 5 times the total time of the HTTP call as well as the function call Results
|
3c87f94
to
c283744
Compare
375b12f
to
2a7a2ca
Compare
2a7a2ca
to
d236e69
Compare
tagged prometheus.erl 4.11 |
d236e69
to
4d8e5d2
Compare
thanks a lot, Iliia! |
For example special characters like double quotes are allowed in queue names, in which case detailed metrics could produce unparsable text format output.
4d8e5d2
to
8c78760
Compare
this might have been forgotten (I did forget about it) this PR is ready for review and merge. |
@tvhong-amazon @SimonUnge @illotum this is a PR that seeks to address an issue your team has reported. Can you please give it a shot? |
I ran several (naive) benchmarks and manual tests. LGTM. 1000 bad queue names:
|
Escape prometheus core metric label values (backport #9656)
(cherry picked from commit 8224673)
Proposed Changes
For example special characters like double quotes are allowed in queue
names, in which case detailed metrics could produce unparsable text
format output.
Fixes #9648
Types of Changes
What types of changes does your code introduce to this project?
Put an
x
in the boxes that applyChecklist
Put an
x
in the boxes that apply.You can also fill these out after creating the PR.
If you're unsure about any of them, don't hesitate to ask on the mailing list.
We're here to help!
This is simply a reminder of what we are going to look for before merging your code.
CONTRIBUTING.md
documentFurther Comments
I would like to get help on the test case. Although
{collect_statistics, fine}
is set ininit_per_group
still there are no non-coarse metrics, for example the tablechannel_exchange_metrics
is empty. I must be missing something?adding escaping probably will have a slight performance impact. But I believe the main reason for using preformatted labels is to avoid the intermediate
#LabelPair
record representation, not to skip escaping.