blank spaces not parsed in statsd data #10372
Comments
It looks like there was a previous issue and discussion around sanitizing the input to statsd. I think this is probably the root cause. Could you give that issue a read and see if you agree?
Hi Joshua, for the Datadog statsd format, some features were added to Telegraf, so why not allow whitespace too? (After all, these would be tags.)
I am happy to see a PR that attempts to do that, but it is generally far better practice to sanitize your input, and that is my preferred solution. As you said yourself, it is impossible to change what others do, especially when they themselves are not enforcing standards. The issue I pointed to doesn't drop those messages; it only escapes or removes the whitespace. This is something Telegraf does in many other places as a matter of good practice.
Hi @powersj, regarding your statement "it is only escaping out or removing the whitespace": I reran some tests and wanted to share the result with you. The problem is that Telegraf is not removing the whitespace; it is ignoring the part of the metric name that follows the whitespace. For example, take the data below: f5telemetry/server-CVv15d01-ve-service-63kgfh-ad-testbed-dcn/dos-DeviceStats/ARP value = 10 As you can see, the real metric is "ARP flood stats", but Telegraf does not take this into account. It drops the words after the first whitespace, so only the word "ARP" is kept. Do you think we could improve Telegraf to provide sanitization? I checked the statsd issues, and there they propose pushing sanitization to the backend: statsd/statsd#110. What are your thoughts?
Per upstream, statsd does sanitization of names. This takes this same set of rules and applies it when statsd is parsing out names. Fixes: influxdata#10372
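As a rough illustration of what sanitizing names in this way could look like, here is a minimal Go sketch. It is not the code from the PR: `sanitizeName` is a hypothetical helper, and the rule set is an assumption modeled on upstream statsd (whitespace runs become `_`, `/` becomes `-`, and any other character outside `a-zA-Z0-9_.-` is dropped). Only the space-to-underscore behavior is confirmed in this thread.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// One or more whitespace characters in a row.
	whitespaceRun = regexp.MustCompile(`\s+`)
	// Anything outside the assumed allowed set of name characters.
	disallowed = regexp.MustCompile(`[^a-zA-Z0-9_.-]`)
)

// sanitizeName is a hypothetical helper, not Telegraf's actual function.
// It applies the assumed upstream-style rules in order:
// whitespace -> "_", "/" -> "-", then drop everything else not allowed.
func sanitizeName(name string) string {
	name = whitespaceRun.ReplaceAllString(name, "_")
	name = strings.ReplaceAll(name, "/", "-")
	return disallowed.ReplaceAllString(name, "")
}

func main() {
	fmt.Println(sanitizeName("ARP flood stats")) // prints "ARP_flood_stats"
}
```

Note that under these assumed rules a `/` in a name is also rewritten, so sanitization would affect more than just the whitespace discussed here.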
This issue helped a lot. Thank you for finding this!
It does seem better to have Telegraf do this since you are sending data directly. I have put up a PR, which will have some artifacts attached to it. Can you try using those artifacts and let me know? The PR is not final and will require adding a new configuration value to statsd to explicitly turn on the sanitization. This way we can be sure we are not disrupting existing users. Let me know what you think!
@rudigerlove have you had a chance to try out the PR? Thanks!
Hi @rudigerlove, I wanted to ping you again to see whether you have had a chance to use the PR.
@powersj: I have finally validated it. The injected data is: The data is correctly interpreted by Telegraf and, as can be seen, "ARP flood stats" has been sanitized by Telegraf, which replaces the spaces with underscores so that the data can be stored properly.
For this test, below is the telegraf.conf that I used:
Thank you very much for the confirmation! It gives me even more confidence in submitting the PR. Thanks again!
Per upstream, statsd does sanitization of names. This takes this same set of rules and applies it when statsd is parsing out names when the sanitize option is set to upstream. Fixes: influxdata#10372
Relevant telegraf.conf
Logs from Telegraf
ctrcoll01:/data/TEMP/SRI# ./telegraf --debug --config ./telegraf.conf
2022-01-04T13:51:07Z I! Starting Telegraf 1.19.3
2022-01-04T13:51:07Z I! Loaded inputs: statsd
2022-01-04T13:51:07Z I! Loaded aggregators:
2022-01-04T13:51:07Z I! Loaded processors:
2022-01-04T13:51:07Z I! Loaded outputs: file
2022-01-04T13:51:07Z I! Tags enabled: host=ctrlcoll01 site=Collector0X
2022-01-04T13:51:07Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"ctrlcoll01", Flush Interval:10s
2022-01-04T13:51:07Z D! [agent] Initializing plugins
2022-01-04T13:51:07Z D! [agent] Connecting outputs
2022-01-04T13:51:07Z D! [agent] Attempting connection to [outputs.file]
2022-01-04T13:51:07Z D! [agent] Successfully connected to outputs.file
2022-01-04T13:51:07Z D! [agent] Starting service inputs
2022-01-04T13:51:07Z I! [inputs.statsd] TCP listening on "[::]:8182"
2022-01-04T13:51:07Z I! [inputs.statsd] Started the statsd service on ":8182"
2022-01-04T13:51:17Z D! [outputs.file] Buffer fullness: 0 / 10000 metrics
2022-01-04T13:51:27Z D! [outputs.file] Wrote batch of 1 metrics in 202.862µs
2022-01-04T13:51:27Z D! [outputs.file] Buffer fullness: 0 / 10000 metrics
System info
Telegraf 1.19.3 (git: HEAD a799489)
Docker
No response
Steps to reproduce
Expected behavior
I expected the data to contain the tag "interface=ARP flood stats":
f5telemetry/dos-DeviceStats,common.status=Ready,deviceType=Static\ Vector\ -\ Inline,host=ctrlcoll01,interface=ARP flood stats,metric_type=gauge,server=server-CVv15d01-ve-service-63kgfh-ad-testbed-dcn,site=Collector0X,vectorName=XXXRAM1\ flood common-aggDetected=0 1641307700000000000
Actual behavior
Instead, I got a truncated metric without any meaning:
f5telemetry/server-CVv15d01-ve-service-63kgfh-ad-testbed-dcn/dos-DeviceStats/ARP,common.status=Ready,deviceType=Static\ Vector\ -\ Inline,host=ctrlcoll01,metric_type=gauge,site=Collector0X,vectorName=XXXRAM1\ flood value=0 1641307830000000000
Additional info
No response