blank spaces not parsed in statsd data #10372

Closed
rudigerlove opened this issue Jan 4, 2022 · 10 comments · Fixed by #10466
Labels
area/graphite, bug (unexpected problem or unintended behavior)

Comments

@rudigerlove

Relevant telegraf.conf

[[inputs.statsd]]
  protocol = "tcp"
  service_address = ":8182"
  delete_gauges = true
  ## Reset counters every interval (default=true)
  delete_counters = true
  ## Reset sets every interval (default=true)
  delete_sets = true
  ## Reset timings & histograms every interval (default=true)
  delete_timings = true
  

  allowed_pending_messages = 1000000
  ## Percentiles to calculate for timing & histogram stats.
  percentiles = [50.0, 90.0, 99.0, 99.9, 99.95, 100.0]

  ## separator to use between elements of a statsd metric
  metric_separator = "/"
  separator = "/"
  datadog_extensions = true
  #added graphite to check if it changes something but still not parsed
  data_format = "graphite" 


  templates = [
    "*.*.*.*.*  measurement.server.measurement.interface.field*"
  ]

Logs from Telegraf

ctrcoll01:/data/TEMP/SRI# ./telegraf --debug --config ./telegraf.conf
2022-01-04T13:51:07Z I! Starting Telegraf 1.19.3
2022-01-04T13:51:07Z I! Loaded inputs: statsd
2022-01-04T13:51:07Z I! Loaded aggregators:
2022-01-04T13:51:07Z I! Loaded processors:
2022-01-04T13:51:07Z I! Loaded outputs: file
2022-01-04T13:51:07Z I! Tags enabled: host=ctrlcoll01 site=Collector0X
2022-01-04T13:51:07Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"ctrlcoll01", Flush Interval:10s
2022-01-04T13:51:07Z D! [agent] Initializing plugins
2022-01-04T13:51:07Z D! [agent] Connecting outputs
2022-01-04T13:51:07Z D! [agent] Attempting connection to [outputs.file]
2022-01-04T13:51:07Z D! [agent] Successfully connected to outputs.file
2022-01-04T13:51:07Z D! [agent] Starting service inputs
2022-01-04T13:51:07Z I! [inputs.statsd] TCP listening on "[::]:8182"
2022-01-04T13:51:07Z I! [inputs.statsd] Started the statsd service on ":8182"
2022-01-04T13:51:17Z D! [outputs.file] Buffer fullness: 0 / 10000 metrics
2022-01-04T13:51:27Z D! [outputs.file] Wrote batch of 1 metrics in 202.862µs
2022-01-04T13:51:27Z D! [outputs.file] Buffer fullness: 0 / 10000 metrics

System info

Telegraf 1.19.3 (git: HEAD a799489)

Docker

No response

Steps to reproduce

  1. Injected the following data for Telegraf to consume (one element in the metric tree, "ARP flood stats", contains spaces):
echo "f5telemetry.server-CVv15d01-ve-service-63kgfh-ad-testbed-dcn.dos-DeviceStats.ARP flood stats.common-aggDetected:0|g|#common.status:Ready,deviceType:Static Vector - Inline,vectorName:XXXRAM1 flood" | nc -C -w 1 localhost 8182
  2. Checked the results in the dump file. They were not correct:
ctrcoll01:/data/TEMP/SRI# cat dump.2022-01-04-1641304717.log 
f5telemetry/server-CVv15d01-ve-service-63kgfh-ad-testbed-dcn/dos-DeviceStats/ARP,common.status=Ready,deviceType=Static\ Vector\ -\ Inline,host=ctrlcoll01,metric_type=gauge,site=Collector0X,vectorName=XXXRAM1\ flood value=0 1641304710000000000

Expected behavior

I expected the data to contain the tag "interface=ARP flood stats":

f5telemetry/dos-DeviceStats,common.status=Ready,deviceType=Static\ Vector\ -\ Inline,host=ctrlcoll01,interface=ARP flood stats,metric_type=gauge,server=server-CVv15d01-ve-service-63kgfh-ad-testbed-dcn,site=Collector0X,vectorName=XXXRAM1\ flood common-aggDetected=0 1641307700000000000
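The expected line follows from the `templates` entry in the config above. As a rough illustration of that mapping, here is a minimal Python sketch. `apply_template` is a hypothetical helper written for this example, not Telegraf's parser. It assumes the name is split on "." only, that parts labeled `measurement` are joined with the configured separator "/", and that `field*` absorbs every remaining part. The leading `*.*.*.*.*` filter from the config is left out.

```python
def apply_template(name, template, separator="/"):
    """Map the dot-separated parts of `name` onto the labels in `template`.

    Simplified model of a graphite-style template (an assumption, not
    Telegraf's implementation).
    """
    parts = name.split(".")
    labels = template.split(".")
    measurement, tags, field = [], {}, []
    for i, label in enumerate(labels):
        if i >= len(parts):
            break
        if label == "measurement":
            measurement.append(parts[i])
        elif label == "field*":
            # Greedy field: absorbs every remaining part of the name.
            field.extend(parts[i:])
            break
        elif label == "field":
            field.append(parts[i])
        elif label:
            # Any other label becomes a tag key.
            tags[label] = parts[i]
    return separator.join(measurement), tags, separator.join(field)


name = (
    "f5telemetry.server-CVv15d01-ve-service-63kgfh-ad-testbed-dcn."
    "dos-DeviceStats.ARP flood stats.common-aggDetected"
)
template = "measurement.server.measurement.interface.field*"

measurement, tags, field = apply_template(name, template)
print(measurement)  # f5telemetry/dos-DeviceStats
print(tags)
print(field)        # common-aggDetected
```

Under these assumptions the space-containing element lands in the `interface` tag rather than in the measurement name, which is the shape of the expected line.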

Actual behavior

Instead, I got a truncated metric without any meaning:

f5telemetry/server-CVv15d01-ve-service-63kgfh-ad-testbed-dcn/dos-DeviceStats/ARP,common.status=Ready,deviceType=Static\ Vector\ -\ Inline,host=ctrlcoll01,metric_type=gauge,site=Collector0X,vectorName=XXXRAM1\ flood value=0 1641307830000000000

Additional info

No response

@rudigerlove rudigerlove added the bug unexpected problem or unintended behavior label Jan 4, 2022
@powersj
Contributor

powersj commented Jan 4, 2022

It looks like there was a previous issue and discussion around sanitizing the input to statsd. I think this is probably the root cause. Could you give that issue a read and see if you agree?

@rudigerlove
Author

Hi Joshua,
Thanks for your reply. I read that issue before posting this bug report. We often deal with equipment manufacturers (I work in telecom) that don't fully respect standards, and it is difficult or even impossible to change (sanitize) things at the source. Telegraf is like a Swiss Army knife, so in my opinion it's a pity that it cannot handle spaces. At the least, we should be able to apply some modifications to metric names at the server level (Telegraf), for example through a sanitization feature in the Telegraf config.

For the Datadog statsd format, some features were added to Telegraf, so why not allow whitespace too (after all, these would be tags)?

@powersj
Contributor

powersj commented Jan 4, 2022

For the Datadog statsd format, some features were added to Telegraf, so why not allow whitespace too (after all, these would be tags)?

I am happy to see a PR that attempts to do that... but it is generally a far better practice to sanitize your input and is my preferred solution.

As you said yourself, it is impossible to change what others do, especially when they themselves are not enforcing standards. The issue I pointed to doesn't drop those messages; it is only escaping out or removing the whitespace. This is something Telegraf does in many other places as a matter of good practice.

@rudigerlove
Author

rudigerlove commented Jan 19, 2022

Hi @powersj, regarding your statement "it is only escaping out or removing the whitespace": I reran some tests and wanted to share the results with you. The problem is that Telegraf is not removing the whitespace; it is ignoring the rest of the metric name that follows the whitespace.

For example, the following data
"f5telemetry.server-CVv15d01-ve-service-63kgfh-ad-testbed-dcn.dos-DeviceStats.ARP flood stats.common-aggDetected:10" results in

f5telemetry/server-CVv15d01-ve-service-63kgfh-ad-testbed-dcn/dos-DeviceStats/ARP value = 10

As you can see, the real metric element is "ARP flood stats", but Telegraf does not take it into account. It drops everything after the first whitespace, so only the word "ARP" is kept.
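The truncation can be reproduced outside Telegraf with a few lines of Python. This is only a sketch of the observed symptom: it assumes that some step in the parsing path splits the name on whitespace and keeps the first token, which the thread does not confirm. `first_token` is a hypothetical helper, and the final replacement of "." by "/" stands in for the configured `metric_separator`.

```python
def first_token(name):
    """Keep only the text before the first run of whitespace."""
    tokens = name.split()
    return tokens[0] if tokens else ""


name = (
    "f5telemetry.server-CVv15d01-ve-service-63kgfh-ad-testbed-dcn."
    "dos-DeviceStats.ARP flood stats.common-aggDetected"
)

# Everything from " flood" onward is lost, including the last element.
truncated = first_token(name).replace(".", "/")
print(truncated)
# f5telemetry/server-CVv15d01-ve-service-63kgfh-ad-testbed-dcn/dos-DeviceStats/ARP
```

The result matches the measurement name in the dump, which is consistent with the name being cut at the first space rather than escaped.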

Do you think that we can make an improvement in Telegraf to provide sanitization?

I checked the issues on statsd and there they are proposing to push sanitization at the backend. statsd/statsd#110

What are your thoughts?

powersj added a commit to powersj/telegraf that referenced this issue Jan 19, 2022
Per upstream, statsd does sanitization of names. This takes this same
set of rules and applies it when statsd is parsing out names.

Fixes: influxdata#10372
@powersj
Contributor

powersj commented Jan 19, 2022

I checked the issues on statsd and there they are proposing to push sanitization at the backend. statsd/statsd#110

This issue helped a lot. Thank you for finding this!

Do you think that we can make an improvement in Telegraf to provide sanitization?

It does seem better to have Telegraf do this since you are directly sending data. I have put up a PR, which will have some artifacts attached to it. Can you try using those artifacts and let me know?

The PR is not final and will require adding a new configuration value to statsd to explicitly turn on the sanitization. This way we are sure we are not disrupting existing users.

Let me know what you think!

@powersj
Contributor

powersj commented Jan 28, 2022

@rudigerlove have you had a chance to try out the PR?

Thanks!

@powersj
Contributor

powersj commented Feb 15, 2022

Hi @rudigerlove, I wanted to ping you again: have you had a chance to use the PR?

@rudigerlove
Author

@powersj
Sorry for the delay, will test this week and provide feedback.

@rudigerlove
Author

@powersj: I have finally validated it.

The injected data is:
f5telemetry.server-CVv15d01-ve-service-63kgfh-ad-testbed-dcn.dos-DeviceStats.ARP flood stats.common-aggDetected:0|g|#common.status:Ready,deviceType:Static Vector - Inline,vectorName:XXXRAM1 flood

Telegraf interprets the data correctly. As can be seen, "ARP flood stats" has been sanitized by Telegraf, which replaced the spaces with underscores so that the data can be stored properly.

f5telemetry/server-CVv15d01-ve-service-63kgfh-ad-testbed-dcn/dos-DeviceStats/ARP_flood_stats/common-aggDetected,common.status=Ready,deviceType=Static\ Vector\ -\ Inline,host=ctrlcoll01,metric_type=gauge,site=Collector01,vectorName=XXXRAM1\ flood value=0 1645565390000000000
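For readers who want to see the effect in isolation, the following Python sketch applies a sanitization of this kind to the injected name. Only the whitespace-to-underscore rule is confirmed by the output above. The other two rules (replace "/" with "-", drop characters outside letters, digits, "_", "-", and ".") are an assumption based on my understanding of upstream statsd's key sanitization, and `sanitize_name` is a hypothetical helper, not the code from the PR.

```python
import re


def sanitize_name(name):
    """Sanitize a statsd name (assumed upstream-style rules)."""
    # Confirmed by the test output: runs of whitespace become "_".
    name = re.sub(r"\s+", "_", name)
    # Assumption: slashes become dashes.
    name = name.replace("/", "-")
    # Assumption: drop anything outside letters, digits, "_", "-", ".".
    return re.sub(r"[^a-zA-Z_\-0-9.]", "", name)


name = (
    "f5telemetry.server-CVv15d01-ve-service-63kgfh-ad-testbed-dcn."
    "dos-DeviceStats.ARP flood stats.common-aggDetected"
)

# Sanitize first, then apply the "/" separator, so the separator itself
# is not rewritten by the slash rule.
sanitized = sanitize_name(name).replace(".", "/")
print(sanitized)
# f5telemetry/server-CVv15d01-ve-service-63kgfh-ad-testbed-dcn/dos-DeviceStats/ARP_flood_stats/common-aggDetected
```

With these rules the full name survives, and the measurement matches the one in the dump line above.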

For this test, below is the telegraf.conf that I used:

[[inputs.statsd]]
  protocol = "tcp"
  service_address = ":8181"
  delete_gauges = true
  ## Reset counters every interval (default=true)
  delete_counters = true
  ## Reset sets every interval (default=true)
  delete_sets = true
  ## Reset timings & histograms every interval (default=true)
  delete_timings = true
  

  allowed_pending_messages = 1000000
  ## Percentiles to calculate for timing & histogram stats.
  percentiles = [50.0, 90.0, 99.0, 99.9, 99.95, 100.0]

  ## separator to use between elements of a statsd metric
  metric_separator = "/"
  separator = "/"
  datadog_extensions = true

@powersj
Contributor

powersj commented Feb 22, 2022

Thank you very much for the confirmation! Gives me even more confidence in submitting the PR. Thanks again!

powersj added a commit to powersj/telegraf that referenced this issue Feb 23, 2022
Per upstream, statsd does sanitization of names. This takes this same
set of rules and applies it when statsd is parsing out names.

Fixes: influxdata#10372
powersj added a commit to powersj/telegraf that referenced this issue Feb 25, 2022
Per upstream, statsd does sanitization of names. This takes this same
set of rules and applies it when statsd is parsing out names when the
sanitize option is set to upstream.

Fixes: influxdata#10372