Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(inputs.gnmi): Parsing field names incorrectly #14530

Closed
greenfox878 opened this issue Jan 5, 2024 · 18 comments · Fixed by #14553
Closed

(inputs.gnmi): Parsing field names incorrectly #14530

greenfox878 opened this issue Jan 5, 2024 · 18 comments · Fixed by #14553
Labels
bug unexpected problem or unintended behavior

Comments

@greenfox878
Copy link

Relevant telegraf.conf

[[inputs.gnmi]]
  addresses = ["gnmi_dut:xxx"]

  ## define credentials
  username = "xxxxx"
  password = "xxxxx"

  encoding = "json_ietf"
  tagexclude = ["path"]


  [inputs.gnmi.tags]
    test_tag = "test"

  # Define additional aliases to map encoding paths to measurement names
  [inputs.gnmi.aliases]
    origin = "openconfig"
    ifcounters = "/interfaces/interface/state/counters"

  [[inputs.gnmi.subscription]]
    name = "ifcounters"
    origin = "openconfig"
    path = "/interfaces/interface/state/counters"
    subscription_mode = "sample"
    sample_interval = "10s"

Logs from Telegraf

no error messages

System info

Telegraf 1.29.1, Ubuntu 22.04.3 LTS (latest updates)

Docker

No response

Steps to reproduce

  1. Download compiled telegraf release "wget https://dl.influxdata.com/telegraf/releases/telegraf-1.29.1_linux_amd64.tar.gz", untar
  2. Run with the command "./telegraf-1.29.1/usr/bin/telegraf --config ./test_telegraf_conifg.cfg --debug"

Expected behavior

Correct output. v1.28.5 or older

ifcounters,host=test_telegraf_host,name=Ethernet40,source=gnmi_dut,test_tag=test in_broadcast_pkts=0i,in_discards=0i,in_errors=0i,in_fcs_errors=0i,in_multicast_pkts=0i,in_octets=0i,in_pkts=0i,in_unicast_pkts=0i,out_broadcast_pkts=0i,out_discards=0i,out_errors=0i,out_multicast_pkts=0i,out_octets=0i,out_pkts=0i,out_unicast_pkts=0i 1704442117721474264
ifcounters,host=test_telegraf_host,name=Ethernet26,source=gnmi_dut,test_tag=test in_broadcast_pkts=0i,in_discards=0i,in_errors=0i,in_fcs_errors=0i,in_multicast_pkts=0i,in_octets=0i,in_pkts=0i,in_unicast_pkts=0i,out_broadcast_pkts=0i,out_discards=0i,out_errors=0i,out_multicast_pkts=0i,out_octets=0i,out_pkts=0i,out_unicast_pkts=0i 1704442117721474264

Actual behavior

Corrupted outptu. v1.29.1:

ifcounters,host=test_telegraf_host,name=Ethernet40,source=gnmi_dut,test_tag=test t_pkts=0i,in_errors=0i,rs=0i,in_octets=0i,in_pkts=0i,pkts=0i,st_pkts=0i,s=0i,out_errors=0i,out_octets=0i,out_pkts=0i,_pkts=0i 1704442117721474264
ifcounters,host=test_telegraf_host,name=Ethernet26,source=gnmi_dut,test_tag=test t_pkts=0i,in_errors=0i,rs=0i,in_octets=0i,in_pkts=0i,pkts=0i,st_pkts=0i,s=0i,out_errors=0i,out_octets=0i,out_pkts=0i,_pkts=0i 1704442117721474264

here we see "_pkts", "t_pkts", "st_pkts" instead of "out_pkts", "out_multicast_pkts"...

Additional info

No response

@greenfox878 greenfox878 added the bug unexpected problem or unintended behavior label Jan 5, 2024
@powersj
Copy link
Contributor

powersj commented Jan 5, 2024

Hi @greenfox878,

In order to help resolve this can I request that you do the following:

  1. Enable debug in the agent. Either via the --debug flag on the CLI or add debug = true in the [agent] section of your config.
  2. In your gnmi input plugin config, add dump_responses = true
  3. Find and provide the full message that is getting truncated. Something like this example.

With that we can dig into the logic and see what is going on.

Thanks!

@powersj powersj added the waiting for response waiting for response from contributor label Jan 5, 2024
@srebhan
Copy link
Member

srebhan commented Jan 5, 2024

@greenfox878 please use dump_responses = true instead of trace = True... ;-)

@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label Jan 5, 2024
@powersj powersj added the waiting for response waiting for response from contributor label Jan 5, 2024
@greenfox878
Copy link
Author

Hi @powersj ,

2024-01-06T07:35:45Z D! [inputs.gnmi] Got update_1704442117721474264: {"update":{"timestamp":"1704442117721474264","prefix":{"elem":[{"name":"interfaces"},{"name":"interface","key":{"name":"Ethernet35"}},{"name":"state"},{"name":"counters"}]},"update":[{"path":{"elem":[{"name":"in-broadcast-pkts"}]},"val":{"uintVal":"0"}},{"path":{"elem":[{"name":"in-discards"}]},"val":{"uintVal":"0"}},{"path":{"elem":[{"name":"in-errors"}]},"val":{"uintVal":"0"}},{"path":{"elem":[{"name":"in-fcs-errors"}]},"val":{"uintVal":"0"}},{"path":{"elem":[{"name":"in-multicast-pkts"}]},"val":{"uintVal":"0"}},{"path":{"elem":[{"name":"in-octets"}]},"val":{"uintVal":"0"}},{"path":{"elem":[{"name":"in-pkts"}]},"val":{"uintVal":"0"}},{"path":{"elem":[{"name":"in-unicast-pkts"}]},"val":{"uintVal":"0"}},{"path":{"elem":[{"name":"out-broadcast-pkts"}]},"val":{"uintVal":"0"}},{"path":{"elem":[{"name":"out-discards"}]},"val":{"uintVal":"0"}},{"path":{"elem":[{"name":"out-errors"}]},"val":{"uintVal":"0"}},{"path":{"elem":[{"name":"out-multicast-pkts"}]},"val":{"uintVal":"0"}},{"path":{"elem":[{"name":"out-octets"}]},"val":{"uintVal":"0"}},{"path":{"elem":[{"name":"out-pkts"}]},"val":{"uintVal":"0"}},{"path":{"elem":[{"name":"out-unicast-pkts"}]},"val":{"uintVal":"0"}}]}}

@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label Jan 6, 2024
@powersj
Copy link
Contributor

powersj commented Jan 8, 2024

@greenfox878 thanks for that output. I am able to reproduce with our tests. Next steps are I need to understand the existing key parsing code. It appears in your case we want to take the path base of the key, but are instead, sometimes doing some sub-string logic here that is clearly wrong.

edit:

The alias path we get is openconfig:/interfaces/interface/state/counters, which in some cases is in fact shorter than the key, which is when we take the remainder of key's paths length + 1.

For example:

key:       /interfaces/interface/state/counters/in_unicast_pkts
aliasPath: openconfig:/interfaces/interface/state/counters
new key:   pkts

The alias path is looked by checking if the current path:

/interfaces/interface/state/counters/out-discards

is a subpath of:

openconfig:/interfaces/interface/state/counters

Which it is, so it returns the alias path and then does the strange substring.

I think we should be ensuring that the entire alias path exists in the string before we start doing this manipulation. I'll put up a PR once I figure out how to not break existing tests O.o

powersj added a commit to powersj/telegraf that referenced this issue Jan 9, 2024
Rather than abritrarily take the substring of key, ensure that the key
contains the alias in the first place, then do the removal. Otherwise,
always take the base path.

fixes: influxdata#14530
powersj added a commit to powersj/telegraf that referenced this issue Jan 9, 2024
Rather than abritrarily take the substring of key, ensure that the key
contains the alias in the first place, then do the removal. Otherwise,
always take the base path.

fixes: influxdata#14530
@powersj powersj changed the title Corrupted GNMI output (inputs.gnmi): Parsing field names incorrectly Jan 16, 2024
powersj added a commit to powersj/telegraf that referenced this issue Jan 18, 2024
Rather than abritrarily take the substring of key, ensure that the key
contains the alias in the first place, then do the removal. Otherwise,
always take the base path.

fixes: influxdata#14530
@powersj
Copy link
Contributor

powersj commented Jan 18, 2024

@greenfox878, @romanelloacn,

Can both of you please try the artifacts in #14581 and validate that the field names report correctly please?

If they do not please use dump_responses = true to get the lines that are not.

Thanks!

@powersj powersj added the waiting for response waiting for response from contributor label Jan 18, 2024
@romanelloacn
Copy link

romanelloacn commented Jan 18, 2024

@greenfox878, @romanelloacn,

Can both of you please try the artifacts in #14581 and validate that the field names report correctly please?

If they do not please use dump_responses = true to get the lines that are not.

Thanks!

Sorry but I have no idea how to install/update telegraf from the cloned telegraf repository of your profile

@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label Jan 18, 2024
@powersj
Copy link
Contributor

powersj commented Jan 18, 2024

Sorry but I have no idea how to install telegraf from the cloned telegraf repository of your profile

Artifacts, pre-build binaries, will be attached to that PR in 20-30mins. No need to build anything yourself.

@powersj powersj added the waiting for response waiting for response from contributor label Jan 18, 2024
@romanelloacn
Copy link

romanelloacn commented Jan 22, 2024

Sorry for the late reply... I can confirm that is successfully working now :) I'll wait the next official telegraf update!
image

But now after reinstalling the official telegraf version, Telegraf 1.29.2 (git: HEAD@d92d7073) with "yum install telegraf", the logs give me:

2024-01-22T12:17:43Z E! [inputs.gnmi] Invalid empty path "/interfaces/interface/state/last-change" with alias "openconfig:/interfaces/interface/state"
2024-01-22T12:17:43Z E! [inputs.gnmi] Invalid empty path "/interfaces/interface/state/oper-status" with alias "openconfig:/interfaces/interface/state"

Even if with gnmic the path "/interfaces/interface/state" responds correctly and the same telegraf.conf file was used to test your binaries

image

(the Gi0/0 used only to have a limited output)

If then I execute again your pullrequest binary it works fine so it's not a telegraf.conf problem

@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label Jan 22, 2024
@powersj
Copy link
Contributor

powersj commented Jan 22, 2024

Thank you for taking the time to try it out!

@romanelloacn
Copy link

Thank you for taking the time to try it out!

You're welcome :) Can you please help with the other issue I described or should I open a new issue in github?

@powersj
Copy link
Contributor

powersj commented Jan 22, 2024

Can you please help with the other issue I described or should I open a new issue in github?

You got the empty path error only with the current release right? The test artifact did not show that right?

@romanelloacn
Copy link

Can you please help with the other issue I described or should I open a new issue in github?

You got the empty path error only with the current release right? The test artifact did not show that right?

Correct

@powersj
Copy link
Contributor

powersj commented Jan 22, 2024

ok good :) The current code when parsing the field names does not take the origin (e.g. openconfig into account currently. So when it does the replace it effectively removes everything. My new PR will take that into account and prevent the incorrect substring as well.

@romanelloacn
Copy link

Ok thanks! So I'll wait even more patiently the new release :)

powersj added a commit to powersj/telegraf that referenced this issue Jan 22, 2024
Rather than abritrarily take the substring of key, ensure that the key
contains the alias in the first place, then do the removal. Otherwise,
always take the base path.

fixes: influxdata#14530
@powersj
Copy link
Contributor

powersj commented Jan 24, 2024

@romanelloacn, @greenfox878,

I made some additional changes to the PR. Could I request your try the artifacts one more time to confirm I have not regressed anything?

If all looks good, we can land this and release it in Monday's release.

Thank you!

@romanelloacn
Copy link

romanelloacn commented Jan 25, 2024

Hi, everything seems to be working fine!
image

I have one doubt... I see that now in the name of the measurement is included the path of the subfolders of the yang tree. This sometimes can be a problem because in influxdb UI, if you have 3 subfolders, you can't read the actual name of the measurement. A suggestion would be to not replace the "-" in the name with "_" so you can search the actual name in the top bar of influx and select the only result that comes out even if you can't read the full name.
Currently, it's not possible because if you search for "in-octets" it doesn't come out and if you search only "octets" both out and in measurements appear but you don't know wich one of the 2 is because there is the full path in front.

In short, I would have preferred the measurement's name to not have the subfolder name and only include the subfolder name when there is a conflict with another field having the same name in the parent folder.

Hope this can help :)

@greenfox878
Copy link
Author

Hi @powersj !
Sorry for delayed reply. I just tested fixed version of telegraf and confim that all looks good!
ifcounters,host=test_host,name=Ethernet29,source=test_gnmi_device,test_tag=test in_broadcast_pkts=0i,in_discards=0i,in_errors=0i,in_fcs_errors=0i,in_multicast_pkts=0i,in_octets=0i,in_pkts=0i,in_unicast_pkts=0i,out_broadcast_pkts=0i,out_discards=0i,out_errors=0i,out_multicast_pkts=0i,out_octets=0i,out_pkts=0i,out_unicast_pkts=0i 1704442117721474264

@romanelloacn , Here old topic about "-" to "_" replacement. Looks like it's default behavior.

@powersj
Copy link
Contributor

powersj commented Jan 25, 2024

Thank you both for confirming.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug unexpected problem or unintended behavior
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants