XML input #1758

Closed
Ismael opened this issue Sep 13, 2016 · 30 comments
Labels
area/xml feature request Requests for new plugin and for new features to existing plugins help wanted Request for community participation, code, contribution


@Ismael

Ismael commented Sep 13, 2016

It would be helpful to have an XML input plugin that behaves like the httpjson plugin.
I have a service that only exposes its stats as XML.

@jwilder jwilder added the help wanted Request for community participation, code, contribution label Sep 15, 2016
@jwilder jwilder added this to the Future Milestone milestone Sep 15, 2016
@packtman

Just wanted to know if this is still being considered :)

@danielnelson
Contributor

If we implement this I think we would define a new xml input data format and the service would be required to produce stats in this format. Would this work with your use cases @Ismael @dipenshah90 ?

@packtman

Hi @danielnelson, by "service would be required to produce stats in this format" do you mean the output would be in XML? For my use case, I am getting the data in XML and want to feed it into InfluxDB (InfluxDB and Grafana). I can get the data in JSON, but I need to add an extra header, "Accept: application/json", which gives me a parsing error. I assume the option to add these extra headers is not available in httpjson?

@Ismael
Author

Ismael commented Mar 21, 2017

I don't have control over the XML produced by the service, so it wouldn't work for me.

@danielnelson
Contributor

@Ismael Can you show an example of your xml format? I wonder if there is a good xml technology for parsing custom xml output to metrics.

@danielnelson
Contributor

@dipenshah90 You can add arbitrary headers, check the plugin docs.

@Ismael
Author

Ismael commented Mar 29, 2017

Here's an example

<WowzaStreamingEngine>
<ConnectionsCurrent>0</ConnectionsCurrent>
<ConnectionsTotal>0</ConnectionsTotal>
<ConnectionsTotalAccepted>0</ConnectionsTotalAccepted>
<ConnectionsTotalRejected>0</ConnectionsTotalRejected>
<MessagesInBytesRate>0.0</MessagesInBytesRate>
<MessagesOutBytesRate>0.0</MessagesOutBytesRate>
<VHost>
<Name>_defaultVHost_</Name>
<TimeRunning>64679.885</TimeRunning>
<ConnectionsLimit>0</ConnectionsLimit>
<ConnectionsCurrent>0</ConnectionsCurrent>
<ConnectionsTotal>0</ConnectionsTotal>
<ConnectionsTotalAccepted>0</ConnectionsTotalAccepted>
<ConnectionsTotalRejected>0</ConnectionsTotalRejected>
<MessagesInBytesRate>0.0</MessagesInBytesRate>
<MessagesOutBytesRate>0.0</MessagesOutBytesRate>
</VHost>
</WowzaStreamingEngine>

@Ismael
Author

Ismael commented Mar 29, 2017

I'd use XPath if I wanted to get to specific fields in the xml. I believe that would be the most generic approach.

@danielnelson
Contributor

I think that would be pretty nice. The httpjson input would be much more flexible if we had json-ptr support, I think they are pretty similar. Could you write what you imagine a config could look like?

@Ismael
Author

Ismael commented Mar 29, 2017

Something like this?

[[inputs.xml]]
  name = "Wowza"
  
  servers = [
    "http://wowza:8086/connectioncounts"
  ]
  
  response_timeout = "5s"
  
  # For basic HTTP Auth
  user = "admin"
  password = "lala"
  
  # Field name -> XPath query, written as a TOML subtable
  [inputs.xml.values]
    ConnectionsCurrent = "/WowzaStreamingEngine/ConnectionsCurrent/text()"
    ConnectionsTotal = "/WowzaStreamingEngine/ConnectionsTotal/text()"
  

@jbergstroem

This would be very helpful in the java enterprise world where most things speak xml.

@danielnelson
Contributor

@Ismael looks good, one thing I'm unsure about is what we would do if someone selects more than one item. Do we even try to allow this or do we just say "you can only run xpath queries that return one item"?

@Ismael
Author

Ismael commented May 26, 2017

If you return more than one item you're back to the problem of mapping xml to json.
For a first try I'd say returning only one element would be simpler (also, if you need more than one item you can tediously solve it)

@danielnelson
Contributor

One thing I expect people will want is a way to take data like this:

<servers>
	<server name="foo">
		<value>0</value>
	</server>
	<server name="bar">
		<value>1</value>
	</server>
</servers>

And produce the following line protocol:

servers,name=foo value=0 1234567890123456890
servers,name=bar value=1 1234567890123456890
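
A minimal sketch of that mapping, written in Python with the standard library's `xml.etree.ElementTree` rather than as Telegraf code. The function name `servers_to_line_protocol` and its optional `timestamp_ns` argument are hypothetical, and the sketch assumes tag values need no line protocol escaping.

```python
import xml.etree.ElementTree as ET

XML = """<servers>
    <server name="foo">
        <value>0</value>
    </server>
    <server name="bar">
        <value>1</value>
    </server>
</servers>"""


def servers_to_line_protocol(xml_text, timestamp_ns=None):
    """Emit one line-protocol point per <server> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    lines = []
    for server in root.findall("server"):
        # The root tag becomes the measurement, the name attribute a tag,
        # and the <value> child the single field.
        # Assumption: names contain no commas, spaces, or equals signs,
        # so no escaping is applied here.
        name = server.get("name")
        value = server.findtext("value")
        line = f"{root.tag},name={name} value={value}"
        if timestamp_ns is not None:
            line += f" {timestamp_ns}"
        lines.append(line)
    return lines


for line in servers_to_line_protocol(XML):
    print(line)
```

The point is that one document yields several points, which a config listing exactly one XPath per field cannot express without repeating itself.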

@jbergstroem

@danielnelson said:
One thing I expect people will want [snip]

Convenience is great; but I'm thinking that if you're in a position to customize output you'd probably be choosing something other than XML.

At least in my case; the xml is verbose (>10k lines) and pretty horrific. I'd definitely xpath my way through it and manually build the line protocol.

@danielnelson
Contributor

Point taken, but for each metric you would have to have a separate [[inputs.xml]] section, which would perform its own GET. I think this could be problematic for some services.

@Ismael
Author

Ismael commented May 27, 2017

Why would you need several GETs? In my proposed config it would parse the xml once and then lookup all the xpaths.
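
A rough sketch of the parse-once idea in Python, using the standard library's `xml.etree.ElementTree`. It is only an illustration, not plugin code: ElementTree implements a subset of XPath (no `text()`), so the paths are written relative to the root element, and the `VALUES` mapping and `extract_values` helper are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Shortened version of the Wowza example from earlier in the thread.
WOWZA_XML = """<WowzaStreamingEngine>
<ConnectionsCurrent>0</ConnectionsCurrent>
<ConnectionsTotal>0</ConnectionsTotal>
<VHost>
<Name>_defaultVHost_</Name>
<TimeRunning>64679.885</TimeRunning>
</VHost>
</WowzaStreamingEngine>"""

# Hypothetical mapping: field name -> path relative to the root element.
VALUES = {
    "ConnectionsCurrent": "./ConnectionsCurrent",
    "ConnectionsTotal": "./ConnectionsTotal",
    "VHostTimeRunning": "./VHost[1]/TimeRunning",
}


def extract_values(xml_text, paths):
    """Parse the document once, then evaluate every path against the same tree."""
    root = ET.fromstring(xml_text)
    # findtext returns the text of the first match, or None if nothing matches.
    return {name: root.findtext(path) for name, path in paths.items()}


print(extract_values(WOWZA_XML, VALUES))
```

All values come back as strings here, and every lookup lands in one flat result, so this covers a single point per response.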

@danielnelson
Contributor

It only supports writing a single point, which is represented by one line of line protocol. You would need to configure one plugin per point, and each one would need to do an HTTP request to get its data.

@danielnelson
Contributor

The easiest way to fix this would be to just scope everything with a TOML subtable. I wonder though if it's possible, and desirable, for an xpath query to return a list of pairs, which could then be used as field-keys and field-values.

@danielnelson
Contributor

Here is an example of how that might look along with some other things from httpjson. In this example, I added an "xpath" Parser (using data_format), which would allow us to read xml from any source.

# If we write xpath as a parser, we could support both JSON and XML along
# with all the other types in a general http plugin.
[[inputs.http]]
  # Using the standard name_override, httpjson does this incorrectly
  name_override = "wowza"

  servers = [
    "http://wowza:8086/connectioncounts"
  ]

  timeout = "5s"

  # For basic HTTP Auth; because requiring the user to base64 encode in the
  # headers section is not very nice.
  username = "admin"
  password = "lala"

  # Include header support like httpjson
  [inputs.http.headers]
   X-Auth-Token = "my-xauth-token"
   apiVersion = "v1"

  # Standard parser selection argument.
  data_format = "xpath"

  # Imagine that there could be multiple "VHost" tags in the earlier example.
  [[inputs.http.xpath]]
    [inputs.http.xpath.tags]
      vhost_name = "(//WowzaStreamingEngine/VHost)[1]/Name/text()"

    [inputs.http.xpath.fields]
      connections_current = "(//WowzaStreamingEngine/VHost)[1]/ConnectionsCurrent/text()"
      connections_total = "(//WowzaStreamingEngine/VHost)[1]/ConnectionsTotal/text()"

  [[inputs.http.xpath]]
    [inputs.http.xpath.tags]
      vhost_name = "(//WowzaStreamingEngine/VHost)[2]/Name/text()"

    [inputs.http.xpath.fields]
      connections_current = "(//WowzaStreamingEngine/VHost)[2]/ConnectionsCurrent/text()"
      connections_total = "(//WowzaStreamingEngine/VHost)[2]/ConnectionsTotal/text()"

  # Standard SSL config
  ssl_ca = "/etc/telegraf/ca.pem"
  ssl_cert = "/etc/telegraf/cert.pem"
  ssl_key = "/etc/telegraf/key.pem"
  insecure_skip_verify = false

I think this is pretty good, but one remaining issue is the type of the fields. Ideally we could convert them to a boolean, string, int, or float. Is there anything in xpath that would help with this?

@Ismael
Author

Ismael commented May 27, 2017

Is there anything in xpath that would help with this?

Not that I know of.

@danielnelson
Contributor

Maybe we do something like this: similar to the logparser input, you could have a separate table for each of tags, string, int, float, and bool.

    [inputs.http.xpath.tags]
      vhost_name = "(//WowzaStreamingEngine/VHost)[1]/Name/text()"

    [inputs.http.xpath.int]
      connections_total = "(//WowzaStreamingEngine/VHost)[1]/ConnectionsTotal/text()"

    [inputs.http.xpath.float]
      time_running = "(//WowzaStreamingEngine/VHost)[1]/TimeRunning/text()"
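
To illustrate how per-type tables could drive conversion, here is a small Python sketch. It is not Telegraf code: `CONFIG`, `CONVERTERS`, `parse_bool`, and `build_point` are hypothetical names, the accepted boolean spellings are an assumption, and the paths use the XPath subset of the standard library's `xml.etree.ElementTree` (relative to the root, without `text()`).

```python
import xml.etree.ElementTree as ET

XML = """<WowzaStreamingEngine>
<VHost>
<Name>_defaultVHost_</Name>
<TimeRunning>64679.885</TimeRunning>
<ConnectionsTotal>0</ConnectionsTotal>
</VHost>
</WowzaStreamingEngine>"""


def parse_bool(text):
    # Assumption: these spellings count as true; anything else is false.
    return text.strip().lower() in ("true", "1", "yes")


# One converter per typed table name.
CONVERTERS = {"string": str, "int": int, "float": float, "bool": parse_bool}

# Hypothetical stand-in for the tags/int/float tables in the config.
CONFIG = {
    "tags": {"vhost_name": "./VHost[1]/Name"},
    "int": {"connections_total": "./VHost[1]/ConnectionsTotal"},
    "float": {"time_running": "./VHost[1]/TimeRunning"},
}


def build_point(xml_text, config):
    """Return (tags, fields), converting each field by the table it came from."""
    root = ET.fromstring(xml_text)
    tags = {key: root.findtext(path) for key, path in config.get("tags", {}).items()}
    fields = {}
    for type_name, convert in CONVERTERS.items():
        for key, path in config.get(type_name, {}).items():
            text = root.findtext(path)
            if text is not None:  # skip paths that match nothing
                fields[key] = convert(text)
    return tags, fields


print(build_point(XML, CONFIG))
```

A value that does not parse as its declared type raises `ValueError` in this sketch; a real parser would have to decide whether to drop the field or the whole point.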

@danielnelson danielnelson removed this from the Future Milestone milestone Jun 14, 2017
@SR-G

SR-G commented Jul 13, 2017

Very interested here in this new data format.
The configuration proposed above by @danielnelson would also suit my needs.

@danielnelson danielnelson added the feature request Requests for new plugin and for new features to existing plugins label Aug 12, 2017
@marianob85
Contributor

#2835 - my comment

@pierrick-openIT

Hi, any news about a beta plugin? It would be so helpful.

@feutl

feutl commented May 21, 2019

I would also like to see XML parsing. In my situation, I would like to parse the XML-API output of a home automation system I am using. This is the software installed: https://github.com/hobbyquaker/XML-API

This would allow me to store the data properly and analyze it using Grafana, or at least store it in InfluxDB.

@M0rdecay M0rdecay mentioned this issue Feb 1, 2020
@raintonr

+1 for this feature. FWIW, the Loxone Miniserver (a home automation device) can be hit with a URL like...

http://192.168.x.x/dev/sys/cpu

... and returns XML like...

<?xml version="1.0" encoding="utf-8"?>
<LL control="dev/sys/cpu" value="35%" Code="200"/>

It has multiple statistics that are each accessed on their own URL (see https://www.loxone.com/enen/kb/web-services/) so for this case it would be useful to specify multiple endpoints with each endpoint returning an XML snippet that, when parsed, would yield only one of the target values.

I'm guessing this could be done with something like...

  servers = [
    "http://192.168.x.x/dev/sys/cpu",
    "http://192.168.x.x/dev/lan/txp",
    "http://192.168.x.x/dev/lan/txe",
    # etc, etc.
  ]

  values = {
    "CPU": "(LL[@control='dev/sys/cpu'])/@value",
    "lanTxPackets": "(LL[@control='dev/lan/txp'])/@value",
    "lanTxErrors": "(LL[@control='dev/lan/txe'])/@value",
    # etc, etc.
  }

Does this sound reasonable?

*Apologies if the XPath above isn't quite correct, but you get the idea.
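
For what it's worth, here is a rough Python sketch of the one-value-per-endpoint idea. It works on canned response bodies instead of real HTTP requests, and it reads the attribute with `Element.get` because the standard library's `xml.etree.ElementTree` cannot select attributes through its XPath subset. The `RESPONSES` and `FIELDS` tables, the helper names, and the `dev/lan/txp` reply are hypothetical; only the `dev/sys/cpu` reply comes from the example above.

```python
import xml.etree.ElementTree as ET

# Canned response bodies keyed by endpoint path. The dev/lan/txp body is
# made up for illustration.
RESPONSES = {
    "dev/sys/cpu": b'<?xml version="1.0" encoding="utf-8"?>\n'
                   b'<LL control="dev/sys/cpu" value="35%" Code="200"/>',
    "dev/lan/txp": b'<?xml version="1.0" encoding="utf-8"?>\n'
                   b'<LL control="dev/lan/txp" value="1234" Code="200"/>',
}

# Hypothetical mapping: field name -> control path expected in the reply.
FIELDS = {"CPU": "dev/sys/cpu", "lanTxPackets": "dev/lan/txp"}


def read_value(body, control):
    """Return the value attribute if the reply is an <LL> for this control."""
    root = ET.fromstring(body)
    if root.tag != "LL" or root.get("control") != control:
        return None
    return root.get("value")


def to_number(text):
    """Drop a trailing percent sign and convert to float."""
    return float(text.rstrip("%"))


def collect(responses, fields):
    """Build one flat set of fields from several single-value replies."""
    result = {}
    for name, control in fields.items():
        value = read_value(responses[control], control)
        if value is not None:
            result[name] = to_number(value)
    return result


print(collect(RESPONSES, FIELDS))
```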

@nsalhab

nsalhab commented Mar 6, 2020

Hello,

I am interested in this thread, as I need to ETL XML input from EnOcean IoT sensors into InfluxDB.

The related metrics are dumped as telegrams into a progressively growing XML file, as shown in the following excerpt.

<?xml version="1.0" encoding="utf-8" standalone="yes"?>...

<Telegram Timestamp="2020-03-03 13:55:06.800" Direction="Incoming" Port="COM3" RORG="A5" Data="26 03 00 0F" Status="00" ID="01ABCD" dBm="-54" DestinationID="FFFFFFFF" SecurityLevel="0" SubtelegramCount="0" Tickcount="0" OptionalData="" >
  <Packet Timestamp="2020-03-03 13:55:06.800" Direction="Incoming" Port="COM3" Type="01" Data="A5 26 03 00 0F 01 A5 E5 85 00" OptionalData="00 FF FF FF FF 36 00" />
</Telegram>

What do you suggest?
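
One possible starting point, sketched in Python with the standard library only: treat each `<Telegram>` element as one point and read its attributes directly. The choice of which attributes become tags and which become fields is an assumption, as is the helper name `telegram_to_point`; the timestamp is parsed without a time zone because the excerpt does not state one.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# The <Telegram> element from the excerpt above, parsed on its own.
TELEGRAM = """<Telegram Timestamp="2020-03-03 13:55:06.800" Direction="Incoming" Port="COM3" RORG="A5" Data="26 03 00 0F" Status="00" ID="01ABCD" dBm="-54" DestinationID="FFFFFFFF" SecurityLevel="0" SubtelegramCount="0" Tickcount="0" OptionalData="" >
  <Packet Timestamp="2020-03-03 13:55:06.800" Direction="Incoming" Port="COM3" Type="01" Data="A5 26 03 00 0F 01 A5 E5 85 00" OptionalData="00 FF FF FF FF 36 00" />
</Telegram>"""


def telegram_to_point(xml_text):
    """Split one telegram into (tags, fields, timestamp)."""
    telegram = ET.fromstring(xml_text)
    # Naive datetime: the time zone of the log is unknown.
    timestamp = datetime.strptime(
        telegram.get("Timestamp"), "%Y-%m-%d %H:%M:%S.%f"
    )
    # Assumption: identifying attributes are tags, measurements are fields.
    tags = {
        "id": telegram.get("ID"),
        "rorg": telegram.get("RORG"),
        "port": telegram.get("Port"),
    }
    fields = {
        "dbm": int(telegram.get("dBm")),
        "data": telegram.get("Data"),
    }
    return tags, fields, timestamp


print(telegram_to_point(TELEGRAM))
```

Decoding the `Data` bytes into sensor readings depends on the EnOcean profile in use and is left out here.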

@sjwang90
Contributor

Take a look and test the following PRs for XML parser solutions. Please give feedback on the actual PRs themselves.
#8047
#8121

@sjwang90
Contributor

sjwang90 commented Mar 4, 2021

Closed in #8931

@sjwang90 sjwang90 closed this as completed Mar 4, 2021