CSV parser includes csv_measurement_name as a field #7203

Closed
shantanoo-desai opened this issue Mar 19, 2020 · 5 comments · Fixed by #7572
Labels: bug (unexpected problem or unintended behavior)

@shantanoo-desai
Contributor

shantanoo-desai commented Mar 19, 2020

Relevant telegraf.conf:

[agent]
  interval = "1m"
  round_interval = true
  metric_batch_size = 5000
  metric_buffer_limit = 5000000
  # collection_jitter = "0s"
  # flush_interval = "10s"
  # flush_jitter = "0s"
  precision = "us"
  debug = true
  quiet = false
  logfile = ""
  hostname = ""
  omit_hostname = true

[[outputs.influxdb]]

  urls = ["http://localhost:8086"]
  database = "pressure"
  skip_database_creation = false
  ## Timeout for HTTP messages.
  timeout = "10s"

[[inputs.file]]

  ## Files to parse each interval.
  files = ["../datasets/data/PressureSensor1.csv"]
  
  data_format = "csv"
  csv_header_row_count = 1
  csv_skip_columns = 0
  csv_measurement_column = "measurement_name"
  csv_field_columns = ["damage", "value", "warning"]
  csv_tag_columns = ["description", "datatype"]
  csv_timestamp_column = "time"
  csv_timestamp_format = "unix_us"

System info:

Telegraf Version: Telegraf 1.13.4 (git: HEAD ffabd6b5)
OS: Windows 10

Steps to reproduce:

  1. I have a csv file as the following:

      measurement_name,description,datatype,damage,value,warning,time
      Machine,Left,pressure,130.0,203.0,230.0,1581677425059610
    
  2. On Windows within the telegraf directory:

      telegraf.exe --config .\csvconfig.conf
    

Expected behavior:

In Chronograf, when the measurement Machine is clicked, the fields should be: damage, value, warning

Actual behavior:

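The measurement_name column also shows up as a field (screenshot omitted), even though the fields are explicitly listed in the csv_field_columns array (["damage", "value", "warning"]).

The configuration is based on InfluxData's blog post about writing points from CSV.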

Additional info:

I have tried setting csv_skip_columns = 1 to skip the measurement_name column, but it is still added as a field.

@danielnelson
Contributor

csv_field_columns doesn't appear to be a supported option; we should actually be producing an error when it is set. I believe we may be accepting it as a leftover bit of code from development, but the option is not hooked up and does nothing. We ought to clean up any documentation that references it. Where did you learn about this option?

I do think the plugin should exclude the csv_measurement_column and csv_timestamp_column columns from the fields. We may need to add options to preserve backwards compatibility, or perhaps this is enough like a bug to just change.

As a workaround you should be able to use fielddrop to remove these columns:

[[inputs.file]]
  ## Files to parse each interval.
  files = ["../datasets/data/PressureSensor1.csv"]
  
  data_format = "csv"
  csv_header_row_count = 1
  csv_skip_columns = 0
  csv_measurement_column = "measurement_name"
  csv_field_columns = ["damage", "value", "warning"]
  csv_tag_columns = ["description", "datatype"]
  csv_timestamp_column = "time"
  csv_timestamp_format = "unix_us"
  fielddrop = ["measurement_name"]
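
Since csv_field_columns has no effect, an allow-list of fields can instead be expressed with Telegraf's generic fieldpass metric filter, which keeps only the named fields rather than dropping named ones. The following is a minimal, untested sketch and not part of the original workaround; it assumes the same file path and column names as the configuration above, and that only damage, value, and warning should remain as fields.

```toml
[[inputs.file]]
  ## Files to parse each interval (path assumed from the report above).
  files = ["../datasets/data/PressureSensor1.csv"]

  data_format = "csv"
  csv_header_row_count = 1
  csv_measurement_column = "measurement_name"
  csv_tag_columns = ["description", "datatype"]
  csv_timestamp_column = "time"
  csv_timestamp_format = "unix_us"

  ## Keep only these fields; any other field produced by the parser is dropped.
  ## This replaces the unsupported csv_field_columns option.
  fieldpass = ["damage", "value", "warning"]
```

Compared with fielddrop, this variant does not need to be updated if further unwanted columns are added to the CSV file, but new wanted columns must be added to the list explicitly.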

@danielnelson danielnelson added the bug unexpected problem or unintended behavior label Mar 19, 2020
@shantanoo-desai
Contributor Author

shantanoo-desai commented Mar 19, 2020

@danielnelson As a quick hack I tried fielddrop, but then the measurement was named file instead of what it needs to be. I am checking it again now, though.

Also I believe I might be barking up the wrong tree. Is tail a better option for inserting historical data set in csv format?

@shantanoo-desai
Contributor Author

shantanoo-desai commented Mar 19, 2020

Okay you are right fielddrop=["measurement_name"] worked for me.

@danielnelson
Contributor

> Is tail a better option for inserting historical data set in csv format?

Yes. You will want to set from_beginning = true. tail will process the file once, while the file input will attempt to reprocess it every interval.

One thing to keep in mind for historical data is that currently the tail plugin will attempt to process as fast as it can, and for large files it will quickly outpace the output plugin, fill the metric buffer, and metrics can be dropped.
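
For illustration, the advice above could translate into a configuration along the following lines. This is an untested sketch under stated assumptions: it reuses the file path, column names, and fielddrop workaround from earlier in this thread, and it assumes the tail input accepts the same CSV parser options as the file input.

```toml
[[inputs.tail]]
  ## File to follow (path assumed from the original report).
  files = ["../datasets/data/PressureSensor1.csv"]

  ## Read the file from its start instead of only new lines.
  from_beginning = true

  data_format = "csv"
  csv_header_row_count = 1
  csv_measurement_column = "measurement_name"
  csv_tag_columns = ["description", "datatype"]
  csv_timestamp_column = "time"
  csv_timestamp_format = "unix_us"

  ## Workaround from this thread: drop the measurement column from the fields.
  fielddrop = ["measurement_name"]
```

The metric buffer mentioned above is sized by metric_buffer_limit in the [agent] section. Raising it for a large historical file may reduce dropped metrics, at the cost of higher memory use.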

@danielnelson danielnelson changed the title from "[Input][CSV] unnecessary addition of measurement name as field in Telegraf 1.13.4" to "CSV parser includes csv_measurement_name as a field" Mar 19, 2020
@danielnelson danielnelson added this to the planned milestone Mar 27, 2020
@HarshitOnGitHub
Contributor

@danielnelson Willing to pick this up next, can you assign me? :)
