Derivative on multiple fields? #6118

Closed
dmke opened this issue Mar 24, 2016 · 11 comments

Comments

@dmke

dmke commented Mar 24, 2016

Hey,

I'm storing raw NIC byte/packet counters (among other fields) in InfluxDB 0.11 and would like to generate a plain old traffic graph using a single query:

select
  non_negative_derivative(max("rx_bytes")) as "rx_bytes",
  non_negative_derivative(max("tx_bytes")) as "tx_bytes",
  non_negative_derivative(max("rx_pkt"))   as "rx_pkt",
  non_negative_derivative(max("tx_pkt"))   as "tx_pkt"
from "nodes"
where "id"='42' and time > now() - 7d
group by time(1h)

This however fails with

error parsing query: derivative cannot be used with other fields

Am I doing it wrong™? Is there a way around this limitation?

@jsternberg
Contributor

There is no way to take a derivative of multiple fields at the same time. You can only use one query for each derivative.

Is it possible for you to use multiple queries to get the graph you want?

@pauldix should we consider allowing multiple derivative calls for a single query?
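
The one-query-per-derivative workaround suggested above can be sketched as follows. This is only an illustration: `derivative_statements` and `TEMPLATE` are hypothetical names, the measurement, tag and time range are copied from the original query, and joining the statements with semicolons assumes the server accepts several statements in one request (otherwise send them one at a time).

```python
# Hypothetical helper, not part of any InfluxDB client library.
FIELDS = ["rx_bytes", "tx_bytes", "rx_pkt", "tx_pkt"]

# One statement per field, so each statement contains exactly one derivative.
TEMPLATE = (
    'SELECT non_negative_derivative(max("{field}")) AS "{field}" '
    'FROM "nodes" WHERE "id"=\'42\' AND time > now() - 7d '
    "GROUP BY time(1h)"
)

def derivative_statements(fields):
    """Return one single-derivative statement for each field name."""
    return [TEMPLATE.format(field=field) for field in fields]

statements = derivative_statements(FIELDS)

# Assumption: semicolon-separated statements may be sent in a single request.
query = "; ".join(statements)
print(query)
```

Each statement yields its own result series, so the client still has to merge the four series for plotting.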

@pauldix
Member

pauldix commented Mar 25, 2016

@jsternberg you definitely should be able to do derivatives on different fields in the same query. Is there a reason why it wouldn't work?

@dmke
Author

dmke commented Mar 25, 2016

> Is it possible for you to use multiple queries to get the graph you want?

Sure, but I figured that since influxd already iterates over the data points for a single derivative, repeating that for every derivative I'm interested in would be a waste of CPU cycles, wouldn't it?

@jsternberg
Contributor

Derivatives over aggregates may end up with different time intervals, because derivatives currently skip over nil values. Derivatives over aggregates are at least the easier case. Derivatives over raw fields may return fairly irregular timestamps when nil values are interspersed through the data, but allowing them is technically easy for us.

The validation is the part that bans it so it may be an artifact of the old query engine. We can lift that restriction if desired.

@dmke it's possible, but the TSM engine for InfluxDB is column based so I don't think iterating over multiple fields ends up being any worse if done as part of multiple queries. Even if it's the same field, we haven't done the optimization for duplicate fields being shared in aggregates as it's harder to do than you would think, so each reference to a field iterates over the data separately (only when an aggregate is used). Raw field queries are optimized for sharing references to each other, but there's not a lot of benefit to utilizing that.
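
To make the nil-skipping point concrete, here is a minimal sketch of a non-negative derivative over two fields. It is a simplified model written for this thread, not the engine's actual implementation: the function, the sample data, and the choice to stamp each output with the later point's time are all assumptions.

```python
def non_negative_derivative(points, unit=1.0):
    """points: list of (timestamp_seconds, value or None), sorted by time.

    Returns (timestamp, rate per `unit` seconds) pairs. Nil values are
    skipped, so the next delta spans a longer interval. Negative rates
    (for example after a counter reset) are dropped.
    """
    out = []
    prev = None
    for t, v in points:
        if v is None:
            continue  # nil value: no output for this timestamp
        if prev is not None:
            rate = (v - prev[1]) / ((t - prev[0]) / unit)
            if rate >= 0:
                out.append((t, rate))
        prev = (t, v)
    return out

# Two fields sampled at the same times, with nils in different places.
rx = [(0, 100), (10, 200), (20, None), (30, 500)]
tx = [(0, 50), (10, None), (20, 90), (30, 130)]

rx_rate = non_negative_derivative(rx)
tx_rate = non_negative_derivative(tx)
print([t for t, _ in rx_rate])  # timestamps for rx
print([t for t, _ in tx_rate])  # timestamps for tx
```

In this model `rx` produces points at 10 and 30 while `tx` produces points at 20 and 30, so the two derivative series only share one timestamp even though the inputs were sampled together.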

@pauldix
Member

pauldix commented Mar 25, 2016

If it's just a matter of lifting the validation ban, we should make sure there's a test that has multiple fields and derivatives against the raw values where some of the values are nil.

@dmke
Author

dmke commented Mar 25, 2016

> so I don't think iterating over multiple fields ends up being any worse if done as part of multiple queries

I see, thanks for the explanation.

@julian7

julian7 commented Apr 9, 2016

There is another very likely scenario. Read/write wait times can be calculated by non_negative_derivative(max("read_times")) / non_negative_derivative(max("reads")) (from /proc/diskstats). Similarly, user / system CPU usage can be calculated by their derivatives divided by the derivative of number of ticks. Usually these values are gathered at the same time, recorded by a single write operation.
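
As a rough client-side illustration of the ratio described above, the sketch below divides the change in a cumulative read-time counter by the change in a cumulative read counter between consecutive samples. The function name, the tuple layout, and the sample numbers are made up for the example; the elapsed time cancels out of the ratio, so it is not needed here.

```python
def average_read_wait(samples):
    """samples: list of (timestamp, reads, read_time_ms), cumulative counters.

    Returns (timestamp, average ms per read) for each interval in which
    at least one read completed and no counter went backwards.
    """
    out = []
    for (t0, r0, ms0), (t1, r1, ms1) in zip(samples, samples[1:]):
        d_reads = r1 - r0
        d_ms = ms1 - ms0
        if d_reads <= 0 or d_ms < 0:
            continue  # idle interval or counter reset: nothing to report
        out.append((t1, d_ms / d_reads))
    return out

# Both counters come from the same write, so their timestamps always match.
samples = [(0, 1000, 4000), (10, 1100, 4500), (20, 1100, 4500), (30, 1300, 5100)]
print(average_read_wait(samples))
```

Because both counters are recorded in one write, the two deltas are always taken over the same pair of timestamps, which is what makes this case attractive.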

@cnelissen

+1 Yes please. Even if there is no tangible performance benefit involved (1 query that iterates over the dataset twice vs two queries that both iterate over the dataset once) it is still better from an API standpoint. Doing two queries to get two derivatives from the same measurement is just bad.

toddboom added this to the 0.13.0 milestone Apr 13, 2016
@toddboom
Contributor

@jsternberg putting this one into v0.13.0 assuming it's an easy fix. let me know if that ends up not being the case.

jsternberg self-assigned this Apr 13, 2016
@jsternberg
Contributor

I'm going to use this ticket for allowing several standalone derivatives on different fields in one query, as in the original request, since that should be a reasonably easy change to make. The derivative math that @julian7 mentioned is a really good use case, but it's also much more complicated. Performing math on raw fields is already really complicated as a whole, so I'm going to tackle that as a separate issue to keep this ticket sane.

The main reason derivative math ends up complicated is because we don't know the time of the points output by the derivative ahead of time since it could iterate over nil points. The math iterators already get out of sync really easily. We could feasibly interpolate values between the different derivatives, but that's the point where I think the ticket becomes more complex.

@toddboom does that sound ok?
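
The synchronization problem described above can be sketched on the client side. The example divides two derivative series only at timestamps that both series contain; interpolating the missing points, the alternative mentioned above, is not shown. The function name and the sample series are hypothetical.

```python
def divide_aligned(numerator, denominator):
    """Divide two (timestamp, value) series where their timestamps match.

    Timestamps present in only one series are dropped, as are points
    whose denominator is zero.
    """
    denom = dict(denominator)
    out = []
    for t, n in numerator:
        d = denom.get(t)
        if d:  # None (missing timestamp) and 0.0 are both skipped
            out.append((t, n / d))
    return out

# Derivative outputs whose timestamps drifted apart because of nil inputs.
read_time_rate = [(10, 50.0), (30, 30.0), (40, 20.0)]
reads_rate = [(20, 10.0), (30, 10.0), (40, 0.0)]

print(divide_aligned(read_time_rate, reads_rate))
```

Only the timestamp 30 survives in this example: 10 and 20 each appear in just one series, and 40 has a zero denominator.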

jsternberg added a commit that referenced this issue Apr 14, 2016
This removes the previous restrictions that kept derivative as only
capable of being used in a single field and only at the top level.
This lets users determine how they want to use derivative more freely
and opens up the possibility of also using math between derivatives.

This may open up some problems when it comes to math between derivatives
as timestamps may not match correctly. That is likely a problem related
to any binary math to begin with though and can probably be ignored by
the derivatives. I'm also not sure it makes sense to perform any math
between a derivative and a difference or perform math between a
derivative and a mean.

Fixes #6118.
jsternberg added a commit that referenced this issue Apr 22, 2016
@hahnjo
Contributor

hahnjo commented Oct 15, 2016

Are there possibly some more restrictions within Chronograf? When looking at the requests, I keep seeing "derivative cannot be used with other fields" in the response to a request to api/v0/visualizations/1/statements/3/text.

(Interestingly, this is not shown in the UI, which only tells me "This query returned no results".)
