Time weighted average/mean #7445
Comments
I think you might be able to do something similar to this using |
Hi, Fill doesn't really help here: it interpolates a delta time series into a regular one (values at fixed time intervals, computed by interpolation), which gives incorrect readings when the changes are step changes. See the examples below. The sensor sends the following in response to environment changes at 10:23 and 4 am: If we were to use FILL and then a mean we would get: Why is this a common use case for IoT? Because having sensors send data at regular intervals wastes network bandwidth, battery life, CPU time and storage when most of the time we only want to capture the changes in what the sensor is monitoring. Interpolation is useful when the change is linear and the sensor resolution is poor (i.e. it can't capture smaller changes) or readings are intermittent, but where step changes are involved (e.g. when calculating safety exposure in an industrial situation) only a time weighted average gives the correct answer. Is this something that you would consider for a future release? The math isn't too hard but it would require a new aggregate function (I could help with a pull request but I'm no Go expert...) |
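To illustrate the point about step changes, here is a minimal Python sketch that compares a step-hold (time weighted) average with the average of a linearly interpolated signal. The readings and the function names `step_average` and `linear_average` are made up for illustration; they are not the data from the comment above and not an InfluxDB API.

```python
from typing import List, Tuple

Reading = Tuple[float, float]  # (timestamp in minutes, value); hypothetical data


def step_average(readings: List[Reading]) -> float:
    """Time weighted mean: each value is held until the next reading arrives."""
    area = 0.0
    for (t0, v0), (t1, _) in zip(readings, readings[1:]):
        area += v0 * (t1 - t0)
    return area / (readings[-1][0] - readings[0][0])


def linear_average(readings: List[Reading]) -> float:
    """Mean of the linearly interpolated signal (trapezoid rule)."""
    area = 0.0
    for (t0, v0), (t1, v1) in zip(readings, readings[1:]):
        area += (v0 + v1) / 2 * (t1 - t0)
    return area / (readings[-1][0] - readings[0][0])


# The sensor reports 0, stays silent for 9 minutes, then reports a step to 10.
readings = [(0.0, 0.0), (9.0, 10.0), (10.0, 10.0)]

print(step_average(readings))    # 1.0  (0 held for 9 min, 10 held for 1 min)
print(linear_average(readings))  # 5.5  (interpolation invents a slow ramp)
```

With a step change the interpolated average is far from the value the sensor actually observed for most of the interval, which is the discrepancy described above.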
@deandob Why did you close this issue? I've got a couple questions about how time-weighted average should work. Namely in the following case
What should the time-weighted average be? |
Hello Everyone, Not sure if the issue creator is still interested in this feature, but we find it quite important.
But the result depends on the grouping time interval, so for the interval of time from 1 to 4 result for A will already be: (2*(2-1) + 4*(3-2) + 5*(4-3))/(4-1) = 11/3 |
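The dependence on the grouping interval can be sketched in a few lines of Python. The readings for A (value 2 at time 1, 4 at time 2, 5 at time 3) are inferred from the formula above, and `windowed_twa` is a hypothetical helper. Dividing by the covered time rather than the full window length is an assumption; both give the same result when the window is fully covered.

```python
from typing import List, Tuple


def windowed_twa(readings: List[Tuple[float, float]], start: float, end: float) -> float:
    """Step-hold time weighted mean of (time, value) readings over [start, end]."""
    area = 0.0
    covered = 0.0
    for i, (t, value) in enumerate(readings):
        # A value stays valid until the next reading, or until the window ends.
        next_t = readings[i + 1][0] if i + 1 < len(readings) else end
        seg_start = max(t, start)
        seg_end = min(next_t, end)
        if seg_end > seg_start:
            area += value * (seg_end - seg_start)
            covered += seg_end - seg_start
    if covered == 0:
        raise ValueError("no reading covers the requested window")
    return area / covered


series_a = [(1, 2), (2, 4), (3, 5)]  # readings inferred from the formula above

print(windowed_twa(series_a, 1, 4))  # (2*1 + 4*1 + 5*1) / 3 = 11/3, about 3.67
print(windowed_twa(series_a, 1, 3))  # (2*1 + 4*1) / 2 = 3.0
```

The same series yields a different mean for each window because the last value is held for however long the window extends past it.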
There are two aspects that add more complexity to this:
An example to make this clear: The maximum valid time is 4 minutes and we want to get the weighted mean value for interval 10:20am to 10:30am. => (5*1 + 3*2 + 10*1 + 8*4 + 12*1) / (1 + 2 + 1 + 4 + 1) = 65 / 9 = 7.222... |
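A sketch of how a maximum valid time could enter the calculation, in Python. The data table from this comment is not available here, so the timestamps below (minutes past 10:00) are reconstructed from the weights in the formula and should be read as an assumption; `twa_with_max_valid` and its `max_valid` parameter are hypothetical names.

```python
from typing import List, Optional, Tuple


def twa_with_max_valid(
    readings: List[Tuple[float, float]],
    start: float,
    end: float,
    max_valid: Optional[float] = None,
) -> float:
    """Step-hold time weighted mean over [start, end].

    If max_valid is given, a value stops counting max_valid time units after
    it was reported, and the uncovered gap is left out of the weights.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for i, (t, value) in enumerate(readings):
        next_t = readings[i + 1][0] if i + 1 < len(readings) else end
        valid_until = next_t if max_valid is None else min(next_t, t + max_valid)
        seg_start = max(t, start)
        seg_end = min(valid_until, end)
        if seg_end > seg_start:
            weighted_sum += value * (seg_end - seg_start)
            total_weight += seg_end - seg_start
    if total_weight == 0:
        raise ValueError("no valid reading inside the requested window")
    return weighted_sum / total_weight


# Assumed readings: (minutes past 10:00, value), window 10:20 to 10:30.
readings = [(20, 5), (21, 3), (23, 10), (24, 8), (29, 12)]

print(twa_with_max_valid(readings, 20, 30, max_valid=4))  # 65/9, about 7.22
print(twa_with_max_valid(readings, 20, 30))               # 73/10 = 7.3
```

With the 4 minute cap, the value 8 reported at 10:24 counts for 4 minutes instead of 5, so both the numerator and the denominator shrink compared with the uncapped calculation.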
Hi, |
Apologies for closing this issue prematurely, it was accidental. |
@ruediger-stevens' comment about the duration of the grouping interval is valid.
|
Hello,
I also need this in a few cases, so I clearly understand the need, but this adds a lot of business logic to a simple function, which we cannot spare today for www.iot. I suggest a first implementation with time_weighted_average(< field >), and later on we can think about time_weighted_average(< field >[, <max_valid_time> ]) with no breaking changes. The default could just be 0, meaning infinite, like the limit() implementation. I match max_valid_time with "Having a maximum valid time"; as for limiting the duration to the aggregation interval, I do not completely understand it. Could you give a sample test case in @gpomykala's proposal? Is this a parameter for the function or standard behavior? I'll do my best to follow the implementation and speed this up. I don't want to come between you and a feature, I just want to make sure something gets out ;) We could discuss workarounds to handle sensor quality (also part of the OPC-UA implementation). I'm not yet sure it should be built into Influx. I would love to exchange ideas on a standard IoT cloud implementation scenario for handling such things! |
Hi @ruediger-stevens, max_valid_time could be an optional parameter for an average in a no-grouping scenario, similar to moving_average. It could be added at some point in the future, as long as it is an optional parameter. Nevertheless, it goes well beyond the definition of a time weighted average and seems more like an application-specific requirement than a general-purpose function. |
For @ruediger-stevens' example I would also suggest going for the algorithm without max_valid_time first. I also agree with @gpomykala that this is an application-specific feature. If you are not storing deterministic values but value changes, it is normal that you sometimes have larger gaps. An example to make this clear: We want to get the weighted mean value (time average aggregation) for the interval 10:20am to 10:30am. => (5 x 1 + 3 x 2 + 10 x 1 + 8 x 5 + 12 x 1) / (1 + 2 + 1 + 5 + 1) = 73 / 10 = 7.3 This is the correct value, and the case I would focus on first. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
This issue has been automatically closed because it has not had recent activity. Please reopen if this issue is still important to you. Thank you for your contributions. |
There is a pending PR for this issue, so why has it been closed? |
This is an important feature. |
Yeah, I agree it would be great to add more value to older/newer values on a timeseries |
This is a critical and fundamental capability of a time-series database. If anybody in the process industries is going to use InfluxDB this feature must be supported. |
We are also looking for this feature, to make sure we get consistent results across the systems of our IoT solution. The current functionality is too dependent on continuous sampling. When will the feature be available? |
I agree with previous commenters that this is a crucial issue for IoT solutions. I've seen that Azure TimeSeries Insights does support a time weighted average. |
I agree with all requesters of the time weighted average, and it's sad that this commit never made it to master in any way: gpomykala@1b11912 I realized that Influx cannot compute correct time based averages of time series data with Especially in IoT environments, but actually in every environment where data is not pulled but pushed by a change trigger on the sender's side, data arrives at irregular intervals, and only a time based average gives the real value. This feature is really important for data points that are sent irregularly. |
Proposal:
Add an aggregate function that computes a time weighted mean, similar to the interpolation function in OpenTSDB: http://opentsdb.net/docs/build/html/user_guide/query/aggregators.html
I am new to InfluxDB and may not have understood how this can be done, but after spending an hour researching it, it looks like this is not natively possible in InfluxDB. Without user defined functions, this type of calculation has to be done in the host application, negating some of the benefits and performance of InfluxDB.
Use Case:
For IoT there are a lot of sensor scenarios where data isn't sent on a regular time basis, mostly where the sensors are reacting to changes (which saves battery and network bandwidth compared to sending the same unchanging data value at regular intervals). For example, a temperature sensor won't send any data while a freezer door is closed (which is most of the time), because the temperature doesn't change, but it will send many data points when the door is opened for a short while (getting warmer) and then closed (getting colder again). A conventional mean will not represent the average temperature in the freezer, because most of the sensor data ingested comes from the short time the door was open; each reading needs to be weighted by the time interval between sensor readings.
Apologies if there is already a feature request for this but I couldn't find anything similar in the GH issues log. I like the philosophy behind influxdb and the active development - so hoping this is a feature that could be added. Also surprised that this isn't a more requested feature as it should be a common use case. I don't want to use Kapacitor as my servers are Windows not Linux.
Thanks for listening.