
Time weighted average/mean #7445

Closed
deandob opened this issue Oct 11, 2016 · 20 comments

Comments

@deandob

deandob commented Oct 11, 2016

Proposal:
Provide an aggregate function that computes a time-weighted mean, similar to the interpolation function in OpenTSDB: http://opentsdb.net/docs/build/html/user_guide/query/aggregators.html

I am new to InfluxDB and may have missed something, but after an hour of research it looks like this is not natively possible in InfluxDB. Without the possibility of user-defined functions, this type of calculation has to be done in the host application, negating some of the benefits and performance of InfluxDB.

Use Case:
For IoT there are a lot of sensor scenarios where data isn't sent at regular intervals, mostly because the sensors react to changes (which saves battery and network bandwidth compared to repeatedly sending the same unchanging value). For example, temperature measurements won't change while a freezer door is closed (which is most of the time), so no data is sent, but the sensor will send many data points during the short while the door is open (getting warmer) and just after it is closed (getting colder again). A conventional mean will not represent the average temperature in the freezer, because most of the ingested sensor data comes from the short time the door was open; readings therefore need to be weighted by the time interval between them.

Apologies if there is already a feature request for this but I couldn't find anything similar in the GH issues log. I like the philosophy behind influxdb and the active development - so hoping this is a feature that could be added. Also surprised that this isn't a more requested feature as it should be a common use case. I don't want to use Kapacitor as my servers are Windows not Linux.

Thanks for listening.

@jsternberg
Contributor

I think you might be able to do something similar to this using FILL(). Unfortunately you can't use fill unless you use an aggregate, but you might be able to use moving_average along with mean and an aggregate interval to get something close to what you want. We haven't added a weighted moving average yet, though, which seems like it would be better for your purposes.

@deandob
Author

deandob commented Oct 12, 2016

Hi,

Fill doesn't really help, as it interpolates to produce a regular time series (values at fixed time intervals based on interpolated values) from an event-based series, which gives incorrect readings when the changes are step changes. See the examples below.

A sensor sends the following in response to environment changes at 10:23am and 10:24am:
10:20am value = 3
10:23am value = 10
10:24am value = 8
If we were to get a time weighted average between 10:20am and 10:25am the calculation is:
(3 x 3min + 10 x 1min + 8 x 1min) / 5min = 5.4

If we were to use FILL then a mean we would get:
10:20am value = 3
10:21am value = 5.33 (interpolated)
10:22am value = 7.66 (interpolated)
10:23am value = 10
10:24am value = 8
MEAN would give you the value of (3 + 5.33 + 7.66 + 10 + 8) / 5 = 6.8. Quite different.
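The step-hold calculation above can be sketched in Python (illustrative only, not InfluxDB code; timestamps are expressed as minutes since 10:00 for simplicity):

```python
def time_weighted_mean(points, start, end):
    """Step-interpolated time-weighted mean over [start, end].

    points: (timestamp, value) pairs sorted by time; each value is held
    until the next reading.  Assumes the first reading is at or before
    `start`, so the whole window is covered.
    """
    total = 0.0
    for i, (t, v) in enumerate(points):
        t0 = max(t, start)                                 # clip to window start
        t1 = points[i + 1][0] if i + 1 < len(points) else end
        t1 = min(t1, end)                                  # clip to window end
        if t1 > t0:
            total += v * (t1 - t0)
    return total / (end - start)

# Readings at 10:20, 10:23, 10:24 (minutes since 10:00)
readings = [(20, 3), (23, 10), (24, 8)]
print(time_weighted_mean(readings, 20, 25))  # -> 5.4
```

Each value is weighted by how long it was held, so the result matches the hand calculation (27 value-minutes over a 5-minute window), not the 6.8 obtained from the linearly filled series.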

Why is this a common use case for IoT? Because having sensors send data at regular intervals wastes network bandwidth, battery life, CPU time, and storage when most of the time we only want to capture the changes the sensor is monitoring. Interpolation is useful when change is linear and the sensor's resolution is poor (i.e. it can't capture smaller changes) or intermittent. But where step changes are involved (e.g. common when calculating safety exposure in an industrial setting), only a time-weighted average gives the correct answer.

Is this something you would consider for a future release? The math isn't too hard, but it would require a new aggregate function to be added (I could help with a pull request, but I'm no Go expert...).

@deandob deandob closed this as completed Oct 12, 2016
@desa
Contributor

desa commented Dec 9, 2016

@deandob Why did you close this issue? I've got a couple of questions about how a time-weighted average should work. Namely, in the following case:

tag  time  value
A    1     2
B    2     3
A    2     4
A    3     5

What should the time-weighted average be?

@dreamon-dd

Hello Everyone,

Not sure if the issue creator is still interested in this feature, but we find it quite important.
@desa As for how it should work, I think @deandob explained it in quite some detail: the mean should be calculated according to how long each value was held. Another question is whether the result should be grouped by tags. In my schema I would have a separate field per sensor and no tags at all, so there the calculation is clear, but I would assume that if different values exist for different tags, the result should probably be grouped by the tag values. Therefore, in the aforementioned case, the result for grouping over the interval from time 1 to time 3 would be:

tag  timeWeightedMean(value)
A    (2*(2-1) + 4*(3-2))/(3-1) = 3
B    3

But the result depends on the grouping time interval: for the interval from time 1 to time 4, the result for A would instead be (2*(2-1) + 4*(3-2) + 5*(4-3))/(4-1) = 11/3.
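The grouped calculation can be sketched in Python (illustrative only, not InfluxDB code). One open design question is what to do with B, whose first reading falls inside the window; here the sum is divided by the covered duration rather than the full window length, which reproduces the table above:

```python
from collections import defaultdict

def time_weighted_mean(points, start, end):
    """Step-interpolated time-weighted mean, divided by the covered
    duration, so a tag whose first reading is inside the window (like B
    at time 2) is averaged only over the time it actually has a value."""
    total = covered = 0.0
    for i, (t, v) in enumerate(points):
        t0 = max(t, start)
        t1 = min(points[i + 1][0] if i + 1 < len(points) else end, end)
        if t1 > t0:
            total += v * (t1 - t0)
            covered += t1 - t0
    return total / covered

rows = [("A", 1, 2), ("B", 2, 3), ("A", 2, 4), ("A", 3, 5)]
by_tag = defaultdict(list)
for tag, t, v in rows:
    by_tag[tag].append((t, v))

for tag in sorted(by_tag):
    print(tag, time_weighted_mean(sorted(by_tag[tag]), 1, 3))
# A 3.0
# B 3.0
```

Dividing by the full window length instead would give B a mean of 1.5 over [1, 3], so this choice is a genuine semantic decision, not a detail.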

@rs-blade

rs-blade commented May 11, 2017

There are two aspects that add more complexity to this:

  1. Limiting the duration of a value to the aggregation interval
  2. Having a maximum valid time for values (values older than that are considered as unreliable; maybe there was a loss in communication to the device).

An example to make this clear:
10:18am 5
10:21am 3
10:23am 10
10:24am 8
10:29am 12
10:32am 14

The maximum valid time is 4 minutes and we want to get the weighted mean value for interval 10:20am to 10:30am.
The result would be:
10:20am-10:21am 5 => 5 * 1min !!! This is where the interval boundary limitation kicks in !!!
10:21am-10:23am 3 => 3 * 2min
10:23am-10:24am 10 => 10 * 1min
10:24am-10:28am 8 => 8 * 4min !!! This is where the maximum validation time kicks in !!!!
10:29am-10:30am 12 => 12 * 1min !!! This is where the interval boundary limitation kicks in !!!

=> (5*1 + 3*2 + 10*1 + 8*4 + 12*1) / (1 + 2 + 1 + 4 + 1) = 65 / 9 = 7.222...
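Both clipping rules described above (limiting each value to the aggregation interval, plus a maximum validity time) can be sketched in Python. This is illustrative only; max_valid is a hypothetical parameter name, not an existing InfluxDB option, and timestamps are minutes since 10:00:

```python
def time_weighted_mean(points, start, end, max_valid=None):
    """Step-interpolated time-weighted mean with window clipping and an
    optional validity limit.

    Each value holds until the next reading, the window end, or (if
    max_valid is given) max_valid time units after it was reported,
    whichever comes first.  Dividing by the covered duration excludes
    any gap beyond the validity limit from the average.
    """
    total = covered = 0.0
    for i, (t, v) in enumerate(points):
        t0 = max(t, start)                                 # interval boundary
        t1 = points[i + 1][0] if i + 1 < len(points) else end
        if max_valid is not None:
            t1 = min(t1, t + max_valid)                    # maximum valid time
        t1 = min(t1, end)                                  # interval boundary
        if t1 > t0:
            total += v * (t1 - t0)
            covered += t1 - t0
    return total / covered

readings = [(18, 5), (21, 3), (23, 10), (24, 8), (29, 12), (32, 14)]
print(time_weighted_mean(readings, 20, 30, max_valid=4))  # 65/9, about 7.222
```

Calling the same function without max_valid keeps the 8 alive for the full 10:24am-10:29am gap and yields 73/10 = 7.3, which shows how much the validity limit changes the result on this data.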

@arastoo78

Hi,
the time-average aggregation is also very important for us. Does anyone know whether this feature has been implemented since the issue was opened? Or can it somehow be achieved with the existing API? We have a simple implementation of the time-average algorithm following the calculation above and could contribute it if desired.

@deandob
Author

deandob commented Jul 1, 2018

Apologies for closing this issue prematurely, it was accidental.
I'm still looking for this feature in InfluxDB, and I'm about to launch a new product that could potentially use the enterprise version of InfluxDB if this feature were available. At this point we are looking at either implementing time weighting outside InfluxDB (which reduces its benefits for us) or at alternative products. As mentioned earlier, this would be a common use case for InfluxDB in IoT scenarios.

@gpomykala

@ruediger-stevens' comment about the duration of the grouping interval is valid.
There are 2 main issues:

  • how to pass the interval start/end time to the reducer (10:20 - 10:30)
  • how to determine the value of the point at 10:18, given that the iterator will not contain it because of the grouping interval

@nyhu

nyhu commented Dec 3, 2018

Hello,
I obviously need this too, but I "disagree" with @ruediger-stevens on these two points:

  • Limiting the duration of a value to the aggregation interval
  • Having a maximum valid time for values (values older than that are considered as unreliable; maybe there was a loss in communication to the device).

I also need this in a few cases, so I clearly understand the need, but it adds a lot of business logic to a simple function, which we cannot afford today for IoT.

I suggest a first implementation with time_weighted_average(<field>), and later we can think about time_weighted_average(<field>[, <max_valid_time>]) with no breaking changes. The default could be 0, meaning infinite, like the limit() implementation.
max_valid_time raises a lot of unanswered questions, so I suggest postponing it (in the same PR or the next).

I match max_valid_time with "having a maximum valid time". As for limiting the duration to the aggregation interval, I don't completely understand it; could you give a sample in @gpomykala's proposal as a test case? Is this a parameter for the function or standard behavior?

I'll do my best to follow the implementation and speed this up. I don't want to stand between you and a feature, just to make sure something gets out ;)

We could also discuss workarounds for handling sensor quality (also part of the OPC-UA implementation). I'm not yet sure it should be built into Influx. I would love to exchange ideas on a standard cloud implementation for handling such things in IoT scenarios!

@gpomykala

Hi @ruediger-stevens, max_valid_time could be an optional parameter for an average in a no-grouping scenario, similar to moving_average. It could be added at some point in the future as long as it remains optional. Nevertheless, it goes well beyond the definition of a time-weighted average and seems more like an application-specific requirement than a general-purpose function.

@arastoo78

arastoo78 commented Dec 5, 2018

For @ruediger-stevens' example, I would also suggest going for the algorithm without max_valid_time first. I also agree with @gpomykala that this is an application-specific feature. If you store value changes rather than samples at a fixed rate, it is normal to sometimes have larger gaps.

An example to make this clear:
10:18am 5
10:21am 3
10:23am 10
10:24am 8
10:29am 12
10:32am 14

We want to get the weighted mean value (time average aggregation) for interval 10:20am to 10:30am.
The result would be:
10:20am-10:21am 5 => 5 * 1min !!! This is where the interval boundary limitation kicks in !!!
10:21am-10:23am 3 => 3 * 2min
10:23am-10:24am 10 => 10 * 1min
10:24am-10:29am 8 => 8 * 5min
10:29am-10:30am 12 => 12 * 1min !!! This is where the interval boundary limitation kicks in !!!

=> (5 x 1 + 3 x 2 + 10 x 1 + 8 x 5 + 12 x 1) / (1 + 2 + 1 + 5 + 1) = 73 / 10 = 7.3

This would be the correct value I would focus on first.

@dgnorton dgnorton added the 1.x label Jan 7, 2019
@stale

stale bot commented Jul 23, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Jul 23, 2019
@stale

stale bot commented Jul 31, 2019

This issue has been automatically closed because it has not had recent activity. Please reopen if this issue is still important to you. Thank you for your contributions.

@stale stale bot closed this as completed Jul 31, 2019
@gpomykala

There is a pending PR for this issue; why has it been closed?

@ashishkaransingh

This is an important feature.

@iwita

iwita commented Apr 7, 2020

Yeah, I agree it would be great to be able to give more weight to older/newer values in a time series.

@inselbuch

This is a critical and fundamental capability of a time-series database. If anybody in the process industries is going to use InfluxDB, this feature must be supported.

@Sdelausnay

We are also looking for this feature to make sure we get consistent results across the systems of our IoT solution. The current functionality is too dependent on continuous sampling. When will the feature be available?

@fgheysels

I agree with previous commenters that this is a crucial feature for IoT solutions. I've seen that Azure Time Series Insights does support a time-weighted average.
In TSI, when using the twavg function, you also need to specify an interpolation strategy. This strategy defines how "irregular" intervals are handled: the interpolation kind can be either step or linear, and it determines which values are used for the points in time for which there is no data. After the interpolation has been applied, the TWAVG is calculated over the interpolated set of data.
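The step-versus-linear distinction can be illustrated on the readings from the earlier freezer example (illustrative Python; twavg_step and twavg_linear are hypothetical names, not TSI or InfluxDB APIs, and the points are assumed to span the whole window):

```python
def twavg_step(points):
    """Time-weighted mean with step interpolation: each value is held
    until the next reading (zero-order hold)."""
    total = 0.0
    for (t0, v0), (t1, _) in zip(points, points[1:]):
        total += v0 * (t1 - t0)
    return total / (points[-1][0] - points[0][0])

def twavg_linear(points):
    """Time-weighted mean with linear interpolation: the value ramps
    between consecutive readings (trapezoidal rule)."""
    area = 0.0
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        area += (v0 + v1) / 2 * (t1 - t0)
    return area / (points[-1][0] - points[0][0])

# Readings at 10:20, 10:23, 10:24 (minutes since 10:00)
readings = [(20, 3), (23, 10), (24, 8)]
print(twavg_step(readings))    # (3*3 + 10*1) / 4 = 4.75
print(twavg_linear(readings))  # (6.5*3 + 9*1) / 4 = 7.125
```

For step-changing sensors such as the freezer example, the step kind matches physical reality; the linear kind over-weights the ramp toward each new reading.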

@christianTF

christianTF commented Oct 21, 2021

I agree with all requesters of the time-weighted average, and it's sad that this commit never made it to master in any form: gpomykala@1b11912

I realized that Influx cannot compute correct time-based averages of time-series data with GROUP BY time(2m) fill(previous).
In my opinion, fill(previous) should do exactly what its name says: fill in missing datapoints with the previous value. That would give exactly a time-weighted average.
But fill(previous) only fills whole group intervals that contain no datapoints; it does not fill the gaps between existing datapoints, and GROUP BY itself only calculates the arithmetic mean of the existing datapoints in each group.

Especially in IoT environments, but really in every environment where data is not pulled but pushed by a change trigger on the sender's side, data arrives on an irregular basis, and only a time-based average yields the real value.

This feature is really important for non-regularly sent data points.
