[Lens] Add derivative function #61775
Comments
Pinging @elastic/kibana-app (Team:KibanaApp) |
The definition of derivative for our purposes is a function which subtracts sequential values in a date histogram to calculate the instant diff between sequential values. Derivatives in this context are discrete, as in non-continuous, and may have gaps. Because the date histogram has a duration in time, the derivative function supports scaling values to a specific time interval, such as "derivative per second". Derivative values can be positive or negative.
User inputs
The derivative function requires a group by columns parameter, but this can be automatically set by Lens. This makes the derivative function only have optional inputs. Optional inputs are:
This leads to a function signature like:
interface DerivativeArgs {
// Table is used to determine the time units
table: KibanaDatatable;
// Required list of group by columns. Don't group the time field
groupByIds: string[];
scaleTo?: 'ms' | 's' | 'h' | 'd' | 'M';
gapPolicy?: 'skip' | 'insert_zeros';
}
type DerivativeFunction = (input: DerivativeArgs) => KibanaDatatable;
Form design
This form is missing a way to set a "gap policy", but is otherwise close:
Table example with gap skipping (default)
Table example with zeroes
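To make the two gap policies concrete, here is a minimal sketch of the calculation on a single series. It is an illustration only: `discreteDerivative` and the `number | null` series shape are hypothetical stand-ins (not the Lens or `KibanaDatatable` API), the input values are invented, and the `skip` behavior assumes the same meaning as the Elasticsearch derivative aggregation, where an empty bucket is ignored and the next value is compared with the last real value.

```typescript
// Hypothetical, simplified stand-in for one series of a date histogram:
// one value per bucket, with null marking a bucket that has no data.
type GapPolicy = 'skip' | 'insert_zeros';

function discreteDerivative(
  values: Array<number | null>,
  gapPolicy: GapPolicy = 'skip'
): Array<number | null> {
  const result: Array<number | null> = [];
  // Last value we can subtract from; undefined until the first bucket is seen.
  let previous: number | undefined;

  for (const raw of values) {
    // insert_zeros: treat an empty bucket as if it contained 0.
    const current = raw === null && gapPolicy === 'insert_zeros' ? 0 : raw;

    if (current === null) {
      // skip: emit no derivative for the empty bucket and keep `previous`,
      // so the next real value is compared with the last real value.
      result.push(null);
      continue;
    }

    // The first bucket has nothing to subtract from, so it has no derivative.
    result.push(previous === undefined ? null : current - previous);
    previous = current;
  }
  return result;
}

const series: Array<number | null> = [10, 14, null, 16, 19]; // invented values
console.log(discreteDerivative(series, 'skip'));         // [null, 4, null, 2, 3]
console.log(discreteDerivative(series, 'insert_zeros')); // [null, 4, -14, 16, 3]
```

With `insert_zeros`, the empty bucket produces a large negative step followed by a large positive one, while `skip` leaves a hole and continues from the last known value.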
As you can see in this example, the value goes negative if there is missing data. I find this behavior a little annoying, so I consider it up for debate whether it should go negative or return to 0 for missing data.
Example visualizations
Derivatives can only be used in XY charts and data tables. They can't be rendered in pie charts because the values can go negative. The simplest way to render a derivative is as a line chart. Derivatives can be calculated on a single line or as many lines indicating categories:
But because we are trying to do more than the bare minimum in Lens, we should also consider the most frequent requests that users have. For example, a common request is to have "red and green" colors to indicate derivatives, with a black color to indicate the underlying values. Here's an example I did in TSVB which required a lot of manual setup. Can Lens make this easy?
Going even beyond this, @monfera has worked on examples of derivatives where the derivative is shown as a cumulative derivative, also known as a waterfall chart. This chart type also uses red and green coloring, and shows negative values in the context of the overall trendline. Another feature of waterfall charts is that we can apply them as annotations on top of bar charts.
Implementation notes
The derivative function should be implemented as part of the standard library of expression functions, instead of using the aggregation features of Elasticsearch. This gives us the ability to compose more functions on top of the derivative. For example, the "time scaling" feature might actually be implemented as a separate expression function, making derivative a combination of two expression functions.
I don't consider the red/green styling or waterfall charts to be requirements for shipping a derivative feature. When we choose to implement this feature, it should be done as a chart styling option that might be applied automatically for derivatives, but that can also be applied to any data that goes positive and negative.
Steps to implement:
Stretch goals are:
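Returning to the implementation notes: the idea of building "derivative per second" from two separate functions could be sketched roughly as below. This is only an illustration under stated assumptions. `differences`, `scaleTime`, `pipe`, and the `Series` type are hypothetical names, not Kibana expression functions, and the month unit `'M'` from the proposed signature is left out because it has no fixed length.

```typescript
// Fixed-length units only. The proposed signature also lists 'M' (month),
// which has no fixed duration and is deliberately omitted from this sketch.
type FixedUnit = 'ms' | 's' | 'h' | 'd';

const UNIT_IN_MS: Record<FixedUnit, number> = {
  ms: 1,
  s: 1000,
  h: 60 * 60 * 1000,
  d: 24 * 60 * 60 * 1000,
};

// Hypothetical series shape: one value per bucket, null for "no value".
type Series = Array<number | null>;
type SeriesFn = (values: Series) => Series;

// Step 1: plain differences between neighbouring buckets (no gap policy here).
const differences: SeriesFn = (values) =>
  values.map((value, i) => {
    const prev = i > 0 ? values[i - 1] ?? null : null;
    return value === null || prev === null ? null : value - prev;
  });

// Step 2: rescale "change per bucket" to "change per target unit".
const scaleTime = (bucketMs: number, scaleTo: FixedUnit): SeriesFn => (values) =>
  values.map((v) => (v === null ? null : (v * UNIT_IN_MS[scaleTo]) / bucketMs));

// Compose left to right, similar in spirit to chaining expression functions.
const pipe = (...fns: SeriesFn[]): SeriesFn => (values) =>
  fns.reduce((acc, fn) => fn(acc), values);

// One-minute buckets (60000 ms), scaled to "per second".
const derivativePerSecond = pipe(differences, scaleTime(60000, 's'));
console.log(derivativePerSecond([600, 720, 660])); // [null, 2, -1]
```

Keeping the scaling step separate means it could also be applied to other per-bucket metrics, which is the composability argument made in the notes.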
|
@AlonaNadler Does the "gap policy" phrasing make sense as shown here? Do you have a better way to describe the use case for zeroing out the charts? We definitely need to implement this in code, but I noticed that we don't support this in TSVB or Visualize. Also, I've listed some stretch goals of supporting red/green styling and waterfall charts, as shown above. Do you agree, or do you think these are required for Lens by default? |
Can we address the gap policy as part of the fitting function? Solving the gap policy is not a high priority in my opinion. |
@AlonaNadler I agree that gap policy is not a high priority. If we decide to do it, it will be completely separate from the fitting function for technical reasons. |
The function signature I proposed earlier is not complete for several reasons, and this is an attempt to update the proposed signature.
Generating a new column requires us to have a new column ID, human-readable name, and formatHint. In total, I think this is the new interface for the derivative expression function:
My confidence level in this signature is higher than before because I wrote an actual expression function with these arguments, but it could change again if we run into consistency issues with the other time series functions. |
This looks almost good to me, thanks for taking these things into account. While thinking a little about it I came up with some light additional touches (but I suspect we will continue iterating on this while actually implementing):
(basically making output column and groupBy configuration optional) The behavior would be as follows:
|
A naming suggestion, as names are important for UX and even DX. Would it be possible to change the working term "derivative" to "differences" in the UI? I may be overlooking a good reason for calling it derivative. We don't have continuous functions, our binned time series aren't differentiable, and even if we disregard the lack of continuity it's not some kind of tangent at the point, and not even a ratio of
"Derivative" has some looser meaning too (=stuff you compute from other things, derived information) but it's not an ideal fit either, eg. it's too specific for that. |
I guess we inherited this from Elasticsearch (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline-derivative-aggregation.html), so one point in favor of "derivative" is familiarity for users used to the stack. But we are not following Elasticsearch terminology in a lot of places so it's totally valid to rethink this. cc @gchaps maybe you have another idea. Naming is something we can iterate on separately from the functionality (as long as it's happening between releases of course) |
Calculation of differences: Subtracting the value of the previous bin from the value of the current bin leads to accumulation of minuscule errors, which may or may not matter(*). This could be decided upfront, though the implementation and runtime cost is the same. If it matters, a robust way for computing deltas is to run through the series bin by bin, and compute the difference between (A) the cumulative sum of the differences computed already, and (B) the current value. The new difference will be the subtraction of the cumulated sum (
(*) It might matter when differences are eg. reintegrated downstream over an interval; with the robust method you know what
It may also matter when eg. an ES payload is sent, for compression, in a delta-encoded way; eg. hourly temperature values won't wildly differ from each other, so it's more efficient to send serial deltas down the network for somewhat continuous phenomena, or when there are long stretches of unchanged values (delta=0, compresses well with RLE).
Btw. an alternative to the name "differences" would be "deltas" (or singular forms, or variation eg. series delta) |
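The robust method described above, where each new delta is the current value minus the cumulative sum of the deltas already emitted, might look like the sketch below. It is one reading of the comment rather than a confirmed implementation: the function names are hypothetical, and it assumes the running sum starts at 0, so the first delta equals the first value and the series can be rebuilt by a plain cumulative sum.

```typescript
// Naive deltas: each value minus its predecessor (first delta = first value).
function naiveDeltas(values: number[]): number[] {
  return values.map((v, i) => (i === 0 ? v : v - (values[i - 1] ?? 0)));
}

// Robust deltas: each value minus the running sum of the deltas emitted so far.
function robustDeltas(values: number[]): number[] {
  const deltas: number[] = [];
  let emittedSum = 0; // (A) cumulative sum of the differences computed already
  for (const value of values) {
    const delta = value - emittedSum; // (B) current value minus (A)
    deltas.push(delta);
    // Any rounding error made here is visible to, and corrected by, the next delta.
    emittedSum += delta;
  }
  return deltas;
}

// Re-integrate deltas downstream with a cumulative sum.
function cumulativeSum(deltas: number[]): number[] {
  let sum = 0;
  return deltas.map((d) => (sum += d));
}

console.log(naiveDeltas([3, 5, 4]));  // [3, 2, -1]
console.log(robustDeltas([3, 5, 4])); // [3, 2, -1]
console.log(cumulativeSum(robustDeltas([3, 5, 4]))); // [3, 5, 4]
```

On integer data both methods agree. The difference only shows up with floating-point values, where the robust variant keeps the re-integrated series from drifting because each delta is computed against what the consumer will actually have summed so far.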
Thanks @flash1293 - some earlier discussions eg. with Raya and Vijay touched on the tradeoffs of using the industry standard terminology, or using the term as used by Elasticsearch, if they differ. Not sure whether the product design principles for Lens made it fall one way or the other, or whether it was somewhat accidental. For example, there may be some decision document that voted in favor of "bucket" instead of the more standard term "bin". Again, who knows, there may be a good reason for calling it differentiation, besides momentum or accident |
I lean toward using "difference" or "delta" because they are easier to understand at a glance. |
As we use the quite precise "cumulative sum" and not "integration" elsewhere, consistency is another support for using differences, deltas or running deltas or some such here |
Closed by #84384 |
Add a derivative pipeline aggregation to Lens. See #56696 for more discussion.
Tasks: