-
Notifications
You must be signed in to change notification settings - Fork 489
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Compiled stateful expression #491
Compiled stateful expression #491
Conversation
No, the return type of a function is constant. In fact you could add a method to the Func interface so this is explicit and you can add functions to the constant case. https://github.com/influxdata/kapacitor/blob/master/tick/functions.go#L14
There are no methods or functions that use them yet to my knowledge but it is planned see #169
Can you point to a line where node.eval is being called? Not sure what you are talking about here.
Seems like there are two kinds of errors:
For case number one the error should cause the task to fail and stop executing.
That's fine we can do it later or not at all if its too much.
Yes, go ahead an fix it. Just add a note to this PR that it is fixed. Haven't looked at the code much yet but overall the process and explanations are good. Thanks! |
By the way, do you want me to squash commits or something? there is huge number of commits whom are poorly written and doesn't describe the change very well. |
You do :) I checked current implementations and non have variable return types. Since we are specializing the evaluation now we can make this part of the contract. There are three raw data types we deal with now
Yes
That is the recursive stack based eval function. Since not all expressions are guaranteed to be a BinaryNode that is the entrypoint to evaluation. If the expression is an NumberNode for example it will get pushed on the stack and then popped as the final result of the expression:
yes, its fine if the compile returns an error for the first case. Can you give an example of how it can fix a type issue at runtime? Seems like there are cases when it is not possible to fix?
In that case let's focus on the PR at hand and get a fix in for this later. It will be beneficial to have a simple PR that fixes a small issue in the new code base so others can begin to see how it works. |
Just to make sure - This PR is finished and waiting for CR feedback |
I have your branch checkout locally and I am looking through it.... |
Here is the main entry point: https://github.com/influxdata/kapacitor/blob/master/tick/stateful_expr.go#L36 and then its called recursively for example here https://github.com/influxdata/kapacitor/blob/master/tick/stateful_expr.go#L107 |
Is there any case where this new code is used? I didn't see any... That is beyond the test and benchmarks? |
Here is a brain dump of some first thoughts after digging through the code. These are just observations or questions not suggestion for change yet. I haven't dug in deep enough to understand it all so I don't want to suggest any changes yet, but I do want to start a discussion.
Well there a brain dump of some of my initial thoughts. Again no need to go change everything yet it will take me a bit more for all of this to sink in and maybe then I'll probably change my mind about half of those thoughts anyway. |
Hi, Before answering your question, I see there is confusion that there two phases in the evaluation: "compile" and "eval". and the confusion might be as a result of the function named "TryCompileStatefulExpression" while there is no compilation. In compilers, generally speaking, you have two types:
In this branch, we implement the latter, "JIT" compiler (and I am really careful about talking about JIT/AOT compilers here). In order to implement "AOT" you must do some static analysis, and currently we don't have much power to do static analysis on dynamic nodes. and let's dissect this statement:
The only "AOT" compilation we do is with two static nodes - https://github.com/yosiat/kapacitor/blob/compiled-stateful-expression/tick/eval/specialized_binary_stateful_expr.go#L90 In conclusion, we do compilation in the evaluation time and this is done using "evaluateDynamicNode" and "evalWithNodes", so there is separate phase - they are fused together.
I used expression as name there because we are holding StatefulExpression, I can change the names of ExpressionCache to one of:
We can't do for this example:
If you want to remove Reference Node changing types, you will need to remove the lines that handle ErrTypeGuardFailed, for example: https://github.com/yosiat/kapacitor/blob/compiled-stateful-expression/tick/eval/specialized_binary_stateful_expr.go#L144,L155 But, actually you can't remove it because you don't know what is the type of "value" at the "TryCompileStatefulExpression" phase, we don't have enough "static analysis" powers here to determine the types of "dynamic types" before hand.
Where do we call getConstantNodeType during eval? It will be a big change to change it at "compile time", do you have an idea how to?
I answer this above ~ |
Things are making more sense now. AOT vs JIT, makes sense that we need JIT, I guess I was thinking we would cheat and have one compilation step after the first data point and then treat it as AOT from there on out. But you make a good point lets do this right and have real JIT as that will be less confusing in the long run. I think a simple explanation in the package doc would be helpful. OK at this point I am going to start making inline comment in the code. I feel I understand the design well enough to start asking more specific questions. Wave of questions incoming... |
return NodeValueAccessor(&EvalFunctionNode{Node: node}) | ||
|
||
case *tick.UnaryNode: | ||
return NodeValueAccessor(&EvalUnaryNode{Node: node}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At each of these cases you can call getConstantNodeType
and store the result on the struct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way - this is great proposal! We can now change Eval{X}Node to contain data like expression cache or another info, but this for later change 👍
@nathanielc Ok, and do make it short I flag your response with "+1" that means I will fix it ASAP |
High level question: What is the API to this package? It is not clear to me, which is a problem. Is the API for the package essentially the This comes back to my confusion between I don't see a clear interface for evaluating the result of an AST of nodes. In some places we create a NodeValueAccesor directly from the node in others we create a SpecializedBinaryStatefulExpression and then evaluate. Will there be a For example the args to function calls are assumed to always be simple NodeValueAccessors. I am going to write up a quick example of how I think this will work.... |
The API of this package is the interface and "TryCompileStatefulExpression" which I want to change to "NewStatefulExpr" after we remove the old code. First, I agree, there is some confusion that we used both NodeValueAccesor and Specialized{X}StatefulExpression, I tried to think a way to factor this out. About the unary, I am currently not sure about how to implement this, it can be a "change" to SpecializedBinaryExpression to support an array of NodeValueAccesor or maybe a change to NodeValueAccesorSpecializedExpression. The args to function calls should be transformed to expression. I am waiting for your example 👍 Edit: some important thing to note that NodeValueAccesor must stay, it might be the root cause of the confusion, but it's the key abstraction for "hiding" our we access the value of a node. |
Here is my example/explanation of First we have two types A Here are the proposed interfaces for each type. // Evaluate an expression for a given scope.
type StatefulExpression interface {
EvalFloat64(scope *tick.Scope) (float64, error)
EvalInt64(scope *tick.Scope) (int64, error)
EvalString(scope *tick.Scope) (string, error)
EvalBool(scope *tick.Scope) (bool, error)
EvalRegex(scope *tick.Scope) (*regexp.Regexp, error)
}
// Evaluate an node for a given scope and state
type NodeValueAccessor interface {
EvalFloat64(scope *tick.Scope, state *ExecutionState) (float64, error)
EvalInt64(scope *tick.Scope, state *ExecutionState) (int64, error)
EvalString(scope *tick.Scope, state *ExecutionState) (string, error)
EvalBool(scope *tick.Scope, state *ExecutionState) (bool, error)
EvalRegex(scope *tick.Scope, state *ExecutionState) (*regexp.Regexp, error)
}
The only different between a The implementations of the Eval{} methods for the type SE struct {
nva NodeValueAccessor
state *ExecutionState
}
func (se *SE) EvalFloat64(scope *tick.Scope) (float64,error) {
return se.nva.EvalFloat64(scope, se.state)
}
func (se *SE) EvalInt64(scope *tick.Scope) (int64,error) {
return se.nva.EvalInt64(scope, se.state)
}
func (se *SE) EvalBool(scope *tick.Scope) (bool,error) {
return se.nva.EvalBool(scope, se.state)
}
func (se *SE) EvalString(scope *tick.Scope) (string,error) {
return se.nva.EvalString(scope, se.state)
}
func (se *SE) EvalRegex(scope *tick.Scope) (*regexp.Regexp,error) {
return se.nva.EvalRegex(scope, se.state)
} The Then thins start to simplify a bit since there aren't any I may have missed something critical that makes this hard to do but lets discuss. |
@nathanielc WOW 👍 👍 When I changed EvalBinaryNode to be next to it's implementation and it actually made me think.. wait.. now we can add to EvalBinaryNode struct more info like Type and cache the constant type (as you suggested) type EvalBinaryNode struct {
Node *tick.BinaryNode
Type ValueType
}
func CreateEvalBinaryNode(binaryNode *tick.BinaryNode) NodeValueAccessor {
return &EvalBinaryNode{
Node: binaryNode,
Type: getConstantNodeType(binaryNode),
}
} Why not simply return an error - for example: node type is InvalidType and etc (as we called them "compile error type") Just one question - the current StatefulExpression allows to evaluate to bool or number, do you see any reason in the future of kapacitor that you would like to return regex/string/etc? Here are the tasks:
|
Please give me your ok and I will start those tasks ASAP |
@yosiat Looks good! I don't see a need for an
InfluxDB supports four data types |
@nathanielc I meant in Kapacitor nodes (StreamNode, FromNode, etc) do you any need for evaluating expression to string? |
@yosiat Technically yes, but there isn't a lot of support for it yet(aka string manipulation functions). I'd add it as it will help flesh out the rest of the interface better. |
@nathanielc just found out that you are missing generic value return function to NodeValueAccesor interface. It's important for the "dynamic evaluation", so we can call it and detect the type if it's dynamic node. |
@nathanielc By the way, why you are passing reference to ExecutionState and not value reference? EvalFloat64(scope *tick.Scope, state *ExecutionState) (float64, error) Do: EvalFloat64(scope *tick.Scope, state ExecutionState) (float64, error) Is there any special reason? |
@nathanielc quick status update:
Now I am starting on - |
No reason... Sounds good. Let me know when you want me to take a second look. |
@nathanielc I started to convert all to use the new stateful expression, But one of the tests are failing:
Do you have any idea? (do you want me to push the commit for it?) |
29eedc0
to
db583b8
Compare
@yosiat Sigma is a stateful function. If its state if somehow getting corrupted than that would fail the test. That is my first guess anyways. |
@nathanielc It looks like the first value - 97.1 is passed twice, I am looking into it. |
@nathanielc I found the reason for the error - EvalNum calls both EvalInt and EvalFloat if the result is float.. This is will be an easy fix, I am fixing it (and adding test of course) and pushing a commit. |
@nathanielc The problem is in here: Since we evaluating twice we are destroying the state, so I am changing it to use GetType. sorry, we won't have tnumeric |
@nathanielc do you know this is failing?
On my computer "./build.py --build" finishes successfully . |
@yosiat I am seeing this locally after running
|
f79d097
to
e8656f8
Compare
@nathanielc Fixed it! why it isn't written in circleci output? |
Not sure, maybe stderr got discarded or something. On Fri, Apr 29, 2016 at 1:23 PM Yosi Attias [email protected]
|
@nathanielc Ok, once the tests will pass on circleci, I think we are ready to merge it. By the way, here is the final benchmarks: Alert (compared to master)
Stateful expression benchmarks (compared to master):
Stateful expression benchamrks (compared to time of submiting PR):
|
|
||
// StatefulExpression is interface that describe expression with state and | ||
// it's evaluation. | ||
type StatefulExpression interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we named the package stateful
we should change this to Expression
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we want to rename the file as well? so it will be "tick/stateful/expr.go" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, lets rename the files to match. Thanks
@yosiat LGTM 👍 Just read through everything again. Just needs a CHANGELOG entry and to fix that one comment. |
@nathanielc Fixed the comment, This changelog entry is ok?
By the way, aren't we at 0.14? and 0.13 is released? |
@yosiat Changelog entry looks good. 0.13 is not yet released, so this will make it in. |
d646d0e
to
107aa66
Compare
Creating specialized stateful expression in order to improve performance and code clarity. Speicalized stateful expression eliminates all heap allocations and most of interface{} conversions. The idea behind "specialized stateful expression" is to implement (very very simple) "JIT Compiler" for stateful expression. For example: given this expression ``"value" > 8.0``, at runtime we will try to find the type of "value", once we find the type (for this example, let's say it's float64) we will comparison functions which accepts float64 both on the right and left side. Contiuing with our example, let's say that on runtime after evaluating multiple times - "value" changed is type to int64, if that happens - we will have "type guard" that will raise an error and say this is not "float64 > float64" it's "int64 > float64" and we will adjust the evaluation function. There are more changes, those are the bottom line: * Tests - lots of tests added, at the time of writing this commit we have 77.8% coverage for "stateful" package. * Compile time errors - simple compile time errors that will stop the tasks starting if there is some simple mistakes. Finally, benchmarks: * Compared against the interepeted one - "NewStatefulExpr" * Ran with count=5 on Macbook Pro 13" Late 2011 (i5, 8GB RAM, 120GB SSD) name me old time/op new time/op delta _EvalBool_OneOperator_UnaryNode_BoolNode-4 252ns ± 2% 16ns ± 3% -93.53% (p=0.008 n=5+5) _EvalBool_OneOperator_NumberFloat64_NumberFloat64-4 540ns ± 2% 40ns ± 2% -92.58% (p=0.008 n=5+5) _EvalBool_OneOperator_NumberFloat64_NumberInt64-4 550ns ± 3% 40ns ± 3% -92.68% (p=0.008 n=5+5) _EvalBool_OneOperator_NumberInt64_NumberInt64-4 539ns ± 2% 39ns ± 4% -92.77% (p=0.008 n=5+5) _EvalBool_OneOperator_ReferenceNodeFloat64_NumberFloat64-4 524ns ± 3% 74ns ± 5% -85.93% (p=0.008 n=5+5) _EvalBool_OneOperator_ReferenceNodeFloat64_NumberInt64-4 526ns ± 1% 76ns ± 3% -85.51% (p=0.008 n=5+5) _EvalBool_OneOperator_ReferenceNodeFloat64_ReferenceNodeFloat64-4 495ns ± 3% 116ns ± 4% -76.48% (p=0.008 n=5+5) _EvalBool_OneOperatorWith11ScopeItem_ReferenceNodeFloat64_NumberFloat64-4 534ns ± 3% 90ns ± 4% -83.11% (p=0.008 n=5+5) _EvalBool_OneOperatorValueChanges_ReferenceNodeFloat64_NumberFloat64-4 2.98µs ± 1% 1.24µs ± 3% -58.44% (p=0.008 n=5+5) _EvalBool_OneOperator_ReferenceNodeInt64_ReferenceNodeInt64-4 503ns ± 3% 119ns ± 7% -76.25% (p=0.008 n=5+5) _EvalBool_OneOperatorWith11ScopeItem_ReferenceNodeInt64_NumberInt64-4 533ns ± 1% 87ns ± 3% -83.69% (p=0.008 n=5+5) _EvalBool_OneOperatorValueChanges_ReferenceNodeInt64_NumberInt64-4 3.08µs ± 4% 1.26µs ± 3% -59.22% (p=0.008 n=5+5) name old alloc/op new alloc/op delta _EvalBool_OneOperator_UnaryNode_BoolNode-4 18.0B ± 0% 0.0B ±NaN% -100.00% (p=0.008 n=5+5) _EvalBool_OneOperator_NumberFloat64_NumberFloat64-4 72.0B ± 0% 0.0B ±NaN% -100.00% (p=0.008 n=5+5) _EvalBool_OneOperator_NumberFloat64_NumberInt64-4 72.0B ± 0% 0.0B ±NaN% -100.00% (p=0.008 n=5+5) _EvalBool_OneOperator_NumberInt64_NumberInt64-4 72.0B ± 0% 0.0B ±NaN% -100.00% (p=0.008 n=5+5) _EvalBool_OneOperator_ReferenceNodeFloat64_NumberFloat64-4 64.0B ± 0% 0.0B ±NaN% -100.00% (p=0.008 n=5+5) _EvalBool_OneOperator_ReferenceNodeFloat64_NumberInt64-4 64.0B ± 0% 0.0B ±NaN% -100.00% (p=0.008 n=5+5) _EvalBool_OneOperator_ReferenceNodeFloat64_ReferenceNodeFloat64-4 49.0B ± 0% 0.0B ±NaN% -100.00% (p=0.008 n=5+5) _EvalBool_OneOperatorWith11ScopeItem_ReferenceNodeFloat64_NumberFloat64-4 64.0B ± 0% 0.0B ±NaN% -100.00% (p=0.008 n=5+5) _EvalBool_OneOperatorValueChanges_ReferenceNodeFloat64_NumberFloat64-4 64.0B ± 0% 0.0B ±NaN% -100.00% (p=0.008 n=5+5) _EvalBool_OneOperator_ReferenceNodeInt64_ReferenceNodeInt64-4 49.0B ± 0% 0.0B ±NaN% -100.00% (p=0.008 n=5+5) _EvalBool_OneOperatorWith11ScopeItem_ReferenceNodeInt64_NumberInt64-4 64.0B ± 0% 0.0B ±NaN% -100.00% (p=0.008 n=5+5) _EvalBool_OneOperatorValueChanges_ReferenceNodeInt64_NumberInt64-4 64.0B ± 0% 0.0B ±NaN% -100.00% (p=0.008 n=5+5) name old allocs/op new allocs/op delta _EvalBool_OneOperator_UnaryNode_BoolNode-4 3.00 ± 0% 0.00 ±NaN% -100.00% (p=0.008 n=5+5) _EvalBool_OneOperator_NumberFloat64_NumberFloat64-4 5.00 ± 0% 0.00 ±NaN% -100.00% (p=0.008 n=5+5) _EvalBool_OneOperator_NumberFloat64_NumberInt64-4 5.00 ± 0% 0.00 ±NaN% -100.00% (p=0.008 n=5+5) _EvalBool_OneOperator_NumberInt64_NumberInt64-4 5.00 ± 0% 0.00 ±NaN% -100.00% (p=0.008 n=5+5) _EvalBool_OneOperator_ReferenceNodeFloat64_NumberFloat64-4 4.00 ± 0% 0.00 ±NaN% -100.00% (p=0.008 n=5+5) _EvalBool_OneOperator_ReferenceNodeFloat64_NumberInt64-4 4.00 ± 0% 0.00 ±NaN% -100.00% (p=0.008 n=5+5) _EvalBool_OneOperator_ReferenceNodeFloat64_ReferenceNodeFloat64-4 3.00 ± 0% 0.00 ±NaN% -100.00% (p=0.008 n=5+5) _EvalBool_OneOperatorWith11ScopeItem_ReferenceNodeFloat64_NumberFloat64-4 4.00 ± 0% 0.00 ±NaN% -100.00% (p=0.008 n=5+5) _EvalBool_OneOperatorValueChanges_ReferenceNodeFloat64_NumberFloat64-4 4.00 ± 0% 0.00 ±NaN% -100.00% (p=0.008 n=5+5) _EvalBool_OneOperator_ReferenceNodeInt64_ReferenceNodeInt64-4 3.00 ± 0% 0.00 ±NaN% -100.00% (p=0.008 n=5+5) _EvalBool_OneOperatorWith11ScopeItem_ReferenceNodeInt64_NumberInt64-4 4.00 ± 0% 0.00 ±NaN% -100.00% (p=0.008 n=5+5) _EvalBool_OneOperatorValueChanges_ReferenceNodeInt64_NumberInt64-4 4.00 ± 0% 0.00 ±NaN% -100.00% (p=0.008 n=5+5)
107aa66
to
7e536e9
Compare
@nathanielc I am confused with the versions because I am running the nightly version =X Finished with changes & squashed. |
@yosiat Wonderful, merging on green... |
@nathanielc @yosiat this is awesome - thanks for all the hard work! |
Yay 🎉 |
Hi,
This is a one big pull request with the next bottom line changes:
Implementation
Those are explanations of the core algorithm, if there are more questions/clarifications requested, I will update this.
Basic explanation
The overall idea: Instead of using stack-based AST interpreter compiled the expressions to specialized functions.
For example: given this expression
"value" > 8.0
, let's assume two assumptions:The specializer will take this expression and will evantually run float64 > float64 all the time, instead of doing for every evaluation:
Deeper explanation
First, let's set up simple terminology:
When we get a BinaryNode we determine if it's dynamic or constant - let's examine the dynamic case.
If this is dynamic node in the constructor (NewStatefulExpr) we set the evaluation function to be "dynamic evaluation function" otherwise
we fetch the matching evaluation function based on the nodes types and their operator.
The dynamic evaluation function is doing the next instructions (this were the "specialization" happens):
The real meat is in EvalBool/EvalNum:
It's important to say: that we handle single nodes as well for example: EvalBool(BoolNode), etc.
Performance
I ran the tests on MacBook Pro (13-inch, Late 2011) - i5 2.4ghz, 8GB RAM and 128GB SSD.
The tests ran with the flag of "--count=5" and compared using benchstat.
EvalBool Benchmarks
AlertTask benchmarks
Questions / Notes
Tests
I added more tests to stateful expression, to make sure we cover more and more cases.
The coverage for eval package is now 73.5%.
I added those tests:
Important
@nathanielc / pull request reviewer, please read those very carefully and answer them please!
The notes/questions are ordered by importance:
5.@nathanielc - you requested to separate to packages as ast and etc, I didn't do this in this pull request, because it's too much big PR
6. I can fix #490 pretty easily, do you want to?
Nice-To-Haves
Those are nice to haves, maybe in this pull request and maybe another:
Fee, I finished 👍
That was a really fun and educating experience, thanks @nathanielc for being open to changes :)