-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add Average aggregate support #383
Conversation
8bdb1c8
to
6d7609c
Compare
This comment was marked as spam.
This comment was marked as spam.
This can be shared between numeric aggregates
Can be shared between most aggregates
6d7609c
to
904fc49
Compare
This comment was marked as duplicate.
This comment was marked as duplicate.
query/graphql/parser/query.go
Outdated
@@ -433,7 +436,7 @@ func parseSelectFields(root SelectionType, fields *ast.SelectionSet) ([]Selectio | |||
|
|||
// parseField simply parses the Name/Alias |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// parseField simply parses the Name/Alias | |
// ParseField simply parses the Name/Alias |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cheers, will change
- Update ParseField comment
if f.Statement.Name.Value == name { | ||
allArguementsMatch := true | ||
|
||
for _, possibleMatchingArguement := range f.Statement.Arguments { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment here would be nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed - cheers for raising
- document tryGetAggregateField
query/graphql/planner/select.go
Outdated
|
||
switch thisTypedValue := thisValue.GetValue().(type) { | ||
case *ast.Variable, *ast.IntValue, *ast.FloatValue, *ast.StringValue, *ast.EnumValue, *ast.BooleanValue: | ||
if thisTypedValue != otherValue.GetValue().(*ast.StringValue) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why only compare *ast.StringValue
what about the comparison of *ast.Variable, *ast.IntValue, *ast.FloatValue, *ast.EnumValue, *ast.BooleanValue
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cheers, is a bad miss from me looks like a future bug (only StringValue is possible here atm I think RE aggregates, but would change when adding filter/limit/sort/etc)
- Fix areAstValuesEqual switch case
1ddc22d
to
a38e6c9
Compare
Benchmark ResultsSummary
✅ See Better Results...
❌ See Worse Results...
✨ See Unchanged Results...
🐋 See Full Results...
|
Benchmark ResultsSummary
✅ See Better Results...
❌ See Worse Results...
✨ See Unchanged Results...
🐋 See Full Results...
|
Codecov Report
@@ Coverage Diff @@
## develop #383 +/- ##
===========================================
+ Coverage 65.20% 65.49% +0.29%
===========================================
Files 80 81 +1
Lines 9251 9492 +241
===========================================
+ Hits 6032 6217 +185
- Misses 2599 2645 +46
- Partials 620 630 +10
|
Benchmark ResultsSummary
✅ See Better Results...
❌ See Worse Results...
✨ See Unchanged Results...
🐋 See Full Results...
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't completed everything yet, but wanted to submit what I got through atm.
query/graphql/parser/query.go
Outdated
@@ -30,6 +30,7 @@ const ( | |||
DocKeyFieldName = "_key" | |||
CountFieldName = "_count" | |||
SumFieldName = "_sum" | |||
AverageFieldName = "_average" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_avg
is also an option here for the query name. Might want to see what others think from a DX. Doesn't truely matter and easy change, just wanted to bring attention to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raised in discord
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_average
is more clear. If we add aliases later we could support both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leaving as _average
only for now (Fred and Shahzad had minor preferences in opposite directions on discord)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm now of the opinion that given our current set {_count
, _sum
, _min
, _max
} having short names, _avg
does fit better. It's also very common to use avg
so is not that much less clear than average
. So, +1 for _avg
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that tips the vote slightly in favour of _avg - will change to _avg
- _avg over _average
- squash the commit before merge
query/graphql/planner/average.go
Outdated
count := 0 | ||
if hasCount { | ||
typedCount, isInt := countProp.(int) | ||
count = typedCount |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extreme nitpick: The assingment should be after the isInt
if check :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol - yes it should be :)
- check isInt first
|
||
count := 0 | ||
if hasCount { | ||
typedCount, isInt := countProp.(int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering if its possible for count to be int64
. Would need to look at the count code but you'd be more knowledgeable then me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think I remember looking and the tests covering this, but will review
- review typedCount cast
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
confirmed, count always returns int
@@ -69,6 +71,24 @@ func (n *countNode) Next() (bool, error) { | |||
// v.Len will panic if v is not one of these types, we don't want it to panic | |||
case reflect.Array, reflect.Chan, reflect.Map, reflect.Slice, reflect.String: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just realized, is reflect.Chan
needed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably not, but I think when I wrote that I just included all the types the len function supported else the code could panic if we ever change the consuming code to send them in. Think a few/most of the others are also defensive/extra in that sense.
query/graphql/planner/select.go
Outdated
@@ -298,6 +359,80 @@ func (n *selectNode) initFields(parsed *parser.Select) ([]aggregateNode, error) | |||
return aggregates, nil | |||
} | |||
|
|||
func (n *selectNode) tryGetAggregateField( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is tryGetAggregateField
and areAstValuesEqual
on selectNode
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
partly scoping (nothing outside of that type calls this), partly I think laziness RE context (n.p.ctx
)
query/graphql/planner/select.go
Outdated
// Average utilises count and sum in order to calculate it's return value, | ||
// so we have to add those nodes here if they do not exist |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume the higher level render step will remove/filter out the _sum
and _count
fields being created here if they werent originally present in the query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
correct - each 'rendered' field is explicitly copied to a clean/new map, if a field is not render-requested it never gets copied. All the query integration tests assert this correctness and will fail if 'extra' fields are returned
Arguments: f.Statement.Arguments, | ||
} | ||
// We need to make sure the new aggregate index does not clash with any existing aggregate fields | ||
sumFieldIndex := (len(parsed.Fields) * fieldLenMultiplier) + sumFieldIndexOffset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same q as above w.r.t fieldLenMultipler
. I get why the offset is needed, just not the multipler. Is there somehow 3 field indexes for each parsed field?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, see answer on that comment
- Make sure you document this instance too
} | ||
|
||
if f.Statement.Name.Value == name { | ||
allArguementsMatch := true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume we're trying to do a complete match between target field and current field.
IE:
users {
_average(field: X. filter: Y, limit: Z)
_count(field: X, filter: Y)
}
Wouldn't match here since the arguments
would have an extra item found on _average
that isn't on _count
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes correct, and the above example you gave should not match (as the average should be calculated according to the limited results - with the internal count being <= Z).
query/graphql/planner/select.go
Outdated
for i, innerValue := range thisTypedValue.Values { | ||
if !n.areAstValuesEqual(innerValue, otherTypedValue.Values[i]) { | ||
return false | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if its actually possible for ListValues to contain different types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that is not permitted, at least by the lib we use - in generate we reference this quite a bit gql.ListObject has a property list.OfType - which contains a single type that is the type of every item in the list
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that is the case would it be sufficient to check the first element of each, instead of all elements?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is more than just a type check - is a deep equals comparing the values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh yes, true true. Not sure how I missed that.
query/graphql/planner/select.go
Outdated
log.Error( | ||
n.p.ctx, | ||
"Could not evaluate arguement equality, unknown type.", | ||
logging.NewKV("Type", fmt.Sprintf("%T", thisValue.GetValue())), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't know if log error is really that necessary, but its OK.
However, if you move this and the other func off selectNode
as I mentioned in another comment, we wouldnt have access to n.p.ctx
context object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, this log is one of the 2 main reasons it is scoped to the selectNode. Would not want to silently return false here (with no log) - this should never happen, but it will only cost the consumer performance-wise so a panic felt inappropriate (better it works a bit slowly than not at all).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, we could just add an error to the return type, and bubble up the error else where in the call stack that can more appropriately log it with access to a context.
So moving these related funcs off selectNode wouldnt actually be problematic.
It depends if we concretely see this as an "error" rather than just a possible "false" case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, we could just add an error to the return type, and bubble up the error else where in the call stack that can more appropriately log it with access to a context.
Returning an error disrupts the flow and comes with the same problems as a panic, converting what is a minor performance hit into a disruptive and noticeable bug.
This log line really doesn't stop us from moving the function either, will just have to pass the context in some other way. If you are keen on dropping the function scope and making it package-global?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try not to think of it as "package-global". Its often idiomatic go to assume a pure (im using pure to mean non-method) func and if need to be futher scope/group it onto a struct as a method. As far as packages go in Go, its perfectly fine to have w.e you need on the package scope.
I'm in the camp of keeping these kind of util funcs off of strucs (methods) if possible as I feel like it muddys up the goal of struct, and someone first reading it will assume it somehow relies on the functionality/state of the struct its a method on.
Can you explain what you mean by "disrupts the flow". If you mean how you call this func higher up, that can certainly be adjusted. Also not sure what you mean by "same problems as a panic". Returning the error means you are notifying the caller something went wrong (unexpected types in this case), and they should handle it appropriately. Dont see how returning an error here could cause bugs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain what you mean by "disrupts the flow"
The function returns false here, treating anything that hits the default case as something that requires a new node (instead of sharing). The user will receive their results as if it returned true, but probably slightly slower. If you return an error or panic they will instead receive an error, and they and their stakeholders might needlessly be a little upset with us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll make it global then. I think it is a bad idea though given that this function is only written to be called from a function on the selectNode and might do unexpected things if called from another context (and I'll have to pass context through), but it is not worth arguing over. I might do it in the new filter branch and merge this as-is tomorrow morning depending on how much this has changed over there.
- drop scope
d99e20d
to
1345eee
Compare
This comment was marked as duplicate.
This comment was marked as duplicate.
This comment was marked as duplicate.
This comment was marked as duplicate.
This comment was marked as duplicate.
This comment was marked as duplicate.
This comment was marked as duplicate.
This comment was marked as duplicate.
This comment was marked as duplicate.
This comment was marked as duplicate.
query/graphql/planner/select.go
Outdated
|
||
for _, possibleMatchingArguement := range f.Statement.Arguments { | ||
for _, targetArguement := range arguements { | ||
if possibleMatchingArguement.Name != targetArguement.Name { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This appears to always be true - review and fix
- arguement name check issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.Name is an object and I missed this, corrected with the following:
- if possibleMatchingArguement.Name != targetArguement.Name {
+ if possibleMatchingArguement.Name.Value != targetArguement.Name.Value {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still needs a bit more work (falls through the switch to default atm). Sorry for the multiple pushes, am being lazy and relying on code-cov
- fix type check
This comment was marked as duplicate.
This comment was marked as duplicate.
This comment was marked as duplicate.
This comment was marked as duplicate.
e4cfa0c
to
4a7f625
Compare
This comment was marked as duplicate.
This comment was marked as duplicate.
This comment was marked as duplicate.
This comment was marked as duplicate.
This comment was marked as duplicate.
This comment was marked as duplicate.
0a84b76
to
3414264
Compare
This comment was marked as duplicate.
This comment was marked as duplicate.
PR request
fdb9ff2
to
b453259
Compare
Benchmark ResultsSummary
✅ See Better Results...
❌ See Worse Results...
✨ See Unchanged Results...
🐋 See Full Results...
|
* Rename SumBaseArg to NumericAggregateBaseArg This can be shared between numeric aggregates * Refactor numerical inline array selector Can be shared between most aggregates * Add average aggregate support
Closes #95
Adds the Average aggregate. This aggregate is composed of the sum and count aggregates and will make use of existing count/sum nodes should they exist. Also refactors some more of the aggregate code and gql types, as Average and Sum share a fair amount of them. Adds very limited internal count filtering, to handle nil values when averaging - this will be formalized when adding proper support for aggregate filters.
Contains the commits in #374 as that is still waiting for review.