Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add product function #250

Merged
merged 1 commit into from
Jul 28, 2022
Merged

Conversation

vibhatha
Copy link
Contributor

This PR includes aggregate function product.

Comment on lines 403 to 407
- value: fp32
nullability: DECLARED_OUTPUT
decomposable: MANY
intermediate: fp64?
return: fp64?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As with other PRs, why the promotion? For floats especially since all floats represent all real numbers (to some degree of accuracy), but likewise for integers.

See #251 (you're not the first to do it this way and probably won't be the last)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out. I also had the same doubt and I just followed the promotion. Waiting to see how the discussion progress on the promotion conclusion.

description: Product of a set of values.
impls:
- args:
- options: [ SILENT, SATURATE, ERROR ]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this should be solved in this PR immediately because it probably also applies to existing stuff, but something to think about for SATURATE in the context of these types of aggregates... What if the internal accumulator overflows but the net result would be within range? For example, sum:opt_i64(2^63-1, 2^63-1, -2^63-1, -2^63-1). Is that required to always yield 0? Or for ERROR, is it required to throw an error (after all, if the second and third element are swapped it wouldn't overflow)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this is a valid point. But I guess this is hard to determine. I would say it is better to leave room for overflows. But I could be wrong.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should wait with this until we get some more opinions in #251. I'm inclined to say the intermediate value should be the same type, if only because picking i64 feels like an arbitrary "eh, that's probably good enough" decision that doesn't actually solve the problem, but that just makes the overflow question all the more important.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, let’s do that.

@vibhatha
Copy link
Contributor Author

@jvanstraten I updated the PR with some modifications.

@jvanstraten jvanstraten changed the title feat: adding product function feat: add product function Jul 26, 2022
@julianhyde
Copy link

It would be clearer if you said 'arithmetic product'. Many people would assume cartesian product.

What value does this function return if the input is empty? The documentation should specify.

In the discussion for sum, the consensus seemed to be that sum should return 0 if it weren't for the SQL standard requiring that SUM returns null. In this case there is no such precedent. I would vote for 1.

@jvanstraten
Copy link
Contributor

What value does this function return if the input is empty? The documentation should specify.

It does, we mimicked sum and had it return null.

IMO we should either embrace the SQL way and apply its null nonsense consistently across all similar arithmetic aggregations (and also define a product1 alongside sum0) or not at all.

@vibhatha, you'll need to rewrite your commit messages (or just rebase them all into one) to make CI pass.

@vibhatha
Copy link
Contributor Author

@jvanstraten I will rebase with an accurate commit msg.

@julianhyde
Copy link

SQL seems to be getting smarter. For example ARRAG_AGG on an empty input returns an empty array, not NULL. Similarly LISTAGG. See Snowflake.

For statistical functions such as AVG and STDEV there are valid reasons to return NULL on empty input. So I think that SUM is the only one it got wrong.

@vibhatha vibhatha force-pushed the aggregate-product branch from 99b19b8 to 6e91b75 Compare July 28, 2022 03:52
@vibhatha
Copy link
Contributor Author

@jvanstraten I squashed to a single commit, let's see the CIs.

@vibhatha vibhatha requested a review from jvanstraten July 28, 2022 03:54
@vibhatha vibhatha force-pushed the aggregate-product branch from 6e91b75 to 23bb131 Compare July 28, 2022 13:40
@vibhatha
Copy link
Contributor Author

@jvanstraten I updated the PR with nullability and return type updates.

@vibhatha vibhatha requested a review from jvanstraten July 28, 2022 13:41
@vibhatha vibhatha requested a review from jvanstraten July 28, 2022 15:24
@vibhatha vibhatha force-pushed the aggregate-product branch from a937851 to 648679a Compare July 28, 2022 16:01
@jacques-n jacques-n merged commit 34ec8f5 into substrait-io:main Jul 28, 2022
vibhatha added a commit to vibhatha/substrait that referenced this pull request Jul 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants