exec: overflow handling for aggregates #38775

jordanlewis · 2019-07-09T19:03:44Z

Aggregates in the vectorized engine currently don't check for overflows. This is incorrect and needs to be fixed.

rafiss · 2019-07-18T15:44:09Z

I'm wondering what we want to do with float overflow. Our current non-vectorized implementation already differs from Postgres.

Postgres:

rafiss@127:postgres> create table foo (b double precision primary key, c int);
CREATE TABLE

rafiss@127:postgres> insert into foo values(1.79769313486231570814527423737043567981e+308);
INSERT 0 1

rafiss@127:postgres> insert into foo values(1.79769313486231570814527423737043567981e+307);
INSERT 0 1

rafiss@127:postgres> select * from foo;
+-----------------------+--------+
| b                     | c      |
|-----------------------+--------|
| inf                   | <null> |
| 1.79769313486232e+307 | <null> |
+-----------------------+--------+
SELECT 2

rafiss@127:postgres> select avg(b) from foo;
2019-07-18 11:38:26.009 EDT [85826] ERROR:  value out of range: overflow
2019-07-18 11:38:26.009 EDT [85826] STATEMENT:  select avg(b) from foo
value out of range: overflow

CRDB 19.1:

[email protected]:64470/defaultdb> create table foo (b float primary key);
CREATE TABLE

[email protected]:64561/defaultdb> insert into foo values(1.79769313486231570814527423737043567981e+308);
INSERT 1

[email protected]:64561/defaultdb> insert into foo values(1.79769313486231570814527423737043567981e+307);
INSERT 1

[email protected]:64561/defaultdb> select * from foo;
             b
+-------------------------+
  1.7976931348623158e+307
  1.7976931348623157e+308
(2 rows)

[email protected]:64561/defaultdb> select avg(b) from foo;
  avg
+------+
  +Inf
(1 row)

jordanlewis · 2019-07-18T15:45:15Z

Are we promoting float to decimal in that latter case?

rafiss · 2019-07-18T15:47:57Z

No, we are not promoting in that case. (We also see the same behavior as that latter case in the vectorized engine as it is right now.)

jordanlewis · 2019-07-18T15:52:20Z

I think this difference is probably acceptable, at least for now: we already do it, and it's pretty well-defined, since it doesn't wrap around. YIL this is called "saturating" overflow behavior.

maddyblue · 2019-07-19T18:53:27Z

Yes, I discovered our float difference while writing edge. I decided it's fine because the behavior is the same as you would get if you were adding two large floats together (infinity). Actually thinking this through more...there's a good argument that while sum can return infinity, avg shouldn't. Hmm. Well at least we have tests asserting the results now.

The overflow checks are done as part of the code generation in overloads.go. The checks are done inline, rather than calling the functions in the arith package for performance reasons. The checks are only done for integer math. float math is already well-defined since overflow will result in +Inf and -Inf as necessary.

The operations that these checks are relevant for are the SUM_INT aggregator and projection. In the future, AVG will also benefit from these overflow checks.

This changes the error message produced by overflows in the non-vectorized SUM_INT aggregator so that the messages are consistent. This should be fine in terms of postgres-compatibility since SUM_INT is unique to CRDB and eventually we will get rid of it anyway.

resolves cockroachdb#38775

Release note: None

38967: exec: overflow handling for vectorized arithmetic r=rafiss a=rafiss

resolves #38775

Release note: None

Co-authored-by: Rafi Shamim <[email protected]>


jordanlewis assigned rafiss Jul 9, 2019

rafiss added A-sql-vec SQL vectorized engine C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. labels Jul 11, 2019

rafiss mentioned this issue Jul 18, 2019

exec: overflow handling for vectorized arithmetic #38967

Merged

yuzefovich mentioned this issue Jul 19, 2019

exec: tracking issue for known logic tests failures for vectorize "experimental_on" #38994

Closed

11 tasks

yuzefovich mentioned this issue Jul 22, 2019

exec: tracking issue for turning on "auto" by default #38920

Closed

14 tasks

craig bot closed this as completed in #38967 Jul 23, 2019
