sem/tree: use StmtTimestamp as max for AsOf, enforce non-zero intervals #34547
Conversation
I messed something up here, will dig more.
Force-pushed from 29679ae to d2dafa5 (Compare)
This is ready to go, PTAL when you get a chance.
@@ -57,3 +52,20 @@ query T
SELECT * FROM (SELECT now()) AS OF SYSTEM TIME '2018-01-01'
----
2018-01-01 00:00:00 +0000 UTC

# Verify that zero intervals those indistinguishable from zero cause an error.
I think this is missing a word.
or rather had one too many
@@ -97,7 +97,11 @@ func EvalAsOfTimestamp(
}
// Attempt to parse as an interval.
if iv, err := ParseDInterval(s); err == nil {
ParseDInterval is for parsing strings into postgres-compatible SQL datums, which happen to not support nanoseconds. We should consider changing this function to one that does support nanos, since that's a valid argument to AOST. We could just use go's time.ParseDuration for that, which would lose us support for days and months, but that probably doesn't matter since most GC intervals are less than that. Then again, why would anyone use AOST with a very specific nanosecond argument. Like, I can't think of a situation where that would actually matter. Ok so maybe this is fine? Just some thoughts.
Reviewed 20 of 20 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @danhhz and @mjibson)
pkg/sql/sem/tree/as_of.go, line 99 at r1 (raw file):
Previously, mjibson (Matt Jibson) wrote…
ParseDInterval is for parsing strings into postgres-compatible SQL datums, which happen to not support nanoseconds. We should consider changing this function to one that does support nanos, since that's a valid argument to AOST. We could just use go's time.ParseDuration for that, which would lose us support for days and months, but that probably doesn't matter since most GC intervals are less than that. Then again, why would anyone use AOST with a very specific nanosecond argument. Like, I can't think of a situation where that would actually matter. Ok so maybe this is fine? Just some thoughts.
I too was concerned by this. However maybe it's OK if SQL intervals don't support nanoseconds. Also it would be troublesome to have a feature that has one precision in one context, and another in a different context. It would make things difficult to explain. However this is fine only as long as we offer another mechanism for AOST to work with nanoseconds instead.
Hence my question: do we properly support arithmetic using decimals in this context, and are we always converting a decimal value to nanoseconds (like we used to)?
If that's the case:
-
I am not too bothered by intervals not supporting nanoseconds
-
however, I think that we need to test the nanosecond behavior. Please convert the existing nanosecond tests to use decimals instead and verify that sub-microseconds intervals are properly supported.
@knz I’m not sure I understand. The decimal literal is used here to specify a specific hlc timestamp rather than an interval or relative offset. I do believe it supports nanosecond resolution and will add testing to that effect. I do not, however, believe that there exists a nanosecond-resolution mechanism to deal with relative offsets from now. Given that, I’m not totally sure how to update the old ns tests. Am I missing something?
What about an expression with type DECIMAL? Convert now() to DECIMAL to get the current timestamp as nanoseconds, perform arithmetic on that to compute a nanosecond offset, etc. Or handle negative decimals as relative to the current timestamp automatically. I think we can do ns-precision relative offsetting this way, isn't that the case?
This is disallowed by the constant expression requirement.
This seems doable. I wonder if it's confusing for users, as we explicitly say in the docs that the decimal format is used for HLC.
Just to clarify, if we move forward with the vision of using negative decimal values as offsets, we still need to nail down the semantics. For the decimal HLC values, the integer portion represents time in unix nanos and the decimal portion represents the logical clock. Would you be in favor of a solution that had similar semantics with regard to the wall clock portion of the decimal and ignored the logical portion? e.g.
My idea with decimal expressions is that the value would get evaluated as usual, and then whatever decimal portion remains becomes the logical part (without truncation). If the user wants truncation they can do that manually.
Sounds good. I'll type it up.
Force-pushed from d2dafa5 to 0b12d5b (Compare)
Updated, PTAL
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @danhhz, @knz, and @mjibson)
pkg/sql/sem/tree/as_of.go, line 122 at r2 (raw file):
// preserving the specified logical value.
if convErr == nil && ts.WallTime < 0 {
	ts.WallTime = stmtTimestamp.Add(time.Duration(ts.WallTime)).UnixNano()
}
The semantics of this are odd. It subtracts the walltimes but leaves the logical as is. Also now 0 means two different things (unix epoch or now) depending on which way you go from it. This makes me minorly uncomfortable even though it is a nice solution to one problem. @bdarnell do you have thoughts on this choice?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @danhhz, @knz, and @mjibson)
pkg/sql/sem/tree/as_of.go, line 122 at r2 (raw file):
Also now 0 means two different things (unix epoch or now) depending on which way you go from it.
Zero still means unix epoch.
The semantics of this are odd. It subtracts the walltimes but leaves the logical as is.
I agree, I too was at a loss for how to deal with the logical portion of the decimal when Raphael suggested interpreting negative decimals as offsets. I just went with his suggestion here.
My idea with decimal expressions is that the value would get evaluated as usual, and then whatever decimal portion remains becomes the logical part (without truncation). If the user wants truncation they can do that manually.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @danhhz, @knz, and @mjibson)
pkg/sql/sem/tree/as_of.go, line 122 at r2 (raw file):
Previously, ajwerner wrote…
Also now 0 means two different things (unix epoch or now) depending on which way you go from it.
Zero still means unix epoch.
The semantics of this are odd. It subtracts the walltimes but leaves the logical as is.
I agree, I too was at a loss for how to deal with the logical portion of the decimal when Raphael suggested interpreting negative decimals as offsets. I just went with his suggestion here.
My idea with decimal expressions is that the value would get evaluated as usual, and then whatever decimal portion remains becomes the logical part (without truncation). If the user wants truncation they can do that manually.
I mean the idea of zero, not its actual value. `0 + X` = unix epoch + X. `0 - X` = now - X.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @danhhz, @knz, and @mjibson)
pkg/sql/sem/tree/as_of.go, line 122 at r2 (raw file):
Previously, mjibson (Matt Jibson) wrote…
I mean the idea of zero, not its actual value. `0 + X` = unix epoch + X. `0 - X` = now - X.
The only counterpoint I'd make to that is that positive values of X in roughly [1, 1388448000000000000] aren't useful
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @danhhz, @knz, and @mjibson)
pkg/sql/sem/tree/as_of.go, line 122 at r2 (raw file):
Previously, ajwerner wrote…
The only counterpoint I'd make to that is that positive values of X in roughly [1, 1388448000000000000] aren't useful
If we explicitly disallow using this syntax to create a timestamp before say, 2000, does that make the discontinuity more palatable?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @danhhz, @knz, and @mjibson)
pkg/sql/sem/tree/as_of.go, line 122 at r2 (raw file):
Previously, ajwerner wrote…
If we explicitly disallow using this syntax to create a timestamp before say, 2000, does that make the discontinuity more palatable?
I'm not concerned about the discontinuity, just that 0 as a reference point is now two different things depending on context. I'm not sure it's bad, just pointing it out. Will it confuse users? Will it cause accidental, unknown, or incorrect usage? When we introduced negative intervals that was a new feature, since the interval type previously did nothing. This one changes the existing meaning of int and decimal types, which is an explicit change in the current behavior (although clearly a totally silly one, since who wants data before the unix epoch). Because it's a change instead of a new behavior, I feel ok at least having a ponder about whether it's a good design decision.
This discussion is swerving outside the scope of a code review into a design question, and the review context is insufficient to resolve it. I recommend writing up in prose the problems we're trying to solve here and enumerating the alternatives, with pros and cons for each. This will provide clarity and perhaps reveal a new way we haven't considered yet.
Just my 2cents.
Okay, here's a recap. AOST currently permits strings like

@knz suggested that there should be a way to express AOST relative to the statement time at nanosecond precision (something not currently supported). My understanding of the source of this suggestion is that testing code today currently uses an AOST '-1ns' to indicate that now should be used as the time. Prior to the most recent iteration I had changed the testing call sites to instead pass

I do not feel that having support for nanosecond precision relative offsets is important, but I do not feel strongly about that.
You brought up a point I failed to mention: AOST expressions have to be constant, and thus both now() and the arithmetic are disallowed.
Oh hm, that's a bummer.
(I may need a little more time to think this over. Sorry, it's late in my TZ so my brain doesn't function any more for today.)
Ok so a possible alternative would be to split case 3 in my last comment:

(note that the value 0 itself would be disallowed in both cases, to avoid the discontinuity)
Here's another wrinkle: AOST values between `now()` and `now() - max_offset` are kind of unsafe because they are within the uncertainty window but disable the checks for ReadWithinUncertaintyIntervalError. I think we may actually want to ban AOST values in this range, and since `max_offset` will generally be measured in milliseconds, there's no need for nanosecond precision in negative intervals.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @danhhz, @knz, and @mjibson)
I recall a convo either on the forum or gitter where that was exactly the feature looked for by a user: disable the uncertainty error at the risk of non-serializable reads, where the goal was to get a guaranteed-bound read latency on recent data with a weaker consistency expectation. I wouldn't be keen to disallow that time interval outright before we double-check that it's not useful (and/or find out other ways to achieve the same).
That's a concern that we certainly had not considered. I'd prefer to push that into a separate issue if that's okay. For now I think I'll revert this change to where it stood this morning, where the tests were changed from
I think this conversation was long-winded and it's the end of the day. We should let at least one night's worth of sleep precipitate the various thoughts in motion before conclusions are drawn.
Force-pushed from 617d447 to 73a20f3 (Compare)
LGTM thanks

TFTR

bors r+

👎 Rejected by PR status
Force-pushed from 73a20f3 to dddffb9 (Compare)
bors r+

Build failed
Before this PR there was some unexpected behavior when using AOST clauses with literals which were very small. There are two culprits. The first is that the logic to evaluate the AOST took a maximum value that was different from and always greater than the statement time, which is the timestamp from which interval literals are used to compute a timestamp. That means that if the interval literal were positive but smaller than the difference between the "max" and the statement time, it would be judged to be valid. There's no reason we need to compute an additional maximum; instead we just use the statement time as the maximum.

Another issue is that values smaller than 1 microsecond are rounded to zero, so `100ns`, `0.0`, and `-100ns` are equivalent as far as the evaluation code was concerned. This diff makes all of these intervals which are rounded to zero illegal.

Note that in a number of testing scenarios, particularly around backup/restore and automatic stats collection, we used to use a zero as of system time value, which this PR changes to 1 microsecond; this ought to be functionally equivalent.

Fixes cockroachdb#34371.

Release note (sql change): Disallow `AS OF SYSTEM TIME` interval expressions less than 1 microsecond in the past.
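A sketch of the zero-interval check described above, assuming microsecond granularity (`checkInterval` is a hypothetical stand-in, not the PR's actual code):

```go
package main

import (
	"fmt"
	"time"
)

// checkInterval mimics the validation the PR adds: intervals are
// evaluated at microsecond granularity, so anything that truncates
// to zero is rejected as indistinguishable from zero.
func checkInterval(s string) error {
	d, err := time.ParseDuration(s)
	if err != nil {
		return err
	}
	if d.Truncate(time.Microsecond) == 0 {
		return fmt.Errorf("interval %q is indistinguishable from zero", s)
	}
	return nil
}

func main() {
	for _, s := range []string{"100ns", "0s", "-100ns", "-1us"} {
		fmt.Println(s, checkInterval(s) != nil) // only -1us passes the check
	}
}
```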
Force-pushed from dddffb9 to 8d9c588 (Compare)
bors r+

Build failed

bors r+
34547: sem/tree: use StmtTimestamp as max for AsOf, enforce non-zero intervals r=ajwerner a=ajwerner

Before this PR there was some unexpected behavior when using AOST clauses with literals which were very small. There are two culprits. The first is that the logic to evaluate the AOST took a maximum value that was different from and always greater than the statement time, which is the timestamp from which interval literals are used to compute a timestamp. That means that if the interval literal were positive but smaller than the difference between the "max" and the statement time, it would be judged to be valid. There's no reason we need to compute an additional maximum; instead we just use the statement time as the maximum.

Another issue is that values smaller than 1 microsecond are rounded to zero, so `100ns`, `0.0`, and `-100ns` are equivalent as far as the evaluation code was concerned. This diff makes all of these intervals which are rounded to zero illegal.

Fixes #34371.

Release note (sql change): Disallow `AS OF SYSTEM TIME` interval expressions less than 1 microsecond in the past.

Co-authored-by: Andrew Werner <[email protected]>
Build succeeded