Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(frontend): cast/cmp with session timezone #7243

Merged
merged 41 commits into from
Jan 12, 2023

Conversation

jon-chuang
Copy link
Contributor

@jon-chuang jon-chuang commented Jan 6, 2023

I hereby agree to the terms of the Singularity Data, Inc. Contributor License Agreement.

What's changed and what's your intention?

Support:

  • Cast
    • date to timestamptz
    • timestamp to timestamptz
    • timestamptz to date
    • timestamptz to timestamp
    • timestamptz to time
  • Comparison
    • date with timestamptz
    • timestamp with timestamptz
  • Cast
    • timestamptz to string
      • We have implemented this to always use UTC because it is the basis of many things.
      • Now we use the session timezone
    • string-without-zone to timestamptz (e.g. '2022-10-01 12:00:00'::timestamptz)

Missing:

  • Arithmetic
    • interval-var + timestamptz = timestamptz (e.g. interval '1' day + '2022-03-13 09:00:00Z'::timestamptz)
    • timestamptz + interval-var = timestamptz
    • timestamptz - interval-var = timestamptz

When doing so, we will emit the PgResponse NOTICE:

dev=> select '2022-01-01 00:00:00'::timestamp::timestamp with time zone;
NOTICE:  Your session timezone is UTC. It was used in the interpretation of timestamps and dates in your query. If this is unintended,
                change your timezone to match that of your data's with `set timezone = [timezone]` or 
                rewrite your query with an explicit timezone conversion, e.g. with `AT TIME ZONE`.

         timestamp         
---------------------------
 2022-01-01 00:00:00+00:00
(1 row)

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • All checks passed in ./risedev check (or alias, ./risedev c)

Documentation

Refer to a related PR or issue link (optional)

#5826

@jon-chuang jon-chuang requested a review from xiangjinwu January 6, 2023 09:25
@jon-chuang
Copy link
Contributor Author

Not sure if this approach is the best.

Preferably, the timezone can become a context that the expression framework can reference rather than embedding in an AT TIME ZONE expression.

For instance, it's hard to support string to timestamptz casting fully, and also implicit casting does not seem to be caught by the binder.

@jon-chuang jon-chuang changed the title feat(frontend): cast/cmp with session timezone [draft] feat(frontend): cast/cmp with session timezone Jan 6, 2023
@xiangjinwu
Copy link
Contributor

All casts (explicit, assign, implicit) can be handled at a single place: FunctionCall::new_cast. However, it does not have access to self: Binder and has a lot of indirect callers if we want to add a new parameter to it.

As mentioned above, including timezone as context in the expression framework has the advantage of supporting more expressions. It also takes a large effort to refactor, and cannot provide the notice at binding phase. To get a notice when time zone is used implicitly, we still need to (1) update all indirect callers of FunctionCall::new_cast as above; or (2) add an extra frontend phase that visit all exprs. The latter one is easier to extend and maintain, but is more expensive.

Just sharing some thoughts. I do not have a preference over any one.

@jon-chuang
Copy link
Contributor Author

jon-chuang commented Jan 10, 2023

Seems the session information including session timezone will be lost during recovery test, as it seems the session is restarted from scratch. Any ideas @yezizp2012?

Perhaps we should remove the .slt file requiring session config from recovery test? (we may need another way to persist the default session config for the database/user)

@yezizp2012
Copy link
Member

yezizp2012 commented Jan 10, 2023

Seems the session information including session timezone will be lost during recovery test, as it seems the session is restarted from scratch. Any ideas @yezizp2012?

Yes, the frontend will be killed randomly during the recovery test, which will lead the session restart and all config will be reset.

Perhaps we should remove the .slt file requiring session config from recovery test? (we may need another way to persist the default session config for the database/user)

Yes, .slt requiring session config can't be well test by recovery test, we'd better ignore it for now. You can ignore the new slt in this PR here:

if kill && (path.ends_with("tpch_snapshot.slt") || path.ends_with("tpch_upstream.slt")) {

@codecov
Copy link

codecov bot commented Jan 10, 2023

Codecov Report

Merging #7243 (334bfd8) into main (47e7b00) will decrease coverage by 0.03%.
The diff coverage is 51.73%.

@@            Coverage Diff             @@
##             main    #7243      +/-   ##
==========================================
- Coverage   73.05%   73.01%   -0.04%     
==========================================
  Files        1068     1069       +1     
  Lines      170827   171099     +272     
==========================================
+ Hits       124799   124935     +136     
- Misses      46028    46164     +136     
Flag Coverage Δ
rust 73.01% <51.73%> (-0.04%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/expr/src/expr/build_expr_from_prost.rs 41.80% <0.00%> (-0.94%) ⬇️
src/expr/src/expr/expr_binary_nonnull.rs 52.61% <0.00%> (-3.45%) ⬇️
src/frontend/src/expr/mod.rs 80.08% <ø> (ø)
src/frontend/src/handler/query.rs 18.78% <0.00%> (-0.43%) ⬇️
...frontend/src/optimizer/plan_node/batch_hash_agg.rs 66.07% <0.00%> (ø)
...ntend/src/optimizer/plan_node/batch_lookup_join.rs 54.82% <0.00%> (-2.94%) ⬇️
.../src/optimizer/plan_node/batch_nested_loop_join.rs 78.72% <0.00%> (-4.43%) ⬇️
...ontend/src/optimizer/plan_node/batch_simple_agg.rs 78.57% <0.00%> (ø)
...frontend/src/optimizer/plan_node/batch_sort_agg.rs 72.09% <0.00%> (-4.46%) ⬇️
...c/frontend/src/optimizer/plan_node/batch_update.rs 60.41% <0.00%> (-7.03%) ⬇️
... and 36 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Member

@fuyufjh fuyufjh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Clever solution!

@@ -62,6 +65,39 @@ pub fn timestamp_at_time_zone(input: NaiveDateTimeWrapper, time_zone: &str) -> R
Ok(usec)
}

pub fn timestamptz_to_string(elem: i64, time_zone: &str, writer: &mut dyn Write) -> Result<()> {
let time_zone = lookup_time_zone(time_zone)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the time_zone string will be looked up for many times. Is there any way to optimize it? Because in most cases this will just a literal. (We can even reject the non-literal cases, I think)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lookup is a hash table lookup, with hashing cost linear to string length. The bigger cost is actually our way of evaluating literal expr, which would create a column full of duplicated values.
The ideal case, of course, is to pass a single session timezone as task context from frontend to compute node, and do a single lookup. But I guess we can defer that refactor until there is a more useful case for expr eval context.

Copy link
Contributor Author

@jon-chuang jon-chuang Jan 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does seem reasonable if the timezone string is replaced by an i32 fixed offset that is looked up by front-end... cloning i32 array is probably much cheaper than string array.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it is not a fixed offset. The offset can vary at different dates.

Copy link
Contributor Author

@jon-chuang jon-chuang Jan 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, indeed. We could represent it by the chrono_tz enum value, and repr in our type system as i32?

https://docs.rs/chrono-tz/latest/chrono_tz/static.TZ_VARIANTS.html#

It may be a bit brittle. But an i32 at the RHS of AT TIME ZONE and CAST_WITH_TIMEZONE will always be interpreted as a chrono_tz timezone.

I think the binder should not look it up. We can peform the lookup at the same place as looking up session_timezone. That way, it will still be human readable until it is turned into a proto plan.

Copy link
Contributor Author

@jon-chuang jon-chuang Jan 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, we don't support strings such as UTC+8 via chrono_tz. Do you think we should support these fixed offsets as well? We can reserve some space in our i32 encoding for such strings, and look them up separately.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

created issue: #7335

let input_type = input.return_type();
match (input_type.clone(), return_type.clone()) {
(DataType::Timestamptz, DataType::Varchar)
| (DataType::Varchar, DataType::Timestamptz) => {
Copy link
Contributor

@xiangjinwu xiangjinwu Jan 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make it clear (rather than to suggest any actions), this may be overly noisy even when all input strings already contains timezone (2022-01-01 12:00:00-08:00) and the session timezone is actually unused. It is impossible to know this in frontend but only during execution.

As a special case, when the input is a literal string, we can do the parsing in frontend and avoid casting expr at all. This would improve the user experience. Opened #7320 to track it.

Copy link
Contributor

@chenzl25 chenzl25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Great!

Comment on lines +1 to +14
statement ok
set timezone = "us/pacific";

# Cast date to timestamptz
query T
select '2022-01-01'::date::timestamp with time zone;
----
2022-01-01 08:00:00+00:00

# Cast timestamp to timestamptz
query T
select '2022-01-01 00:00:00'::timestamp::timestamp with time zone;
----
2022-01-01 08:00:00+00:00
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should display 2022-01-01 00:00:00-08:00 or 2022-01-01 08:00:00+00:00? Although they are equal.

Copy link
Contributor Author

@jon-chuang jon-chuang Jan 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this will be in an upcoming PR (depending on this PR). It will format Pg rows with the session timestamp.

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

license-eye has totally checked 2644 files.

Valid Invalid Ignored Fixed
1258 1 1385 0
Click to see the invalid file list
  • src/frontend/src/expr/session_timezone.rs

Copy link
Contributor

@xiangjinwu xiangjinwu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rest lgtm

@mergify mergify bot merged commit eca9541 into main Jan 12, 2023
@mergify mergify bot deleted the jon/implicit_cast_with_session_timezone branch January 12, 2023 09:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants