[SIP-15] Fixing datetime conversion and SQL literal #8464
return "'{}'".format(dttm.strftime("%Y-%m-%d")) | ||
return "'{}'".format(dttm.strftime("%Y-%m-%d %H:%M:%S")) | ||
return f"CAST('{dttm.isoformat()[:10]}' AS DATE)" | ||
if tt == "TIMESTAMP": |
@betodealmeida do you know why the `TIMESTAMP` type wasn't defined for BigQuery?
return f"'{dttm.strftime('%Y-%m-%d %H:%M:%S')}'" | ||
def convert_dttm(cls, target_type: str, dttm: datetime) -> Optional[str]: | ||
if target_type.upper() in ("DATE", "DATETIME"): | ||
return f"CAST('{dttm.isoformat()}' AS TIMESTAMP)" |
@dpgaspar I know you recently added this and I was hoping you could validate the logic.
b65aa7a
to
3122919
Compare
if tt == "TIMESTAMP": | ||
return "from_iso8601_timestamp('{}')".format(dttm.isoformat()) | ||
return "CAST ('{}' AS TIMESTAMP)".format(dttm.strftime("%Y-%m-%d %H:%M:%S")) |
We shouldn't be casting other types to `TIMESTAMP`. The reason for this function is to ensure that the LHS and RHS for the time filter comparison are equivalent types.
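To illustrate the point about keeping both sides of the comparison the same type, here is a minimal sketch (not the PR's actual engine code) of a conversion that only emits a cast for the column types it knows about and returns `None` for everything else. The name `convert_dttm_sketch` and the two handled types are assumptions chosen for illustration.

```python
from datetime import datetime
from typing import Optional


def convert_dttm_sketch(target_type: str, dttm: datetime) -> Optional[str]:
    """Hypothetical helper: cast only to the column's own temporal type."""
    tt = target_type.upper()
    if tt == "DATE":
        # Compare a DATE column against a DATE literal.
        return f"CAST('{dttm.date().isoformat()}' AS DATE)"
    if tt == "TIMESTAMP":
        # Compare a TIMESTAMP column against a TIMESTAMP literal.
        value = dttm.isoformat(sep=" ", timespec="seconds")
        return f"CAST('{value}' AS TIMESTAMP)"
    # Unknown type: let the caller decide how to build the literal.
    return None


dttm = datetime(2019, 1, 2, 3, 4, 5, 678900)
print(convert_dttm_sketch("DATE", dttm))       # CAST('2019-01-02' AS DATE)
print(convert_dttm_sketch("timestamp", dttm))  # CAST('2019-01-02 03:04:05' AS TIMESTAMP)
print(convert_dttm_sketch("VARCHAR", dttm))    # None
```

Returning `None` rather than a default cast leaves the decision for non-temporal columns to the caller.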
""" | ||
return "'{}'".format(dttm.strftime("%Y-%m-%d %H:%M:%S")) |
This is now handled in `dttm_sql_literal` as we need to determine if the python-date-format logic needs to be invoked.
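The ordering described here can be sketched roughly as follows. This is an illustrative stand-in, not the actual `dttm_sql_literal` implementation: the function name, its parameters, and the choice to treat naive datetimes as UTC for `epoch_s` are all assumptions.

```python
from datetime import datetime, timezone
from typing import Callable, Optional

# Hypothetical signature of an engine-specific converter.
Converter = Callable[[str, datetime], Optional[str]]


def dttm_sql_literal_sketch(
    dttm: datetime,
    column_type: Optional[str],
    convert: Converter,
    python_date_format: Optional[str] = None,
) -> str:
    # 1. Prefer the engine-specific conversion for known column types.
    sql = convert(column_type, dttm) if column_type else None
    if sql is not None:
        return sql
    # 2. Otherwise fall back to the user-supplied python-date-format.
    if python_date_format == "epoch_s":
        # Assumption: a naive datetime is interpreted as UTC.
        aware = dttm if dttm.tzinfo else dttm.replace(tzinfo=timezone.utc)
        return str(int(aware.timestamp()))
    if python_date_format:
        return "'" + dttm.strftime(python_date_format) + "'"
    # 3. Last resort: a plain string literal.
    return "'" + dttm.strftime("%Y-%m-%d %H:%M:%S.%f") + "'"


def no_conversion(target_type: str, dttm: datetime) -> Optional[str]:
    """Stand-in for an engine that knows no conversion for this type."""
    return None


print(dttm_sql_literal_sketch(datetime(2019, 1, 2), "VARCHAR", no_conversion, "%Y-%m-%d"))
```

The point of the ordering is that a misconfigured python-date-format cannot override a conversion the engine already knows how to do.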
@@ -50,8 +50,10 @@ def epoch_to_dttm(cls): | |||
return "dateadd(S, {col}, '1970-01-01')" | |||
|
|||
@classmethod | |||
def convert_dttm(cls, target_type: str, dttm: datetime) -> str: | |||
return "CONVERT(DATETIME, '{}', 126)".format(dttm.isoformat()) |
We shouldn't be casting all types to a `DATETIME`. The reason for this function is to ensure that the LHS and RHS for the time filter comparison are equivalent types.
) | ||
def convert_dttm(cls, target_type: str, dttm: datetime) -> Optional[str]: | ||
tt = target_type.upper() | ||
if tt == "DATE": |
Note there was no previous logic for handling the `DATE` type.
827ab4c
to
4f1543a
Compare
A very quick first pass, will look in closer detail later.
tests/db_engine_specs/mssql_tests.py
Outdated
for target_type in ("DATE", "DATETIME", "SMALLDATETIME", "TIMESTAMP"): | ||
self.assertEqual( | ||
MssqlEngineSpec.convert_dttm(target_type, dttm), | ||
"CONVERT(DATETIME, '2019-01-02T03:04:05.678900', 126)", | ||
) |
..and the test here.
3ea4968
to
fa2d521
Compare
Codecov Report
@@ Coverage Diff @@
## master #8464 +/- ##
==========================================
- Coverage 66.57% 66.52% -0.05%
==========================================
Files 449 449
Lines 22567 22595 +28
Branches 2367 2367
==========================================
+ Hits 15023 15032 +9
- Misses 7406 7425 +19
Partials 138 138
Continue to review full report at Codecov.
Second pass
def convert_dttm(cls, target_type: str, dttm: datetime) -> str: | ||
def convert_dttm(cls, target_type: str, dttm: datetime) -> Optional[str]: | ||
tt = target_type.upper() | ||
if tt == "DATE": | ||
return "CAST('{}' AS DATE)".format(dttm.isoformat()[:10]) | ||
return f"TO_DATE('{dttm.date().isoformat()}', 'yyyy-MM-dd')" | ||
elif tt == "TIMESTAMP": | ||
return "CAST('{}' AS TIMESTAMP)".format(dttm.strftime("%Y-%m-%d %H:%M:%S")) | ||
return "'{}'".format(dttm.strftime("%Y-%m-%d %H:%M:%S")) | ||
return f"""TO_TIMESTAMP('{dttm.isoformat(sep=" ", timespec="seconds")}', 'yyyy-MM-dd HH:mm:ss')""" # pylint: disable=line-too-long | ||
return None |
@cgivre can you check if these are valid and if there's anything missing? A quick googling didn't turn up any `DATETIME` type for Drill, just `DATE`, `TIME` and `TIMESTAMP`.
Will do
superset/db_engine_specs/hive.py
Outdated
elif tt == "TIMESTAMP": | ||
return "CAST('{}' AS TIMESTAMP)".format(dttm.strftime("%Y-%m-%d %H:%M:%S")) | ||
return "'{}'".format(dttm.strftime("%Y-%m-%d %H:%M:%S")) | ||
return f"""CAST('{dttm.isoformat(sep=" ", timespec="seconds")}' AS TIMESTAMP)""" # pylint: disable=line-too-long | ||
return None |
It seems Hive supports nanosecond precision:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-TimestampstimestampTimestamps
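For context on the precision question, Python's `datetime` carries at most microsecond resolution, so a literal built from it cannot have more than six fractional digits even when the database accepts nine. A small sketch of the `timespec` options of `datetime.isoformat` (the sample value is arbitrary):

```python
from datetime import datetime

dttm = datetime(2019, 1, 2, 3, 4, 5, 678900)

# Truncate to whole seconds.
seconds = dttm.isoformat(sep=" ", timespec="seconds")
# Keep the full microsecond component (the finest resolution datetime stores).
micros = dttm.isoformat(sep=" ", timespec="microseconds")

print(seconds)  # 2019-01-02 03:04:05
print(micros)   # 2019-01-02 03:04:05.678900
```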
def convert_dttm(cls, target_type: str, dttm: datetime) -> Optional[str]: | ||
if target_type.upper() == "TEXT": | ||
return f"""'{dttm.isoformat(sep=" ", timespec="microseconds")}'""" | ||
return None |
SQLite also supports time stored in REAL and INTEGER columns according to the docs:
TEXT as ISO8601 strings ("YYYY-MM-DD HH:MM:SS.SSS").
REAL as Julian day numbers, the number of days since noon in Greenwich on November 24, 4714 B.C. according to the proleptic Gregorian calendar.
INTEGER as Unix Time, the number of seconds since 1970-01-01 00:00:00 UTC.
Should we support the REAL and INTEGER conversions here?
@willbarrett epoch is handled by specifying `epoch_s` in the python-date-format.
Also a `REAL` or `INTEGER` column could encode temporal information in a different form, e.g., one could be using `REAL` to store a UNIX timestamp with floating point precision, and thus we can't blindly handle these types.
Note per the referenced PoC PR the future end goal is to provide an engine specific graph which maps between various SQLAlchemy and `datetime` types. Much of the time grain bucketing is potentially wrong for string columns.
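To make the ambiguity of a `REAL` column concrete, the sketch below encodes the same instant both as a Julian day number and as floating-point UNIX seconds. The helper names are hypothetical. Both results are plain floats, so the column type alone cannot tell the two encodings apart.

```python
from datetime import datetime, timezone

# Julian day number of 1970-01-01 00:00:00 UTC.
UNIX_EPOCH_JULIAN_DAY = 2440587.5


def to_unix_seconds(dttm: datetime) -> float:
    """Seconds since the UNIX epoch (expects a timezone-aware datetime)."""
    return dttm.timestamp()


def to_julian_day(dttm: datetime) -> float:
    """Julian day number (expects a timezone-aware datetime)."""
    return dttm.timestamp() / 86400 + UNIX_EPOCH_JULIAN_DAY


dttm = datetime(2000, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(to_unix_seconds(dttm))  # 946728000.0
print(to_julian_day(dttm))    # 2451545.0
```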
fa2d521
to
52ae511
Compare
@villebro @willbarrett thanks for the feedback. I've hopefully addressed all your comments (either via code changes or replies to your comments).
LGTM, and big thanks for the big effort with the unit tests. I expect something will break as a consequence of this big change, but these are easy to fix later on, so I don't see any reason not to merge this.
Thanks for the review @villebro. I agree there could be a regression, but i) there were clearly some ill-defined mappings for certain dialects, and ii) with the addition of unit tests after any fixes future regressions should be preventable.
Absolutely @john-bodley; even in the light of possible regressions I expect this to fix more bugs than it introduces. I propose merging this asap.
def convert_dttm(cls, target_type: str, dttm: datetime) -> Optional[str]: | ||
if target_type.upper() == "DATETIME": | ||
return f"""CAST('{dttm.isoformat(timespec="seconds")}' AS DATETIME)""" | ||
return None |
@john-bodley thanks for pointing this out. The logic looks good; I've retested it and found no problems.
CATEGORY
Choose one
SUMMARY
This PR updates a number of aspects related to the datetime SQL literal logic which is used for the RHS of the time filter comparisons. Specifically:
1. `DbEngineSpec.convert_dttm` now optionally returns `None` if a type conversion does not exist.
2. `dttm_sql_literal` now first tries to convert a `datetime` object to a SQL expression (1) prior to converting either a string or numerical type based on the python-date-format. The reason for this change is we should handle known types before falling back to using the python-date-format, as in theory these could be defined incorrectly for non-string/float types.
3. Fixes the `convert_dttm` logic for a number of database engines. There were a handful of examples where this was simply returning either i) a string representation of the `datetime` object rather than the necessary SQL expression for casting from a `datetime` object to the native temporal type, or ii) incorrect `datetime` formatting or casting logic. Note I'm not familiar with a number of these database engines and thus the updated conversions were the result of Googling. This logic should probably be validated.
4. Unit tests for each engine's `convert_dttm` function.
5. Uses f-strings rather than `str.format(...)`.
6. Replaces `dttm.isoformat()[:10]` with `dttm.date().isoformat()` to improve readability.

Regarding timestamp precision, some databases only seem to support seconds whereas others support microseconds. Where possible the `datetime` object is formatted to the highest level of precision available.

Note for context I sense this PoC is probably the end state for correctly handling the necessary casting of types (#7682).
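As a quick sanity check of the readability change, the two date spellings produce the same string, since `isoformat()` always starts with a ten-character `YYYY-MM-DD` prefix. A minimal sketch with arbitrary sample values:

```python
from datetime import datetime

samples = [
    datetime(2019, 1, 2, 3, 4, 5, 678900),
    datetime(1970, 1, 1),
    datetime(999, 12, 31, 23, 59, 59),  # years are zero-padded to four digits
]

for dttm in samples:
    sliced = dttm.isoformat()[:10]       # old spelling: slice the date prefix
    explicit = dttm.date().isoformat()   # new spelling: states the intent directly
    assert sliced == explicit
    print(explicit)
```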
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
TEST PLAN
CI and additional unit tests.
ADDITIONAL INFORMATION
REVIEWERS
to: @betodealmeida @dpgaspar @etr2460 @michellethomas @mistercrunch @villebro