Internal error in CAST from Timestamp[us] #3922

Closed
ghost opened this issue Oct 21, 2022 · 9 comments
Labels
bug Something isn't working

Comments

@ghost

ghost commented Oct 21, 2022

Describe the bug
Casting a timestamp column to INTEGER raises the following error.

Exception: Internal error: Unsupported CAST from Timestamp(Microsecond, None) to Int32. This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker

To Reproduce

import datafusion
ctx = datafusion.SessionContext()
ctx.register_parquet('nyc_taxi', 'nyc_taxi.parquet')
sql = '''
WITH a AS
   (SELECT
      tpep_pickup_datetime,
      tpep_dropoff_datetime,
      CAST(tpep_pickup_datetime AS INTEGER) AS pickup,
      CAST(tpep_dropoff_datetime AS INTEGER) AS dropoff
   FROM nyc_taxi)
SELECT
  CAST(AVG(dropoff - pickup) AS NUMERIC(5, 2))
FROM a
'''
df = ctx.sql(sql)
df.show()

Exception Traceback (most recent call last)
Input In [5], in
4 sql = '''
5 WITH a AS
6 (SELECT
(...)
14 FROM a
15 '''
16 df = ctx.sql(sql)
---> 17 df.show()

Exception: Internal error: Unsupported CAST from Timestamp(Microsecond, None) to Int32. This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker

Additional context
Latest version : datafusion 0.6.0 (python)

ghost added the bug label Oct 21, 2022
@waitingkuo
Contributor

waitingkuo commented Oct 21, 2022

There's no implementation for casting timestamp to int32 in the current version. You can cast it to int64 (BIGINT) for now.

@alamb
Contributor

alamb commented Oct 21, 2022

The fact this is reported as an internal error is not ideal. I will change the error type

@alamb
Contributor

alamb commented Oct 21, 2022

The root cause is that the arrow cast kernel doesn't support converting from timestamp --> int32 (as @waitingkuo mentions)

To fix this we could extend the cast support in https://github.com/apache/arrow-rs/blob/c7f7606/arrow/src/compute/kernels/cast.rs#L253-L254

@waitingkuo
Contributor

@alamb I think int32 isn't large enough for microsecond scale.
Casting timestamp[us] to integer returns the number of microseconds since 1970, but 2^32-1 microseconds covers only about 71 minutes past the epoch.

select to_timestamp_micros(4294967295);
+--------------------------------------+
| totimestampmicros(Int64(4294967295)) |
+--------------------------------------+
| 1970-01-01 01:11:34.967295           |
+--------------------------------------+
1 row in set. Query took 0.000 seconds.
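
The range argument above can be checked with plain Python. This is a minimal sketch using only the standard library; the helper name `micros_to_utc` and the sample date (the day this issue was opened) are assumptions chosen for illustration, not part of DataFusion.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def micros_to_utc(micros: int) -> datetime:
    """Interpret an integer as microseconds since the Unix epoch (UTC)."""
    return EPOCH + timedelta(microseconds=micros)


INT32_MAX = 2**31 - 1   # largest signed 32-bit value
UINT32_MAX = 2**32 - 1  # the value used in the comment above
INT64_MAX = 2**63 - 1   # largest signed 64-bit value (BIGINT)

# How far 32-bit microsecond counts reach past the epoch.
print(micros_to_utc(UINT32_MAX))  # 1970-01-01 01:11:34.967295+00:00
print(micros_to_utc(INT32_MAX))   # 1970-01-01 00:35:47.483647+00:00

# Hypothetical sample timestamp, expressed as microseconds since the epoch.
sample = datetime(2022, 10, 21, tzinfo=timezone.utc)
sample_micros = (sample - EPOCH) // timedelta(microseconds=1)

print(sample_micros > INT32_MAX)   # True: does not fit in Int32
print(sample_micros <= INT64_MAX)  # True: fits in Int64
```

A signed Int32 runs out even sooner than the unsigned figure quoted above, at roughly 36 minutes, so any realistic timestamp[us] value needs 64 bits.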

@alamb
Contributor

alamb commented Oct 21, 2022

So maybe a better error message is the best solution here

@ghost
Author

ghost commented Oct 22, 2022

Thank you all.
I understand that CAST does not support timestamp to INT32/INT64.
The internal error message is misleading.

ghost closed this as completed Oct 22, 2022
@waitingkuo
Contributor

@ike560 CAST does support timestamp to INT64; you can try CAST(tpep_pickup_datetime AS BIGINT).

@ghost
Author

ghost commented Oct 25, 2022

Thanks @waitingkuo
I tried it, CAST to BIGINT was OK.
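
For reference, the BIGINT workaround applied to the query from the report might look like the sketch below. It reuses only the `datafusion` calls shown in the original reproduction (`SessionContext`, `register_parquet`, `sql`, `show`). The division by 1000000.0 and the `avg_trip_seconds` alias are assumptions added here: they rely on the cast returning microseconds, as stated earlier in the thread, and they replace the original `NUMERIC(5, 2)` cast, which a microsecond-scale average would likely not fit. The function name `show_average_trip` is hypothetical.

```python
# Query from the report with INTEGER replaced by BIGINT.
# Assumption: the BIGINT values are microseconds since 1970,
# so dividing the average difference by 1,000,000 yields seconds.
SQL = """
WITH a AS
   (SELECT
      CAST(tpep_pickup_datetime AS BIGINT) AS pickup,
      CAST(tpep_dropoff_datetime AS BIGINT) AS dropoff
   FROM nyc_taxi)
SELECT
  AVG(dropoff - pickup) / 1000000.0 AS avg_trip_seconds
FROM a
"""


def show_average_trip(parquet_path: str = "nyc_taxi.parquet") -> None:
    """Register the parquet file and print the average trip duration."""
    # Imported lazily so the SQL text can be inspected without the package.
    import datafusion

    ctx = datafusion.SessionContext()
    ctx.register_parquet("nyc_taxi", parquet_path)
    ctx.sql(SQL).show()
```

Calling `show_average_trip()` requires the `datafusion` package and the same `nyc_taxi.parquet` file used in the report.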

@ghost
Author

ghost commented Nov 10, 2022

Close

ghost reopened this Nov 10, 2022
alamb closed this as completed Feb 2, 2023