Describe the bug
I am trying to query parquet files in S3 from the CLI. Some work, and some do not.
To Reproduce
DataFusion CLI v12.0.0
❯ create external table test stored as parquet location 's3://nyc-tlc/trip data/yellow_tripdata_2022-06.parquet';
ObjectStore(Generic { store: "S3", source: MissingLastModified })
However, if I download the file locally it works.
$ aws s3 cp "s3://nyc-tlc/trip data/yellow_tripdata_2022-06.parquet" /tmp/yellow_tripdata_2022-06.parquet
download: s3://nyc-tlc/trip data/yellow_tripdata_2022-06.parquet to ../../../../../../tmp/yellow_tripdata_2022-06.parquet
DataFusion CLI v12.0.0
❯ create external table test stored as parquet location '/tmp/yellow_tripdata_2022-06.parquet';
0 rows in set. Query took 0.006 seconds.
❯ select * from test limit 10;
+----------+----------------------+-----------------------+-----------------+---------------+------------+--------------------+--------------+--------------+--------------+-------------+-------+---------+------------+--------------+-----------------------+--------------+----------------------+-------------+
| VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | RatecodeID | store_and_fwd_flag | PULocationID | DOLocationID | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount | congestion_surcharge | airport_fee |
+----------+----------------------+-----------------------+-----------------+---------------+------------+--------------------+--------------+--------------+--------------+-------------+-------+---------+------------+--------------+-----------------------+--------------+----------------------+-------------+
| 1        | 2022-06-01 00:25:41  | 2022-06-01 00:48:22   | 1               | 11            | 1          | N                  | 70           | 48           | 1            | 32          | 3     | 0.5     | 2          | 6.55         | 0.3                   | 44.35        | 2.5                  | 0           |
| 1        | 2022-06-01 00:44:40  | 2022-06-01 01:01:48   | 1               | 4.2           | 1          | N                  | 170          | 226          | 1            | 14          | 3     | 0.5     | 0          | 0            | 0.3                   | 17.8         | 2.5                  | 0           |
| 2        | 2022-06-01 00:23:07  | 2022-06-01 00:39:50   | 1               | 9.49          | 1          | N                  | 264          | 113          | 1            | 26          | 0.5   | 0.5     | 5          | 6.55         | 0.3                   | 42.6         | 2.5                  | 1.25        |
| 1        | 2022-06-01 00:25:53  | 2022-06-01 00:57:06   | 2               | 12.1          | 1          | N                  | 132          | 17           | 2            | 37          | 1.75  | 0.5     | 0          | 0            | 0.3                   | 39.55        | 0                    | 1.25        |
| 1        | 2022-06-01 00:23:58  | 2022-06-01 00:33:43   | 0               | 1.8           | 1          | N                  | 140          | 163          | 1            | 9           | 3     | 0.5     | 2.55       | 0            | 0.3                   | 15.35        | 2.5                  | 0           |
| 2        | 2022-06-01 00:01:27  | 2022-06-01 00:10:53   | 1               | 2.02          | 1          | N                  | 148          | 158          | 1            | 9           | 0.5   | 0.5     | 0.64       | 0            | 0.3                   | 13.44        | 2.5                  | 0           |
| 2        | 2022-06-01 00:16:25  | 2022-06-01 00:40:45   | 1               | 8.08          | 1          | N                  | 158          | 116          | 1            | 26.5        | 0.5   | 0.5     | 7.58       | 0            | 0.3                   | 37.88        | 2.5                  | 0           |
| 1        | 2022-06-01 00:11:08  | 2022-06-01 00:27:02   | 1               | 4.3           | 1          | N                  | 246          | 262          | 1            | 15          | 3     | 0.5     | 3.75       | 0            | 0.3                   | 22.55        | 2.5                  | 0           |
| 2        | 2022-06-01 00:21:42  | 2022-06-01 00:42:01   | 1               | 8.78          | 1          | N                  | 197          | 191          | 1            | 26.5        | 0.5   | 0.5     | 5.56       | 0            | 0.3                   | 33.36        | 0                    | 0           |
| 2        | 2022-06-01 00:23:05  | 2022-06-01 00:30:45   | 1               | 1.76          | 1          | N                  | 48           | 186          | 1            | 7.5         | 0.5   | 0.5     | 2.26       | 0            | 0.3                   | 13.56        | 2.5                  | 0           |
+----------+----------------------+-----------------------+-----------------+---------------+------------+--------------------+--------------+--------------+--------------+-------------+-------+---------+------------+--------------+-----------------------+--------------+----------------------+-------------+
10 rows in set. Query took 1.792 seconds.
Expected behavior
Should work
Additional context
None
If I move the file into my own bucket then I can query it, so this seems to be an issue with authentication.
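One way to probe the authentication hypothesis is to request the object's metadata twice with the AWS CLI, once signed with your credentials and once anonymously, and compare the results (including whether a `LastModified` value comes back). The Python sketch below only builds the two argument lists for `aws s3api head-object`. It assumes the AWS CLI is installed. The helper names `split_s3_url` and `head_object_args` are hypothetical, and nothing here has been verified against this bucket or exercises the DataFusion code path.

```python
from urllib.parse import urlparse


def split_s3_url(url: str) -> tuple[str, str]:
    """Split an s3:// URL into (bucket, key). Hypothetical helper."""
    parsed = urlparse(url)
    if parsed.scheme != "s3":
        raise ValueError(f"not an s3:// URL: {url!r}")
    # The key is the path without its leading slash; spaces are kept as-is.
    return parsed.netloc, parsed.path.lstrip("/")


def head_object_args(url: str, anonymous: bool) -> list[str]:
    """Build an `aws s3api head-object` argument list. Hypothetical helper."""
    bucket, key = split_s3_url(url)
    args = ["aws", "s3api", "head-object", "--bucket", bucket, "--key", key]
    if anonymous:
        # --no-sign-request sends the request without credentials.
        args.append("--no-sign-request")
    return args


URL = "s3://nyc-tlc/trip data/yellow_tripdata_2022-06.parquet"

signed = head_object_args(URL, anonymous=False)
unsigned = head_object_args(URL, anonymous=True)
print(signed)
print(unsigned)
```

Each list can be passed to `subprocess.run(args, capture_output=True, text=True)`. Passing the arguments as a list avoids shell-quoting problems with the space in the key.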
The root cause is apache/arrow-rs#2795.
This was resolved