-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable creating and inserting to empty external tables via SQL #7276
Conversation
.collect() | ||
.await?; | ||
|
||
let expected = vec![ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is very cool.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @devinjdangelo -- this code is both nicely written and well tested and I think moves DataFusion forward. We are still not quite there but we are so close
Thank you for making progress
|
||
/// tests insert into with end to end sql | ||
/// create external table + insert into statements | ||
async fn helper_test_insert_into_sql( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could use sqllogictest to test this as well --
Perhaps add to https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/tests/sqllogictests/test_files/insert.slt
I can't remember how temporary directories work in sqllogictests though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually worked on this in #7283 which I just opened for copy.slt. Once those updates are in, I can cut another PR to add additional tests to insert.slt
fn create_external_table_parquet_sort_order() { | ||
let sql = "create external table foo(a varchar, b varchar, c timestamp) stored as parquet location '/tmp/foo' with order (c)"; | ||
let expected = "CreateExternalTable: Bare { table: \"foo\" }"; | ||
quick_test(sql, expected); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 I tried to run this locally it doesn't quite work
❯ create external table t(x integer, y varchar) stored as parquet location '/tmp/foo' with order (x);
0 rows in set. Query took 0.001 seconds.
❯ select * from t;
0 rows in set. Query took 0.002 seconds.
❯ insert into t values (1, 'foo'), (2, 'bar');
This feature is not implemented: Writing to a sorted listing table via insert into is not supported yet. To write to this table in the meantime, register an equivalent table with file_sort_order = vec![]
But it is getting very close.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that error I added intentionally. Inserting to a sorted table would work, but there is nothing to enforce that the sort order is preserved yet. So, my concern is a user inserts unsorted data to a sorted table, and then subsequent queries return incorrect results surprisingly.
We could add this as an issue to the streaming writes epic (support inserts to a sorted listingtable).
Which issue does this PR close?
Follow on to #7244
Related to #7228 and #7036
Note: does not close #7228 since it does not provide a mechanism to create external table target if it does not exist. It does offer a partial workaround since you can now specify
OPTIONS (insert_mode append_new_files)
when creating external tables soinsert into
will add new files to the table.Rationale for this change
Previous PRs adding support for inserting to
ListingTables
have not adequately tested an end to end process of creating an empty external table from SQL and subsequently inserting data into it. This PR adds test cases for this and adds a few enhancements / solves a few bugs to make the test cases pass.What changes are included in this PR?
ListingTableInsertMode
viaCreate External Table ... OPTIONS ()
statementAre these changes tested?
Yes
Are there any user-facing changes?
Creating empty external tables and inserting to them via SQL is possible now.