Reading Delimited files with extra columns #12186
Comments
Radosław Waśko reports a new STANDUP for yesterday (2025-02-03): Progress: Catching up and updating single-column Table PR after reviews. Experiment with PoC of JUnit tests for std-bits. Start on extra columns in Delimited - added new types and updated API. It should be finished by 2025-02-05. Next Day: Next day I will be working on the same task. Update tests, finish updating the logic.
On the backlog meeting we have discussed that it will probably make more sense to rename the
The problem with the rename is that it would be a breaking change. We considered if we can keep the old name as an optional argument, but because
Because keeping the old field as a 'fallback' seems to cause more harm, we considered if we should move forward with the breaking change - renaming the field in a breaking way. @jdunkerley we wanted to hear your opinion on this, what do you think? The biggest trouble with this is that if there are any workflows that used the
which unfortunately is very confusing. This was already discussed in issue #7359 a long time ago, but due to the difficulty we reached only a partial solution. Perhaps this needs revisiting?
As discussed, let's call it
Radosław Waśko reports a new STANDUP for today (2025-02-04): Progress: Finished the implementation and updated tests. Discussing argument naming and breaking changes. Put up a Draft PR. It should be finished by 2025-02-05. Next Day: Next day I will be working on the same task. Rename the argument. Amend tests. Try issuing a more friendly error when old name is used.
Radosław Waśko reports a new STANDUP for yesterday (2025-02-05): Progress: Renamed the argument and described in changelog. Got the Delimited extra columns PR merged. Created a prototype that can improve the error messages on such argument name changes or typos. It should be finished by 2025-02-05. Next Day: Next day I will be working on the #12136 task. Finish the prototype PR after review. Start work on key-pair auth for Snowflake.
Currently, when reading a file that has extra columns after the first row, it is only possible to get a warning or drop the rows.
Want to change the keep_invalid_rows argument of Delimited to an atom type:
- Invalid_Rows.Drop_Invalid_Rows
- Invalid_Rows.Keep_Invalid_Rows
- Invalid_Rows.Add_Extra_Column (adds new columns with default names and nulls above and below).
In all cases we should still warn about it.
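To make the three options concrete, here is a rough Python sketch of how they could differ on a file whose rows do not all match the header width. It is an illustration only, not the Enso implementation: the names InvalidRows and read_delimited, the default column name pattern "Column N", and the assumption that Keep_Invalid_Rows cuts off excess cells and pads short rows with nulls are all hypothetical choices made for this example.

```python
import csv
import io
import warnings
from enum import Enum


class InvalidRows(Enum):
    # Hypothetical Python mirror of the three options proposed in the issue.
    DROP_INVALID_ROWS = "drop"
    KEEP_INVALID_ROWS = "keep"
    ADD_EXTRA_COLUMN = "add_extra_column"


def read_delimited(text, invalid_rows, delimiter=","):
    """Parse delimited text; the first row is treated as the header."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, body = list(rows[0]), rows[1:]
    width = len(header)

    # In all modes, warn when any row disagrees with the header width.
    if any(len(row) != width for row in body):
        warnings.warn("Some rows do not match the header's column count.")

    if invalid_rows is InvalidRows.DROP_INVALID_ROWS:
        body = [row for row in body if len(row) == width]
    elif invalid_rows is InvalidRows.ADD_EXTRA_COLUMN:
        # Widen the table to the longest row, using made-up default names.
        new_width = max([width] + [len(row) for row in body])
        header += [f"Column {i}" for i in range(width + 1, new_width + 1)]
        width = new_width

    # Pad short rows with None; under KEEP this also cuts excess cells
    # (an assumption about what "keeping" an invalid row means).
    body = [(row + [None] * width)[:width] for row in body]
    return header, body


sample = "a,b\n1,2\n3,4,5\n6,7\n"
header, body = read_delimited(sample, InvalidRows.ADD_EXTRA_COLUMN)
print(header)
print(body)
```

With Add_Extra_Column the extra cell from the second data row survives in a new column, and the rows above and below it get a null in that column; the other two modes either lose the row or lose the extra cell.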