-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix parquet pruning when column names have periods #5710
Conversation
@@ -384,7 +384,7 @@ fn build_statistics_record_batch<S: PruningStatistics>( | |||
let mut arrays = Vec::<ArrayRef>::new(); | |||
// For each needed statistics column: | |||
for (column, statistics_type, stat_field) in required_columns.iter() { | |||
let column = Column::from_qualified_name(column.name()); | |||
let column = Column::from_name(column.name()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is the bugfix
The rest of the PR is a test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
// note the column name has a period in it! | ||
Field::new("service.name", service_name.data_type().clone(), true), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Scenario::PeriodsInColumnNames, | ||
"SELECT \"name\", \"service.name\" FROM t WHERE \"service.name\" = 'frontend' AND \"name\" != 'HTTP GET / DISPATCH'", | ||
Some(0), | ||
Some(2), // prune out middle and last row group |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super nit:
Some(2), // prune out middle and last row group | |
Some(2), // prune out middle and last row group |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry -- missed that -- will fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Included in #5726
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should stop using strings for columns at all to prevent this type of confusion in the future. IIRC this isn't the first bug introduces by it.
For what it is worth, postgres uses all indexes (numbers) as I recall and that has its own set of horrible bugs (when the output indexes get mixed up). |
…ache#5710)" This reverts commit 74c3955.
Which issue does this PR close?
Closes #5708
Rationale for this change
We were seeing wrong results with parquet file that had columns with
.
in their namesWhat changes are included in this PR?
Bug fix + test
Are these changes tested?
yes
I also verified that this fixes the issue we saw upstream in IOx https://github.com/influxdata/influxdb_iox/issues/7225#issuecomment-1481809300
Are there any user-facing changes?
bug fix
cc @crepererum