Don't share ConfigOptions (#3886) #4712
@@ -302,6 +287,8 @@ impl ExecutionPlan for ParquetExec {
})
})?;

let config_options = ctx.session_config().config_options();
We need to fetch this at execution time so that datafusion-proto can still deserialize ParquetExec without a SessionState. Longer term, as we strip out the overrides, this will make more sense anyway so 🤷
I think it is reasonable to look at the session configuration while executing 🤷
It certainly seems better than the current state of master, where the config options (attached to session state) are read via interior mutability.
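To illustrate the point above, here is a minimal, std-only sketch of a plan node that stores no configuration and instead reads it from the context passed to `execute`. All types here (`ConfigOptions`, `SessionConfig`, `TaskContext`, `ScanExec`) and the `parquet_pushdown_filters` field are simplified, hypothetical stand-ins, not the actual DataFusion definitions.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for DataFusion's `ConfigOptions`.
#[derive(Debug, Clone, Default)]
pub struct ConfigOptions {
    pub parquet_pushdown_filters: bool,
}

/// Hypothetical stand-in for `SessionConfig`; it owns its options outright.
#[derive(Debug, Clone, Default)]
pub struct SessionConfig {
    config_options: ConfigOptions,
}

impl SessionConfig {
    /// Builder-style setter (assumed name, for illustration only).
    pub fn with_pushdown_filters(mut self, enabled: bool) -> Self {
        self.config_options.parquet_pushdown_filters = enabled;
        self
    }

    pub fn config_options(&self) -> &ConfigOptions {
        &self.config_options
    }
}

/// Hypothetical stand-in for the task context handed to `execute`.
pub struct TaskContext {
    session_config: SessionConfig,
}

impl TaskContext {
    pub fn new(session_config: SessionConfig) -> Self {
        Self { session_config }
    }

    pub fn session_config(&self) -> &SessionConfig {
        &self.session_config
    }
}

/// A plan node that holds no configuration, so it can be constructed
/// (or deserialized) without any session being available.
#[derive(Debug, Default)]
pub struct ScanExec {
    pub path: String,
}

impl ScanExec {
    /// Configuration is looked up only when the plan actually runs.
    pub fn execute(&self, ctx: Arc<TaskContext>) -> String {
        let config_options = ctx.session_config().config_options();
        if config_options.parquet_pushdown_filters {
            format!("scan {} with filter pushdown", self.path)
        } else {
            format!("scan {} without filter pushdown", self.path)
        }
    }
}
```

In this sketch the same `ScanExec` value behaves differently depending on the context it is executed with, which is the property that lets a plan be deserialized first and configured later.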
@@ -90,7 +90,8 @@ message CsvFormat {
}
@@ -353,12 +353,9 @@ impl AsLogicalPlan for LogicalPlanNode {
self
))
})? {
&FileFormatType::Parquet(protobuf::ParquetFormat {
The plumbing for this override was actually incorrect: it would convert false -> None, the other overrides aren't present, and we plan to remove this override mechanism as part of #4349, so I just opted to remove it.
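The faulty plumbing itself is not shown in this thread. As a purely hypothetical sketch of how a `false -> None` conversion can silently change behavior, assume an optional override that is carried as a plain `bool` and decoded so that `false` means "not set". The function names below are invented for illustration.

```rust
/// Hypothetical lossy decoding: a plain `bool` cannot distinguish
/// "explicitly disabled" from "not set", so `false` collapses to `None`.
pub fn decode_lossy(wire_value: bool) -> Option<bool> {
    if wire_value {
        Some(true)
    } else {
        None
    }
}

/// Resolve an optional per-format override against the session-wide default.
pub fn resolve(override_value: Option<bool>, session_default: bool) -> bool {
    override_value.unwrap_or(session_default)
}
```

Under these assumptions, an explicit `Some(false)` override resolves to `false`, but after a round trip through `decode_lossy` it becomes `None` and falls back to the session default, which may be `true`.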
I agree serializing the same config options multiple times (once in the main session context and then once again as part of the file format) is undesirable for many reasons
b650b86 to 3327d11
3327d11 to 00a9b28
impl ParquetScanOptions {
/// Returns a [`SessionConfig`] with the given options
pub fn config(&self) -> SessionConfig {
SessionConfig::new()
I debated simply removing ParquetScanOptions in favour of SessionConfig but figured this PR was large enough as it was
Thank you. I agree this PR is already large. I also think the ParquetScanOptions predated the config options.
I think removing the ParquetScanOptions as a follow-on PR is a good idea 👍
👨🍳 👌
This looks really good @tustvold -- thank you for helping sort out the configuration situation
Pin<Box<dyn Stream<Item = Result<ActionType, Status>> + Send + Sync + 'static>>;
type DoExchangeStream =
Pin<Box<dyn Stream<Item = Result<FlightData, Status>> + Send + Sync + 'static>>;
type HandshakeStream = BoxStream<'static, Result<HandshakeResponse, Status>>;
I forgot this was here -- I need to give this example some love after my work to make arrow-flight easier to use
@@ -85,13 +84,9 @@ impl ParquetFormat {
}

/// Return true if pruning is enabled
- pub fn enable_pruning(&self) -> bool {
+ pub fn enable_pruning(&self, config_options: &ConfigOptions) -> bool {
👍
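The new signature takes the options as an argument instead of reaching into shared state. A minimal sketch of what such a method could look like follows, assuming the format keeps an optional per-format override and otherwise defers to the passed-in options. The `parquet_pruning` field, the struct layouts, and the method body are assumptions for illustration; only the signature comes from the diff above.

```rust
/// Hypothetical stand-in for `ConfigOptions` with a single setting.
#[derive(Debug, Clone)]
pub struct ConfigOptions {
    pub parquet_pruning: bool,
}

/// Simplified stand-in for `ParquetFormat` with an optional override.
#[derive(Debug, Default)]
pub struct ParquetFormat {
    enable_pruning: Option<bool>,
}

impl ParquetFormat {
    /// Set (or clear) the per-format override.
    pub fn with_enable_pruning(mut self, enable: Option<bool>) -> Self {
        self.enable_pruning = enable;
        self
    }

    /// Return true if pruning is enabled: the override wins when present,
    /// otherwise the caller-supplied session options decide.
    pub fn enable_pruning(&self, config_options: &ConfigOptions) -> bool {
        self.enable_pruning.unwrap_or(config_options.parquet_pruning)
    }
}
```

Because the options arrive as a plain `&ConfigOptions`, the method needs no lock and has no hidden dependency on a session.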
@@ -1173,7 +1166,7 @@ pub struct SessionConfig {
/// due to `resolve_table_ref` which passes back references)
default_schema: String,
/// Configuration options
- pub config_options: Arc<RwLock<ConfigOptions>>,
+ config_options: ConfigOptions,
❤️
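The practical difference between the two field types can be sketched with std-only stand-ins. `SharedSessionConfig` and `OwnedSessionConfig` are hypothetical names for the before and after shapes, and `batch_size` is an assumed example setting, not a claim about the real struct.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for `ConfigOptions`.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ConfigOptions {
    pub batch_size: usize,
}

/// Before: options live behind `Arc<RwLock<..>>`, so every clone of the
/// config shares (and can mutate) the same underlying options.
#[derive(Clone, Default)]
pub struct SharedSessionConfig {
    pub config_options: Arc<RwLock<ConfigOptions>>,
}

/// After: options are owned, so a clone is an independent copy and
/// mutation requires `&mut` access that the compiler can check.
#[derive(Clone, Default)]
pub struct OwnedSessionConfig {
    config_options: ConfigOptions,
}

impl OwnedSessionConfig {
    pub fn config_options(&self) -> &ConfigOptions {
        &self.config_options
    }

    pub fn config_options_mut(&mut self) -> &mut ConfigOptions {
        &mut self.config_options
    }
}
```

With the shared variant, writing through one clone is visible through every other clone; with the owned variant, a change to one copy leaves the others untouched, and no locking is needed to read.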
CurrentDate=70;
CurrentTime=71;
Uuid=72;
Abs = 0;
whitespace!
@@ -1132,6 +1133,9 @@ message ScanLimit {
}

message FileScanExecConf {
// Was repeated ConfigOption options = 10;
reserved 10;
👍
Co-authored-by: Andrew Lamb <[email protected]>
Benchmark runs are scheduled for baseline = afb1ae2 and contender = 07f4980. 07f4980 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Which issue does this PR close?
Closes #3886
Closes #3909
Relates to #4349
Relates to #4617
Rationale for this change
Having shared mutable state makes reasoning about mutation difficult (#4617), and the locking is verbose and potentially error-prone (#3886).
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?