-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Minor: remove duplication in create_writer
#7229
Conversation
@@ -494,56 +490,6 @@ impl CsvSink { | |||
file_compression_type, | |||
} | |||
} | |||
|
|||
// Create a write for Csv files | |||
async fn create_writer( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the whole point of this PR is to move this into a free function that is shared rather than have two copies
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch!
@@ -235,7 +239,7 @@ pub(crate) enum AbortMode { | |||
} | |||
|
|||
/// A wrapper struct with abort method and writer | |||
struct AbortableWrite<W: AsyncWrite + Unpin + Send> { | |||
pub(crate) struct AbortableWrite<W: AsyncWrite + Unpin + Send> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this module is getting pretty large -- I think it would be easier to navigate it if were split into some smaller modules (e.g. put the write related code into datafusion/core/src/datasource/file_format/write.rs
, for example). Any interest in doing so @devinjdangelo or @metesynnada ? If not, I can give it a try
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good idea. The code is likely to become more complex / long with additional helpers as we get into parallelization and dealing with parquet. I'm working on a PR to add support for inserting parquet now, and I can consolidate the shared write code into a new module as part of that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened a draft here #7244
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! Thanks for catching and improving on this.
@@ -494,56 +490,6 @@ impl CsvSink { | |||
file_compression_type, | |||
} | |||
} | |||
|
|||
// Create a write for Csv files | |||
async fn create_writer( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch!
@@ -235,7 +239,7 @@ pub(crate) enum AbortMode { | |||
} | |||
|
|||
/// A wrapper struct with abort method and writer | |||
struct AbortableWrite<W: AsyncWrite + Unpin + Send> { | |||
pub(crate) struct AbortableWrite<W: AsyncWrite + Unpin + Send> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good idea. The code is likely to become more complex / long with additional helpers as we get into parallelization and dealing with parquet. I'm working on a PR to add support for inserting parquet now, and I can consolidate the shared write code into a new module as part of that.
@@ -62,7 +62,7 @@ pub trait GetExt { | |||
} | |||
|
|||
/// Readable file compression type | |||
#[derive(Debug, Clone, PartialEq, Eq)] | |||
#[derive(Debug, Clone, Copy, PartialEq, Eq)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reducing the duplication is logical, I just suggest an comment update.
Which issue does this PR close?
Closes #.
Rationale for this change
I noticed some code duplication while reviewing #7212 from @devinjdangelo and I think this code will be used more when adding write support for avro, parquet and arrow files. Thus it makes sense to put it in its own function
What changes are included in this PR?
Refactor
CsvSink::create_writer
andJsonSink::create_writer
Are these changes tested?
covered by existing tests
Are there any user-facing changes?
no