-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PARQUET-2465: Fall back to HadoopConfig #1339
Conversation
baa6245
to
17908f6
Compare
a416cef
to
6a1bd96
Compare
@@ -503,8 +503,7 @@ protected Builder(OutputFile path) { | |||
* @return an appropriate WriteSupport for the object model. | |||
*/ | |||
protected WriteSupport<T> getWriteSupport(ParquetConfiguration conf) { | |||
throw new UnsupportedOperationException( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When people want to decouple from Hadoop, they can just override the methods, otherwise the old behavior is preserved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to use a flag from ParquetConfiguration
to determine if backward compatibility is required? By default we should enable backward compatibility and disable it in the parquet-mr 2.0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking of it, I believe we should deprecate the ones where you need to pass in Configuration (from Hadoop), and then we can move to ParquetConfiguration
where you can also pass in a HadoopParquetConfiguration
. Thanks for raising this, let me create a commit and let me know what you think!
Add fallback logic We see that this causes the 1.14 to be incompatible with the previous releases. The config will be created and right after that the `getWriteSupport(conf)` is called. But since this method is freshly introduced: ```java protected WriteSupport<T> getWriteSupport(ParquetConfiguration conf) { throw new UnsupportedOperationException( "Override ParquetWriter$Builder#getWriteSupport(ParquetConfiguration)"); } ```
Good to go. @wgtmac @shangxinli @amousavigourabi @vinooganesh LMKWYT |
👍 this looks good to me, but do we want to actually mark the hadoop methods as deprecated if we are going to assume that parquet-mr 1.x will always rely on hadoop? Or is there actually a plan to drop the hadoop dependency on future 1.x releases? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fix makes sense to me. Thanks!
You can still use the Hadoop config, but you'll need to wrap it into a |
Add fallback logic We see that this causes the 1.14 to be incompatible with the previous releases. The config will be created and right after that the `getWriteSupport(conf)` is called. But since this method is freshly introduced: ```java protected WriteSupport<T> getWriteSupport(ParquetConfiguration conf) { throw new UnsupportedOperationException( "Override ParquetWriter$Builder#getWriteSupport(ParquetConfiguration)"); } ```
Add fallback logic We see that this causes the 1.14 to be incompatible with the previous releases. The config will be created and right after that the `getWriteSupport(conf)` is called. But since this method is freshly introduced: ```java protected WriteSupport<T> getWriteSupport(ParquetConfiguration conf) { throw new UnsupportedOperationException( "Override ParquetWriter$Builder#getWriteSupport(ParquetConfiguration)"); } ```
I'd like to note that we have other stuff that is deprecated because they will be dropped in 2.0 without any plans to remove them in 1.x releases as well (see: |
Sounds great, thank @Fokko and @amousavigourabi! |
Add fallback logic We see that this causes the 1.14 to be incompatible with the previous releases. The config will be created and right after that the `getWriteSupport(conf)` is called. But since this method is freshly introduced: ```java protected WriteSupport<T> getWriteSupport(ParquetConfiguration conf) { throw new UnsupportedOperationException( "Override ParquetWriter$Builder#getWriteSupport(ParquetConfiguration)"); } ```
We see that this causes the 1.14 to be incompatible with the previous releases.
getWriteSupport(conf)
is called: https://github.com/apache/parquet-mr/pull/1141/files#diff-1ad34fccaed5421e95285fbcceaf63d9d55b8460b5fb301b65468f8a15603b5fR784It is not implemented and causes an
UnsupportedOperationException
if you don't supply a config.Jira
them in the PR title. For example, "PARQUET-1234: My Parquet PR"
the ASF 3rd Party License Policy.
Tests
Commits
from "How to write a good git commit message":
Style
mvn spotless:apply -Pvector-plugins
Documentation