
PARQUET-2465: Fall back to HadoopConfig #1339

Merged
merged 1 commit on May 3, 2024
@@ -495,16 +495,17 @@ protected Builder(OutputFile path) {
     /**
      * @param conf a configuration
      * @return an appropriate WriteSupport for the object model.
+     * @deprecated Use {@link #getWriteSupport(ParquetConfiguration)} instead
      */
+    @Deprecated
     protected abstract WriteSupport<T> getWriteSupport(Configuration conf);
 
     /**
      * @param conf a configuration
      * @return an appropriate WriteSupport for the object model.
      */
     protected WriteSupport<T> getWriteSupport(ParquetConfiguration conf) {
-      throw new UnsupportedOperationException(
-          "Override ParquetWriter$Builder#getWriteSupport(ParquetConfiguration)");
+      return getWriteSupport(ConfigurationUtil.createHadoopConfiguration(conf));
     }

Contributor Author:
When people want to decouple from Hadoop, they can just override the methods; otherwise, the old behavior is preserved.
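For illustration, a minimal sketch of such an override, assuming a hypothetical `Example` object model with an `ExampleWriteSupport` (both invented names, not part of the PR): only the ParquetConfiguration variant carries the real logic, so the builder never constructs a Hadoop Configuration itself.

import org.apache.hadoop.conf.Configuration;
import org.apache.parquet.conf.HadoopParquetConfiguration;
import org.apache.parquet.conf.ParquetConfiguration;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.api.WriteSupport;
import org.apache.parquet.io.OutputFile;

public class ExampleWriterBuilder extends ParquetWriter.Builder<Example, ExampleWriterBuilder> {

  protected ExampleWriterBuilder(OutputFile file) {
    super(file);
  }

  @Override
  protected ExampleWriterBuilder self() {
    return this;
  }

  // The Hadoop-free variant carries the real logic.
  @Override
  protected WriteSupport<Example> getWriteSupport(ParquetConfiguration conf) {
    return new ExampleWriteSupport(conf);
  }

  // The Hadoop-based variant is deprecated but still abstract, so it must be
  // implemented; it can simply wrap the Configuration and delegate.
  @Override
  @Deprecated
  protected WriteSupport<Example> getWriteSupport(Configuration conf) {
    return getWriteSupport(new HadoopParquetConfiguration(conf));
  }
}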

Member:

Is it possible to use a flag from ParquetConfiguration to determine whether backward compatibility is required? By default we should enable backward compatibility, and disable it in parquet-mr 2.0.
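A sketch of what that suggestion might look like, using WriteSupport#init as the example. The property name `parquet.hadoop.backward.compatible` is invented here for illustration, and this approach was not what the PR merged:

// Inside WriteSupport<T>; a sketch of the suggested flag, not merged code.
public WriteContext init(ParquetConfiguration configuration) {
  // Hypothetical switch: defaults to the backward-compatible fallback,
  // which parquet-mr 2.0 could then flip off.
  if (configuration.getBoolean("parquet.hadoop.backward.compatible", true)) {
    return init(ConfigurationUtil.createHadoopConfiguration(configuration));
  }
  throw new UnsupportedOperationException("Override WriteSupport#init(ParquetConfiguration)");
}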

Contributor Author:
Thinking about it, I believe we should deprecate the variants where you need to pass in a Configuration (from Hadoop), and then we can move to ParquetConfiguration, where you can also pass in a HadoopParquetConfiguration. Thanks for raising this; let me create a commit, and let me know what you think!
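In practice, that means callers holding a Hadoop Configuration can keep using it through the new API by wrapping it. A small sketch (the property key is arbitrary):

import org.apache.hadoop.conf.Configuration;
import org.apache.parquet.conf.HadoopParquetConfiguration;
import org.apache.parquet.conf.ParquetConfiguration;
import org.apache.parquet.conf.PlainParquetConfiguration;

public class ConfigurationExamples {
  public static void main(String[] args) {
    // Existing Hadoop-based setup keeps working: wrap the Configuration.
    Configuration hadoopConf = new Configuration();
    hadoopConf.set("parquet.example.key", "value");
    ParquetConfiguration wrapped = new HadoopParquetConfiguration(hadoopConf);

    // Hadoop-free setup: use a plain implementation instead.
    ParquetConfiguration plain = new PlainParquetConfiguration();
    plain.set("parquet.example.key", "value");

    System.out.println(wrapped.get("parquet.example.key"));
    System.out.println(plain.get("parquet.example.key"));
  }
}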

"Override ParquetWriter$Builder#getWriteSupport(ParquetConfiguration)");
return getWriteSupport(ConfigurationUtil.createHadoopConfiguration(conf));
}

/**
Expand Down
Original file line number Diff line number Diff line change
@@ -79,7 +79,9 @@ public Map<String, String> getMergedKeyValueMetaData() {
 
   /**
    * @return the configuration for this job
+   * @deprecated Use {@link #getParquetConfiguration()} instead
    */
+  @Deprecated
   public Configuration getConfiguration() {
     return ConfigurationUtil.createHadoopConfiguration(configuration);
   }
@@ -21,6 +21,7 @@
 import java.util.Map;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.parquet.conf.ParquetConfiguration;
+import org.apache.parquet.hadoop.util.ConfigurationUtil;
 import org.apache.parquet.io.api.RecordMaterializer;
 import org.apache.parquet.schema.MessageType;
 import org.apache.parquet.schema.MessageTypeParser;
@@ -76,12 +77,12 @@ public ReadContext init(Configuration configuration, Map<String, String> keyValueMetaData, MessageType fileSchema) {
    * @param keyValueMetaData the app specific metadata from the file
    * @param fileSchema the schema of the file
    * @return the readContext that defines how to read the file
-   * @deprecated override {@link ReadSupport#init(InitContext)} instead
+   * @deprecated override {@link #init(InitContext)} instead
    */
   @Deprecated
   public ReadContext init(
       ParquetConfiguration configuration, Map<String, String> keyValueMetaData, MessageType fileSchema) {
-    throw new UnsupportedOperationException("Override ReadSupport.init(InitContext)");
+    return init(ConfigurationUtil.createHadoopConfiguration(configuration), keyValueMetaData, fileSchema);
   }
 
   /**
@@ -103,7 +104,9 @@ public ReadContext init(InitContext context) {
    * @param fileSchema the schema of the file
    * @param readContext returned by the init method
    * @return the recordMaterializer that will materialize the records
+   * @deprecated override {@link #prepareForRead(ParquetConfiguration,Map,MessageType,ReadContext)} instead
    */
+  @Deprecated
   public abstract RecordMaterializer<T> prepareForRead(
       Configuration configuration,
       Map<String, String> keyValueMetaData,
@@ -125,8 +128,8 @@ public RecordMaterializer<T> prepareForRead(
       Map<String, String> keyValueMetaData,
       MessageType fileSchema,
       ReadContext readContext) {
-    throw new UnsupportedOperationException(
-        "Override ReadSupport.prepareForRead(ParquetConfiguration, Map<String, String>, MessageType, ReadContext)");
+    return prepareForRead(
+        ConfigurationUtil.createHadoopConfiguration(configuration), keyValueMetaData, fileSchema, readContext);
   }
 
   /**
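Taken together, the ReadSupport changes mean an implementation can go fully Hadoop-free by overriding the ParquetConfiguration variants. A minimal sketch, assuming a hypothetical `Example` model with an `ExampleRecordMaterializer` (both invented names):

import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.parquet.conf.HadoopParquetConfiguration;
import org.apache.parquet.conf.ParquetConfiguration;
import org.apache.parquet.hadoop.api.InitContext;
import org.apache.parquet.hadoop.api.ReadSupport;
import org.apache.parquet.io.api.RecordMaterializer;
import org.apache.parquet.schema.MessageType;

public class ExampleReadSupport extends ReadSupport<Example> {

  @Override
  public ReadContext init(InitContext context) {
    // The InitContext entry point is the non-deprecated one.
    return new ReadContext(context.getFileSchema());
  }

  // Overriding this variant means the fallback through
  // ConfigurationUtil.createHadoopConfiguration is never taken.
  @Override
  public RecordMaterializer<Example> prepareForRead(
      ParquetConfiguration configuration,
      Map<String, String> keyValueMetaData,
      MessageType fileSchema,
      ReadContext readContext) {
    return new ExampleRecordMaterializer(fileSchema);
  }

  // Still abstract after this change, so it must exist; delegate by wrapping.
  @Override
  @Deprecated
  public RecordMaterializer<Example> prepareForRead(
      Configuration configuration,
      Map<String, String> keyValueMetaData,
      MessageType fileSchema,
      ReadContext readContext) {
    return prepareForRead(
        new HadoopParquetConfiguration(configuration), keyValueMetaData, fileSchema, readContext);
  }
}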
@@ -24,6 +24,7 @@
 import java.util.Objects;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.parquet.conf.ParquetConfiguration;
+import org.apache.parquet.hadoop.util.ConfigurationUtil;
 import org.apache.parquet.io.api.RecordConsumer;
 import org.apache.parquet.schema.MessageType;
 
@@ -99,7 +100,9 @@ public Map<String, String> getExtraMetaData() {
    *
    * @param configuration the job's configuration
    * @return the information needed to write the file
+   * @deprecated override {@link #init(ParquetConfiguration)} instead
    */
+  @Deprecated
   public abstract WriteContext init(Configuration configuration);
 
   /**
@@ -109,7 +112,7 @@ public Map<String, String> getExtraMetaData() {
    * @return the information needed to write the file
    */
   public WriteContext init(ParquetConfiguration configuration) {
-    throw new UnsupportedOperationException("Override WriteSupport#init(ParquetConfiguration)");
+    return init(ConfigurationUtil.createHadoopConfiguration(configuration));
   }
Expand Down