chore: Make parquet reader options Comet options instead of Hadoop options #968
@@ -67,6 +67,8 @@
 import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName;
 import org.apache.parquet.schema.Types;
 
+import org.apache.comet.CometConf;
+
 import static org.apache.parquet.column.Encoding.*;
 import static org.apache.parquet.format.converter.ParquetMetadataConverter.MAX_STATS_SIZE;
 import static org.junit.Assert.*;
@@ -615,12 +617,12 @@ public void testColumnIndexReadWrite() throws Exception {
   @Test
   public void testWriteReadMergeScanRange() throws Throwable {
     Configuration conf = new Configuration();
-    conf.set(ReadOptions.COMET_IO_MERGE_RANGES, Boolean.toString(true));
+    conf.set(CometConf.COMET_IO_MERGE_RANGES().key(), Boolean.toString(true));
     // Set the merge range delta so small that ranges do not get merged
-    conf.set(ReadOptions.COMET_IO_MERGE_RANGES_DELTA, Integer.toString(1024));
+    conf.set(CometConf.COMET_IO_MERGE_RANGES_DELTA().key(), Integer.toString(1024));
     testReadWrite(conf, 2, 1024);
     // Set the merge range delta so large that all ranges get merged
-    conf.set(ReadOptions.COMET_IO_MERGE_RANGES_DELTA, Integer.toString(1024 * 1024));
+    conf.set(CometConf.COMET_IO_MERGE_RANGES_DELTA().key(), Integer.toString(1024 * 1024));
     testReadWrite(conf, 2, 1024);
   }

Reviewer: This test is still setting the config values in a Hadoop Configuration, not in the Spark config. Would it make sense to update the test?

Author: There is no Spark context in this test. I've added a new test with the configuration set through the Spark config.
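The new Spark-config-based test the author mentions is not shown in this hunk. As a rough sketch only (the class name, test name, and file path below are hypothetical, not the actual test added in this PR), setting the same options through a Spark session rather than a bare Hadoop Configuration could look like this:

import org.apache.spark.sql.SparkSession;
import org.junit.Test;

import org.apache.comet.CometConf;

public class MergeScanRangeSparkConfSketch {
  @Test
  public void testMergeScanRangeViaSparkConf() {
    // Set the Comet options as ordinary Spark confs on the session,
    // instead of on a raw Hadoop Configuration.
    SparkSession spark = SparkSession.builder()
        .master("local[1]")
        .config(CometConf.COMET_IO_MERGE_RANGES().key(), "true")
        // A delta this small should keep ranges from being merged.
        .config(CometConf.COMET_IO_MERGE_RANGES_DELTA().key(), Integer.toString(1024))
        .getOrCreate();
    try {
      // A Parquet scan issued through this session picks the options up
      // from the Hadoop conf that Spark SQL derives from the session confs
      // (see the propagation discussion below). The path is a placeholder.
      spark.read().parquet("/tmp/merge-scan-range-test.parquet").collect();
    } finally {
      spark.stop();
    }
  }
}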
Reviewer: This is reading from the Hadoop conf. If I set the new configs on my Spark context, how would they get propagated to the Hadoop conf?

Author: Spark SQL copies configs that do not come from Spark into the Hadoop config when the SQL context is created. Other settings already rely on this (e.g. COMET_USE_LAZY_MATERIALIZATION, COMET_SCAN_PREFETCH_ENABLED).
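For illustration, a minimal sketch of that propagation (the class below is hypothetical; it relies on Spark's SessionState.newHadoopConf(), which merges session confs into a fresh Hadoop Configuration):

import org.apache.hadoop.conf.Configuration;
import org.apache.spark.sql.SparkSession;

import org.apache.comet.CometConf;

public class ConfPropagationSketch {
  public static void main(String[] args) {
    // Set a Comet option as an ordinary Spark conf on the session.
    SparkSession spark = SparkSession.builder()
        .master("local[1]")
        .config(CometConf.COMET_IO_MERGE_RANGES().key(), "true")
        .getOrCreate();

    // Spark SQL copies session confs into the Hadoop Configuration it
    // builds for data sources, so the non-Spark Comet key is visible here.
    Configuration hadoopConf = spark.sessionState().newHadoopConf();
    System.out.println(hadoopConf.get(CometConf.COMET_IO_MERGE_RANGES().key())); // prints "true"

    spark.stop();
  }
}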