Spark 3.5: Update Spark to use planned Avro reads #11299

Merged on Oct 22, 2024 (3 commits)

rdblue commented on Oct 10, 2024

Moving to the planned reader adds default value support. This is the same basic change as in #9366 and #11108.
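
For readers unfamiliar with the idea, here is a rough sketch of how a planned read can supply default values. It is not the Iceberg implementation: `ReadPlanSketch`, `ExpectedField`, `PlanEntry`, and the `Object[]` row representation are all hypothetical stand-ins. The plan is built once by resolving the expected schema against the file schema by field ID; fields missing from the file get a constant reader that returns the default.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class ReadPlanSketch {

  /** Hypothetical expected-schema field: a field ID plus a default value (may be null). */
  static final class ExpectedField {
    final int id;
    final Object defaultValue;

    ExpectedField(int id, Object defaultValue) {
      this.id = id;
      this.defaultValue = defaultValue;
    }
  }

  /** One plan entry: the output position and the function that produces its value. */
  static final class PlanEntry {
    final int position;
    final Function<Object[], Object> reader;

    PlanEntry(int position, Function<Object[], Object> reader) {
      this.position = position;
      this.reader = reader;
    }
  }

  /** Resolves the expected fields against the file's field IDs once, before any rows are read. */
  static List<PlanEntry> buildReadPlan(int[] fileFieldIds, List<ExpectedField> expected) {
    Map<Integer, Integer> fileIndexById = new HashMap<>();
    for (int i = 0; i < fileFieldIds.length; i++) {
      fileIndexById.put(fileFieldIds[i], i);
    }

    List<PlanEntry> plan = new ArrayList<>();
    for (int pos = 0; pos < expected.size(); pos++) {
      ExpectedField field = expected.get(pos);
      Integer fileIndex = fileIndexById.get(field.id);
      if (fileIndex != null) {
        int idx = fileIndex;
        // Field exists in the file: read it from its file position.
        plan.add(new PlanEntry(pos, fileRow -> fileRow[idx]));
      } else {
        Object constant = field.defaultValue;
        // Field is missing from the file: always produce the default.
        plan.add(new PlanEntry(pos, fileRow -> constant));
      }
    }
    return plan;
  }

  /** Applies the plan to one file row, producing a row in expected-schema order. */
  static Object[] readRow(List<PlanEntry> plan, Object[] fileRow) {
    Object[] out = new Object[plan.size()];
    for (PlanEntry entry : plan) {
      out[entry.position] = entry.reader.apply(fileRow);
    }
    return out;
  }

  public static void main(String[] args) {
    // The file has fields 1 and 2; the reader expects 2, 3 (with a default), and 1.
    List<ExpectedField> expected = Arrays.asList(
        new ExpectedField(2, null),
        new ExpectedField(3, "n/a"),
        new ExpectedField(1, null));
    List<PlanEntry> plan = buildReadPlan(new int[] {1, 2}, expected);
    System.out.println(Arrays.toString(readRow(plan, new Object[] {10L, "a"})));
    // prints [a, n/a, 10]
  }
}
```

Because the schema resolution happens once in `buildReadPlan`, the per-row work in this sketch is only a loop over plan entries.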

rdblue commented Oct 11, 2024

Here are the benchmark results:

## main
Benchmark                                                         Mode  Cnt  Score   Error  Units
IcebergSourceFlatAvroDataReadBenchmark.readIceberg                  ss    5  8.883 ± 0.321   s/op
IcebergSourceFlatAvroDataReadBenchmark.readWithProjectionIceberg    ss    5  7.173 ± 0.254   s/op

## this PR
Benchmark                                                         Mode  Cnt  Score   Error  Units
IcebergSourceFlatAvroDataReadBenchmark.readIceberg                  ss    5  3.718 ± 0.177   s/op
IcebergSourceFlatAvroDataReadBenchmark.readWithProjectionIceberg    ss    5  3.777 ± 0.776   s/op

## main
Benchmark                                                           Mode  Cnt  Score   Error  Units
IcebergSourceNestedAvroDataReadBenchmark.readIceberg                  ss    5  2.616 ± 0.110   s/op
IcebergSourceNestedAvroDataReadBenchmark.readWithProjectionIceberg    ss    5  2.365 ± 0.037   s/op

## this PR
Benchmark                                                           Mode  Cnt  Score   Error  Units
IcebergSourceNestedAvroDataReadBenchmark.readIceberg                  ss    5  1.969 ± 0.079   s/op
IcebergSourceNestedAvroDataReadBenchmark.readWithProjectionIceberg    ss    5  1.637 ± 0.045   s/op
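
To make the comparison easier to read, the scores above can be turned into speedup ratios (main score divided by PR score). This is a small sketch using only the numbers reported in the tables; the class and method names are made up for illustration, and the reported error margins are ignored.

```java
public class BenchmarkSpeedup {

  /** Speedup of the PR relative to main; both scores are in seconds per operation. */
  static double speedup(double mainSecondsPerOp, double prSecondsPerOp) {
    return mainSecondsPerOp / prSecondsPerOp;
  }

  public static void main(String[] args) {
    String[] names = {
      "Flat.readIceberg",
      "Flat.readWithProjectionIceberg",
      "Nested.readIceberg",
      "Nested.readWithProjectionIceberg"
    };
    // {main score, PR score}, copied from the benchmark output above.
    double[][] scores = {
      {8.883, 3.718},
      {7.173, 3.777},
      {2.616, 1.969},
      {2.365, 1.637}
    };
    for (int i = 0; i < names.length; i++) {
      System.out.printf("%-34s %.2fx%n", names[i], speedup(scores[i][0], scores[i][1]));
    }
  }
}
```

By this measure the flat benchmarks run roughly 2.4x and 1.9x faster with this PR, and the nested benchmarks roughly 1.3x and 1.4x faster. The flat projection result for the PR has a comparatively wide error (± 0.776), so its ratio is the least certain of the four.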

List<Pair<Integer, ValueReader<?>>> readPlan =
    ValueReaders.buildReadPlan(expected, record, fieldReaders, idToConstant);

// TODO: should this pass expected so that struct.get can reuse containers?

Contributor:
Is this for the future?

Contributor, Author:
Yes. This matches the current behavior. I thought it was odd that we don't reuse any containers.
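
For context, container reuse here would mean passing the previous row's collection back into the reader so it can be refilled instead of reallocated. The following is only an illustration of that general pattern, not code from this PR or from Iceberg; `ContainerReuseSketch` and `readList` are hypothetical names, and an `int[]` stands in for encoded input.

```java
import java.util.ArrayList;
import java.util.List;

public class ContainerReuseSketch {

  /** Fills `reuse` when it is a List; otherwise allocates a new list. */
  static List<Integer> readList(int[] encoded, Object reuse) {
    List<Integer> result;
    if (reuse instanceof List) {
      @SuppressWarnings("unchecked")
      List<Integer> reused = (List<Integer>) reuse;
      reused.clear(); // drop the previous row's values but keep the backing storage
      result = reused;
    } else {
      result = new ArrayList<>(encoded.length);
    }
    for (int value : encoded) {
      result.add(value);
    }
    return result;
  }

  public static void main(String[] args) {
    List<Integer> first = readList(new int[] {1, 2, 3}, null);
    List<Integer> second = readList(new int[] {4, 5}, first);
    System.out.println(first == second); // prints true: the same list instance was refilled
    System.out.println(second);          // prints [4, 5]
  }
}
```

Reuse like this is only safe when the caller does not hold on to the previous row, since its contents are overwritten by the next read.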

rdblue merged commit a198966 into apache:main on Oct 22, 2024
49 checks passed
zachdisc pushed a commit to zachdisc/iceberg that referenced this pull request Dec 23, 2024
rdblue added a commit to rdblue/iceberg that referenced this pull request Jan 17, 2025