-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Core: Rename DeleteFileHolder to PendingDeleteFile / Optimize duplicate data/delete file detection #11254
Conversation
5581b30
to
4d42f18
Compare
4d42f18
to
6d26455
Compare
core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java
Outdated
Show resolved
Hide resolved
private final Map<PartitionSpec, List<DataFile>> newDataFilesBySpec = Maps.newHashMap(); | ||
private final DataFileSet newDataFiles = DataFileSet.create(); | ||
private final DeleteFileSet newDeleteFiles = DeleteFileSet.create(); | ||
private final Map<PartitionSpec, DataFileSet> newDataFilesBySpec = Maps.newHashMap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... When did we start using PartitionSpec
as keys? This makes all operations more expensive. We always used Integer
when indexing by specs, like PartitionMap
or even newDeleteFilesBySpec
below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like this was introduced with #9860
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can follow up on this in a separate PR and change it to Map<Integer, DataFileSet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I think we should. Thanks!
core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java
Outdated
Show resolved
Hide resolved
bf02252
to
f26cb9b
Compare
f26cb9b
to
db770dc
Compare
RollingManifestWriter<DeleteFile> writer = newRollingDeleteManifestWriter(spec); | ||
|
||
try (RollingManifestWriter<DeleteFile> closableWriter = writer) { | ||
for (DeleteFileHolder file : files) { | ||
for (DeleteFile file : files) { | ||
Preconditions.checkState( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Should we use checkState
or checkArgument
? Also, any chance we can shorten the error message to stay on 1 line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine either way. My reasoning for using checkState
initially was that something internally must have gotten wrong to reach that state state but I guess it's also possible that some other internal code just passes delete files that aren't a PendingDeleteFile
.
Also getting this into a single line is hard, since the full line has 113 chars and I don't know what to omit from the error msg to make that fit into 100 chars
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks, @nastra!
db770dc
to
371e472
Compare
thanks @aokolnychyi for the review |
…te data/delete file detection (apache#11254)
depends on #11158