-
Notifications
You must be signed in to change notification settings - Fork 208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: View Metadata Builder #908
Conversation
// ToDo Discuss: Similar to TableMetadata sort-order, Java does not add changes | ||
// in this case. I prefer to add changes as the state of the builder is | ||
// potentially mutated (`last_added_version_id`), thus we should record the change. | ||
if self.last_added_version_id != Some(version_id) { | ||
self.changes.push(ViewUpdate::AddViewVersion { | ||
view_version: view_version.with_version_id(version_id), | ||
}); | ||
self.last_added_version_id = Some(version_id); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Java returns here:
So when you do a rollback to an earlier version, the change isn't recorded in Java. Java will only update the last_added_version_id
if the version is added in the builder. I would assume that the change will be recorded in Java as well. Because I think the addInBuilder
will only be true if you add it twice. I think I'm okay with this, but let's check with @nastra to be sure 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what would be the benefit of deviating from what we do on the Java side?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the rust implementation is simply more correct semantically.
My baseline is: If we change the state of the builder, we must record this change.
Java has edge cases where this doesn't happen.
// ToDo Discuss: In Java this uses `new_view_version.version_id()` instead. | ||
// I believe 0 is more appropriate here, as the first version should always be 0. | ||
// The TableMetadataBuilder also uses 0 for partition specs (and other `reuse_or_create_*` functions). | ||
// Consistent behaviour between the two builders is desirable. | ||
.unwrap_or(INITIAL_VIEW_VERSION_ID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there's a requirement from the spec that would dictate whether version id needs to start at 0 or 1 but we actually have code in Java that interpret 1 as the initial view version id: https://github.com/apache/iceberg/blob/6f1517546aceee19e10f38e36df7b072267fe6db/api/src/main/java/org/apache/iceberg/view/ViewVersion.java#L68
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, so we should default to 1 then too.
If Java depends on this, we might want to add it to the spec?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed initial view version id to 1 now.
// ToDo Discuss: Java does not add changes in this case. I prefer to add changes | ||
// as the state of the builder is potentially mutated (`last_added_schema_id`), | ||
// thus we should record the change. | ||
if self.last_added_schema_id != Some(schema_id) { | ||
self.changes.push(ViewUpdate::AddSchema { | ||
schema: schema.clone().with_schema_id(schema_id), | ||
last_column_id: None, | ||
}); | ||
self.last_added_schema_id = Some(schema_id); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let builder = builder_without_changes(); | ||
let v1 = new_view_version(0, 1, "select * from ns.tbl"); | ||
|
||
// ToDo Java: I think we should remove line 688 - 693 in Java TestVeiwMetadata.java as it does nothing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CC @nastra
.get(VIEW_PROPERTY_REPLACE_DROP_DIALECT_ALLOWED) | ||
.map_or( | ||
VIEW_PROPERTY_REPLACE_DROP_DIALECT_ALLOWED_DEFAULT, | ||
|value| value.to_lowercase() == "true", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to move this into some utility method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should revisit properties for views and tables together.
I think I'll give it a shot next week.
// ToDo Discuss: Similar to TableMetadata sort-order, Java does not add changes | ||
// in this case. I prefer to add changes as the state of the builder is | ||
// potentially mutated (`last_added_version_id`), thus we should record the change. | ||
if self.last_added_version_id != Some(version_id) { | ||
self.changes.push(ViewUpdate::AddViewVersion { | ||
view_version: view_version.with_version_id(version_id), | ||
}); | ||
self.last_added_version_id = Some(version_id); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Java returns here:
So when you do a rollback to an earlier version, the change isn't recorded in Java. Java will only update the last_added_version_id
if the version is added in the builder. I would assume that the change will be recorded in Java as well. Because I think the addInBuilder
will only be true if you add it twice. I think I'm okay with this, but let's check with @nastra to be sure 👍
Co-authored-by: Fokko Driesprong <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good, thanks @c-thiel for working on this 🚀
This PR is not completely ready yet as I believe the current mechanism of view expiration is flawed.
I opened a PR in Java to demonstrate the problem and use for discussions:
apache/iceberg#12051
Feedback from anyone is welcome. I am not sure what the best solutions looks like.