Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: View Metadata Builder #908

Merged
merged 12 commits into from
Feb 25, 2025
Merged

Conversation

c-thiel
Copy link
Collaborator

@c-thiel c-thiel commented Jan 23, 2025

This PR is not completely ready yet as I believe the current mechanism of view expiration is flawed.
I opened a PR in Java to demonstrate the problem and use for discussions:
apache/iceberg#12051

Feedback from anyone is welcome. I am not sure what the best solutions looks like.

Comment on lines +253 to +261
// ToDo Discuss: Similar to TableMetadata sort-order, Java does not add changes
// in this case. I prefer to add changes as the state of the builder is
// potentially mutated (`last_added_version_id`), thus we should record the change.
if self.last_added_version_id != Some(version_id) {
self.changes.push(ViewUpdate::AddViewVersion {
view_version: view_version.with_version_id(version_id),
});
self.last_added_version_id = Some(version_id);
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Java returns here:

https://github.com/apache/iceberg/blob/f6faa58dac57e03be6e02a43937ac7c15c770225/core/src/main/java/org/apache/iceberg/view/ViewMetadata.java#L279-L285

So when you do a rollback to an earlier version, the change isn't recorded in Java. Java will only update the last_added_version_id if the version is added in the builder. I would assume that the change will be recorded in Java as well. Because I think the addInBuilder will only be true if you add it twice. I think I'm okay with this, but let's check with @nastra to be sure 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what would be the benefit of deviating from what we do on the Java side?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the rust implementation is simply more correct semantically.
My baseline is: If we change the state of the builder, we must record this change.
Java has edge cases where this doesn't happen.

Comment on lines 344 to 348
// ToDo Discuss: In Java this uses `new_view_version.version_id()` instead.
// I believe 0 is more appropriate here, as the first version should always be 0.
// The TableMetadataBuilder also uses 0 for partition specs (and other `reuse_or_create_*` functions).
// Consistent behaviour between the two builders is desirable.
.unwrap_or(INITIAL_VIEW_VERSION_ID)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's a requirement from the spec that would dictate whether version id needs to start at 0 or 1 but we actually have code in Java that interpret 1 as the initial view version id: https://github.com/apache/iceberg/blob/6f1517546aceee19e10f38e36df7b072267fe6db/api/src/main/java/org/apache/iceberg/view/ViewVersion.java#L68

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, so we should default to 1 then too.
If Java depends on this, we might want to add it to the spec?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed initial view version id to 1 now.

Comment on lines +367 to +376
// ToDo Discuss: Java does not add changes in this case. I prefer to add changes
// as the state of the builder is potentially mutated (`last_added_schema_id`),
// thus we should record the change.
if self.last_added_schema_id != Some(schema_id) {
self.changes.push(ViewUpdate::AddSchema {
schema: schema.clone().with_schema_id(schema_id),
last_column_id: None,
});
self.last_added_schema_id = Some(schema_id);
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above - CC @Fokko , @nastra

let builder = builder_without_changes();
let v1 = new_view_version(0, 1, "select * from ns.tbl");

// ToDo Java: I think we should remove line 688 - 693 in Java TestVeiwMetadata.java as it does nothing.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CC @nastra

@c-thiel c-thiel changed the title WIP feat: View Metadata Builder feat: View Metadata Builder Jan 27, 2025
Fokko
Fokko previously approved these changes Feb 3, 2025
.get(VIEW_PROPERTY_REPLACE_DROP_DIALECT_ALLOWED)
.map_or(
VIEW_PROPERTY_REPLACE_DROP_DIALECT_ALLOWED_DEFAULT,
|value| value.to_lowercase() == "true",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to move this into some utility method?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should revisit properties for views and tables together.
I think I'll give it a shot next week.

Comment on lines +253 to +261
// ToDo Discuss: Similar to TableMetadata sort-order, Java does not add changes
// in this case. I prefer to add changes as the state of the builder is
// potentially mutated (`last_added_version_id`), thus we should record the change.
if self.last_added_version_id != Some(version_id) {
self.changes.push(ViewUpdate::AddViewVersion {
view_version: view_version.with_version_id(version_id),
});
self.last_added_version_id = Some(version_id);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Java returns here:

https://github.com/apache/iceberg/blob/f6faa58dac57e03be6e02a43937ac7c15c770225/core/src/main/java/org/apache/iceberg/view/ViewMetadata.java#L279-L285

So when you do a rollback to an earlier version, the change isn't recorded in Java. Java will only update the last_added_version_id if the version is added in the builder. I would assume that the change will be recorded in Java as well. Because I think the addInBuilder will only be true if you add it twice. I think I'm okay with this, but let's check with @nastra to be sure 👍

Copy link
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, thanks @c-thiel for working on this 🚀

@Fokko Fokko merged commit e294de7 into apache:main Feb 25, 2025
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants