GH-41246: [C++][Python] Simplify nested field encryption configuration #45462

Open
wants to merge 9 commits into
base: main

EnricoMi
Contributor

@EnricoMi EnricoMi commented Feb 7, 2025

Rationale for this change

Columns can be encrypted with individual keys. For this, the column names have to be set in EncryptionConfiguration::column_keys. This poses the following challenges for columns with nested fields like MapType, ListType, and StructType:

  • Encrypting a column of such type requires providing an encryption key for all nested (leaf) fields. Ideally, the column name should be sufficient (as it is for any other data type) to encrypt all nested fields.
  • The actual names of nested fields are neither obvious nor intuitive from the Arrow schema of the table. An intuitive naming should be possible.

What changes are included in this PR?

This adds a user-friendly notation for nested fields:

  • Columns col.key and col.value can be used to reference the key and value nested fields of a MapType column. Currently, col.key_value.key and col.key_value.value are required, respectively.
  • Column col.element can be used to reference the individual list elements of a ListType column. Currently, col.list.element is required.
  • The actual column name can be used to encrypt all nested fields with the same encryption key.
  • The current column naming scheme can still be used for backward compatibility.
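
To illustrate the notation above, here is a minimal Python sketch of how the short names relate to the full Parquet column paths that are currently required. It is not the implementation in this PR: the helper `to_parquet_path`, the `kinds` lookup, and the column and key names (`m`, `l`, `key_a`, `key_b`) are hypothetical and exist only for illustration. It assumes `column_keys` has the shape of a mapping from key identifier to a list of column names, and it ignores column names that themselves contain dots.

```python
# Hypothetical illustration only; not the code added by this PR.

def to_parquet_path(short_name: str, kind: str) -> str:
    """Translate the short notation into the full Parquet column path.

    kind is "map" or "list" (a stand-in for the column's Arrow type).
    Names that are already full paths, or that refer to other columns,
    are returned unchanged, mirroring the backward-compatible behaviour.
    """
    column, _, field = short_name.partition(".")
    if kind == "map" and field in ("key", "value"):
        return f"{column}.key_value.{field}"
    if kind == "list" and field == "element":
        return f"{column}.list.{field}"
    return short_name


# Assumed shape of column_keys: key identifier -> list of column names.
column_keys_short = {
    "key_a": ["m.key", "m.value"],  # m is a MapType column
    "key_b": ["l.element"],         # l is a ListType column
}
kinds = {"m": "map", "l": "list"}  # hypothetical type lookup per column

column_keys_full = {
    key_id: [to_parquet_path(name, kinds[name.split(".")[0]]) for name in names]
    for key_id, names in column_keys_short.items()
}
print(column_keys_full)
```

The short form on the left-hand side is what this PR allows users to write; the expanded paths are what has to be spelled out today.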

Are these changes tested?

Tested in C++ and Python.

Are there any user-facing changes?

Column encryption can be configured through simpler and more intuitive naming.

Documentation will be extended once #45411 is merged.

Fixes #41246.

@EnricoMi EnricoMi requested a review from wgtmac as a code owner February 7, 2025 11:12
@EnricoMi EnricoMi changed the title GH-41246: [C++][Python] Simplify nested field encryption GH-41246: [C++][Python] Simplify nested field encryption configuration Feb 7, 2025
Development

Successfully merging this pull request may close these issues.

[Python][Parquet] Attempt to encrypt column of type 'list' produces OSError