Additional Support For Nullable Attributes #1836

Merged: 4 commits into dev from viviannguyen/sc-34754/not-in-enum-handled-incorrectly on Sep 28, 2023

Conversation

@nguyenv (Collaborator) commented Sep 27, 2023

Background

As detailed in sc-34754, this PR fixes a bug found by a customer using the TileDB-SOMA Python API: a SOMADataFrame containing an enumerated nullable attribute was not being read back correctly. The bug highlights a larger gap in the TileDB-Py codebase: there is little support for writing nullable attributes other than through tiledb.from_pandas with a pandas ExtensionDtype.

Changes

  • This PR supports writing PyArrow arrays and pandas DataFrames that contain null values (pd.NA, pa.na, None, etc.).
  • Nullable attributes are now represented in NumPy as masked arrays.
  • PyQuery results now also return the validity buffer.
  • Note that the two conventions are inverted: in PyArrow, validity values use 0 = invalid and 1 = valid, whereas in NumPy, mask values use 0 = valid and 1 = invalid.
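The inverted conventions in the last bullet are easy to get wrong, so here is a minimal NumPy-only sketch of the conversion. The helper names validity_to_mask and mask_to_validity are hypothetical and are not part of TileDB-Py; the sample arrays are made up for illustration.

```python
import numpy as np


def validity_to_mask(validity):
    """Arrow-style validity (1 = valid) -> NumPy mask (True = invalid)."""
    return ~np.asarray(validity, dtype=bool)


def mask_to_validity(mask):
    """NumPy mask (True = invalid) -> Arrow-style validity bytes (1 = valid)."""
    return (~np.asarray(mask, dtype=bool)).astype(np.uint8)


# Illustrative data: the third cell is null, so its stored value is a placeholder.
values = np.array([10, 20, 0, 40], dtype=np.int64)
validity = np.array([1, 1, 0, 1], dtype=np.uint8)

# Build the masked array the way a reader of a nullable attribute might see it.
masked = np.ma.masked_array(values, mask=validity_to_mask(validity))

# Going back from the mask recovers the original validity buffer.
round_trip = mask_to_validity(np.ma.getmaskarray(masked))
```

Passing the mask by keyword matters here, because the second positional parameter of np.ma.array and np.ma.masked_array is dtype, not mask.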

Future Proposals

  • Support writing numpy.ma masked arrays for nullable attributes.
with tiledb.open(uri, "w") as A:
    A[:] = np.ma.array(data, mask=mask)
  • Support writing with built-in sequences (e.g. list, tuple). Internally, we would check whether the attribute .isnullable() and then cast using np.ma.masked_invalid().
with tiledb.open(uri, "w") as A:
    A[:] = [1, 2, None, 3]
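The casting step in the second proposal could look roughly like the sketch below. It is only an illustration under stated assumptions, not the TileDB-Py implementation: the helper sequence_to_masked is hypothetical, and it maps None to NaN first because np.ma.masked_invalid() expects numeric data and masks non-finite entries.

```python
import numpy as np


def sequence_to_masked(seq):
    """Cast a list or tuple that may contain None into a float masked array."""
    # Replace None with NaN so the array is numeric rather than object-typed.
    as_float = np.array(
        [np.nan if v is None else v for v in seq], dtype=np.float64
    )
    # masked_invalid masks every NaN or infinite entry.
    return np.ma.masked_invalid(as_float)


cells = sequence_to_masked([1, 2, None, 3])
cell_mask = np.ma.getmaskarray(cells)
```

One trade-off of this approach is that genuine NaN or infinite values in the input are masked as well, and integer inputs are widened to float64. Building the mask directly from an `is None` check would avoid both, at the cost of departing from the masked_invalid() route named in the proposal.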

@shortcut-integration

This pull request has been linked to Shortcut Story #34754: Not-in-enum handled incorrectly.

@nguyenv nguyenv changed the title from "Additional Support For Nullables" to "Additional Support For Nullable Attributes" Sep 27, 2023
@nguyenv nguyenv marked this pull request as ready for review September 27, 2023 22:59
@ihnorton ihnorton merged commit 656f54f into dev Sep 28, 2023
@ihnorton ihnorton deleted the viviannguyen/sc-34754/not-in-enum-handled-incorrectly branch September 28, 2023 12:41