Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[python] Fix enum regression in PR #3647 #3668

Merged

Conversation

nguyenv
Copy link
Member

@nguyenv nguyenv commented Feb 4, 2025

Issue and/or context:

[sc-62887]

Changes:

When evolving the schema, we need to re-open the array so that on subsequent writes it sees the latest version. Previously this was done in SOMAArray::write. Now that we are replacing this with ManagedQuery::write, the reopening step needs to be moved into this method. There should be no performance degradation as are simply moving the same step from SOMAArray::write to ManagedQuery::write.

Notes for Reviewer:

My original thought was to combine chunks in the Python side to avoid the repeated reopening steps for multiple chunks in the first place, but it sounds like the overhead of creating the copy is worse than reopening.

@johnkerl
Copy link
Member

johnkerl commented Feb 4, 2025

I know this PR is in draft so I won't review it -- just a comment while I'm here -- doing the combine_arrays may be a bit of a performance regression, as it's a data-copy. In my experience probably not a large regression since nominally obs is much smaller than X, and var typically even smaller still, so, a little slowdown on obs writes is (I think) unlikely to radically impact end-to-end experiment-write times. Food for thought, though. Especially if we combine your short-term combine_chunks approach here with a longer-term approach that will later allow us to go back and remove the combine_chunks workaround.

@bkmartinjr
Copy link
Member

sorry for piling on...I think combining the arrays (ie., not being zero copy) will definitely be noticed. IIRC, we added the chunk/batch iteration originally to avoid this overhead when doing large writes, e.g., for the Census builder.

@nguyenv nguyenv force-pushed the viviannguyen/sc-62887/python-c-enum-regression-in-pr-3647 branch from 557d143 to 40f486b Compare February 4, 2025 21:11
Copy link

codecov bot commented Feb 4, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 65.66%. Comparing base (ce0dc82) to head (40f486b).
Report is 2 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #3668       +/-   ##
===========================================
- Coverage   86.27%   65.66%   -20.62%     
===========================================
  Files          55      151       +96     
  Lines        6390    19265    +12875     
  Branches        0     1153     +1153     
===========================================
+ Hits         5513    12650     +7137     
- Misses        877     6220     +5343     
- Partials        0      395      +395     
Flag Coverage Δ
libtiledbsoma 55.40% <100.00%> (?)
python 86.32% <ø> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
python_api 86.32% <ø> (+0.04%) ⬆️
libtiledbsoma 48.25% <100.00%> (∅)

@nguyenv nguyenv marked this pull request as ready for review February 4, 2025 21:16
@nguyenv nguyenv merged commit 4013fa2 into main Feb 4, 2025
16 checks passed
@nguyenv nguyenv deleted the viviannguyen/sc-62887/python-c-enum-regression-in-pr-3647 branch February 4, 2025 22:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants