Hdf5r rownames rework #166
…o into write-h5ad-categoricals * 'write-h5ad-categoricals' of github.com:scverse/scverseio: fix styling Update write_h5ad_categorical
Replace repeated code in individual writers
I think there was a reason we decided I didn't write any of the code for handling compression so I'm not sure about that. I could maybe look if needed but otherwise I don't really have an opinion there. For me, switching from {rhdf5} to {hdf5r} would be a pretty major change, particularly as it affects whether we submit to CRAN/Bioconductor. I'm not entirely opposed, but I would want to better understand the differences and pros/cons first. We could probably reach out to the maintainers to try and get the boolean attributes added to {rhdf5} if that's the motivation for switching.
I asked about this and it turns out we actually need to write an In the process I realised how we currently write boolean values anywhere is wrong and we need to replace it with the
Closing in favour of #169
Hey @lazappi !
Since the lack of support for boolean attributes remains a blocking issue with rhdf5, I decided to give hdf5r a try.
The current PR seems to work a lot better, but it still has some issues when writing h5ad files with anndataR and then trying to read them back in with Python anndata. I need to do some experiments with writing the same h5ad file from anndataR and Python anndata and seeing where the differences lie. I'm starting to think that in our implementation of `hdf5_write_compressed`, the `dtype` and `space` should not be guessed but instead be manually specified depending on which `write_h5ad_*` function it was called from.
Luckily, our internal `read_*` / `write_*` functions stayed pretty much the same, since all `rhdf5::*` calls could mostly be substituted with the corresponding `hdf5r::*` functions.
While making the changes, I was struggling with our decision to keep the `obs_names` / `var_names` separate from `obs` and `var`, because they are stored inside the `obs` and `var`, and when making changes to the `obs` and `var` the first thing we do is throw them away.
By allowing the rownames of the `obs` and `var` to be the `obs_names` and `var_names`, the code did get simplified a lot.
There is currently an issue with the released version of hdf5r (hhoeflin/hdf5r#208) which was the cause of some of the strange errors in packages like MuDataSeurat. We already managed to fix the issue, but it still needs to be merged into the main branch and released.