Add support for lists in obs #1923

Open
brianraymor opened this issue Mar 17, 2025 · 3 comments

Comments

@brianraymor

Please describe your wishes and possible alternatives to achieve the desired result.

There are two cases in CELLxGENE Discover that permit multiple values to be annotated in obs metadata fields:

  • ethnicity
  • disease

Currently, these must be modeled either as CSV-row-encoded strings (which allow a comma delimiter because any value containing a comma is quoted) or as strings joined with a non-comma delimiter such as "||". Both approaches are inelegant.
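To make the two workarounds concrete, here is a minimal standard-library sketch. The helper names (`encode_csv_row`, `decode_csv_row`, `encode_delimited`, `decode_delimited`) are hypothetical, and the values are the disease labels quoted later in this thread; two of them contain commas, which is why a plain comma join does not work.

```python
import csv
import io

# Disease labels from the example later in this thread; two contain commas.
labels = [
    "Hodgkin's lymphoma, lymphocytic-histiocytic predominance",
    "Weil's disease",
    "atrial fibrillation, familial, 16",
    "mitral valve insufficiency",
]

def encode_csv_row(values):
    """Hypothetical helper: encode a list as one CSV row (comma-containing values get quoted)."""
    buf = io.StringIO()
    csv.writer(buf).writerow(values)  # the writer appends "\r\n" by default
    return buf.getvalue().rstrip("\r\n")

def decode_csv_row(text):
    """Hypothetical helper: parse one CSV row back into a list of strings."""
    return next(csv.reader([text]))

def encode_delimited(values, sep="||"):
    """Hypothetical helper: join with a delimiter assumed never to occur in a value."""
    if any(sep in v for v in values):
        raise ValueError(f"a value contains the delimiter {sep!r}")
    return sep.join(values)

def decode_delimited(text, sep="||"):
    """Hypothetical helper: split a delimiter-joined string back into a list."""
    return text.split(sep)

csv_cell = encode_csv_row(labels)
bar_cell = encode_delimited(labels)

# Both encodings round-trip, but only when the reader applies the matching decoder.
assert decode_csv_row(csv_cell) == labels
assert decode_delimited(bar_cell) == labels

# A naive comma split of the CSV-encoded cell also splits inside the quoted
# labels, giving 7 pieces instead of 4.
naive = csv_cell.split(",")
```

Either way, every consumer has to know which decoder a given column needs, and the column's dtype is still just a string.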

Would it be possible for AnnData to support a more natural data type such as lists?

@ilan-gold
Contributor

Hi @brianraymor, I am not sure I follow what you are saying here. Could you provide a small code snippet of what you would like to see? For example, isn't ethnicity just a column in an obs pandas.DataFrame?

@brianraymor
Author

While we were reviewing our use cases, @ivirshup shared this fragment for one approach:

import anndata as ad, pandas as pd, pyarrow as pa, numpy as np

obs = pd.DataFrame({
    "disease_ontology_term_id": pa.array([["MONDO:0004604","MONDO:0043004","MONDO:0800349","MONDO:1030008"]]),
    "disease_ontology_term_label": pa.array([["Hodgkin's lymphoma, lymphocytic-histiocytic predominance", "Weil's disease", "atrial fibrillation, familial, 16","mitral valve insufficiency"]]),
})

...

@ilan-gold
Contributor

@ivirshup, can you comment? Does this mean the issue is solved? It seems to me that you would not actually want to split on the comma here, or would you? Maybe you could open a separate issue for Arrow extension dtypes, as you suggested?
