Review curation SoP #40

Open
dosumis opened this issue Nov 8, 2024 · 2 comments

Comments

@dosumis
Contributor

dosumis commented Nov 8, 2024

Ugur needs only:

  1. Something to identify dataset - currently h5ad link
  2. Author cell type fields

In future:

  • Author cell type field present: T/F (update the SOP: there should be a row with blank author cell type field(s)).
  • To deal with version changes, the CxG Link is needed.
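
A minimal sketch of how the proposed T/F flag could be derived from a curation row. The column name is borrowed from the column list given further down this thread; the sample field name `author_cell_type` and the row layout are assumptions for illustration only, not the actual sheet.

```python
from typing import Mapping

# Assumed column name, taken from the column list later in this thread.
AUTHOR_FIELD_COLUMN = "Author Category Cell Type Field Name"


def author_cell_type_field_present(row: Mapping[str, str]) -> bool:
    """Return True when the row names at least one author cell type field."""
    value = row.get(AUTHOR_FIELD_COLUMN) or ""
    return bool(value.strip())


cxg_link = "https://cellxgene.cziscience.com/e/13a027de-ea3e-432b-9a5e-6bc7048498fc.cxg/"
rows = [
    # "author_cell_type" is a hypothetical field name.
    {"CxG Link": cxg_link, AUTHOR_FIELD_COLUMN: "author_cell_type"},
    # A row with a blank author cell type field, as the SOP update would allow.
    {"CxG Link": cxg_link, AUTHOR_FIELD_COLUMN: ""},
]
flags = [author_cell_type_field_present(row) for row in rows]
```

With a flag like this, a dataset that has CL annotation but no author cell type category could still be listed and loaded.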

Editors need:

  • At least one human-readable title for the dataset.

Details https://github.com/Cellular-Semantics/CL_KG/blob/main/docs/dataset_curation_guidelines.md

1. DataSet identification:

We have 7 fields:

Dataset (individual datasets within larger group):
Description: The specific name of the dataset being curated within a larger dataset group.
Example: "Single cell transcriptional and chromatin accessibility profiling redefine cellular heterogeneity in the adult human kidney - ATACseq"

Full name dataset (top of page):
Description: The full descriptive name of the dataset that should be used for documentation and display.
Example: "Single cell transcriptional and chromatin accessibility profiling redefine cellular heterogeneity in the adult human kidney"

CxG Link:
Description: The CellxGene link to access the dataset.
Example: "https://cellxgene.cziscience.com/e/13a027de-ea3e-432b-9a5e-6bc7048498fc.cxg/"

h5ad link:
Description: The direct link to the .h5ad data file of the dataset.
Example: "https://datasets.cellxgene.cziscience.com/dabd979f-cc50-4526-81f3-8bc6c673ca36.h5ad"

Reference_DOI:
Description: The DOI reference for the associated publication(s) for the dataset.
Example: "DOI: 10.1038/s41467-021-22368-w"

Study Short Name:
Description: The shortened name or acronym of the study associated with the dataset.
Example: "Muto et al. (2021) Nat Commun"

CxG Dataset Collection X:
Description: The CellxGene link to the collection where the dataset is stored.
Example: "https://cellxgene.cziscience.com/collections/9b02383a-9358-4f0f-9795-a891ec523bcc"
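
For reference, the seven fields above can be sketched as a single record type. The class and attribute names are hypothetical (they only mirror the column names), and the sample values are copied from the examples above on the assumption that they all describe the same dataset.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DatasetRecord:
    """One row of dataset identification fields (hypothetical structure)."""

    dataset: str                 # individual dataset within a larger group
    full_name_dataset: str       # full descriptive name (top of page)
    cxg_link: str                # CellxGene link to the dataset
    h5ad_link: str               # direct link to the .h5ad file
    reference_doi: str           # DOI of the associated publication
    study_short_name: str        # shortened study name
    cxg_dataset_collection: str  # CellxGene collection link


example = DatasetRecord(
    dataset=(
        "Single cell transcriptional and chromatin accessibility profiling "
        "redefine cellular heterogeneity in the adult human kidney - ATACseq"
    ),
    full_name_dataset=(
        "Single cell transcriptional and chromatin accessibility profiling "
        "redefine cellular heterogeneity in the adult human kidney"
    ),
    cxg_link="https://cellxgene.cziscience.com/e/13a027de-ea3e-432b-9a5e-6bc7048498fc.cxg/",
    h5ad_link="https://datasets.cellxgene.cziscience.com/dabd979f-cc50-4526-81f3-8bc6c673ca36.h5ad",
    reference_doi="DOI: 10.1038/s41467-021-22368-w",
    study_short_name="Muto et al. (2021) Nat Commun",
    cxg_dataset_collection="https://cellxgene.cziscience.com/collections/9b02383a-9358-4f0f-9795-a891ec523bcc",
)

field_names = [f.name for f in fields(DatasetRecord)]
```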

Do we need them all?

Use cases:

  • Curators need something readable to work with to know what they've curated
  • Ugur needs a specific key to look up. It may also be useful to have two keys, one human-readable and one not, to cross-check. Right now he uses only the h5ad link, which is version-sensitive and might change.
  • A record of what's been curated (although this can also be generated from reports)
  • Loading dataset where there is no author cell type category but there is CL annotation
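
The two-key idea in the use cases above could look roughly like the sketch below: a machine key is pulled out of the CxG link, and the human-readable title is used to cross-check the lookup. The function names and the `index` structure are hypothetical, and treating the identifier embedded in the CxG link as the lookup key is an assumption based on this discussion, not a documented CellxGene guarantee.

```python
import re
from typing import Mapping, Optional

# Matches the UUID-shaped identifier embedded in CxG and h5ad links.
_UUID = re.compile(r"[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}", re.IGNORECASE)


def extract_id(link: str) -> Optional[str]:
    """Return the first UUID-shaped identifier in a link, or None."""
    match = _UUID.search(link)
    return match.group(0).lower() if match else None


def cross_check(row: Mapping[str, str], index: Mapping[str, str]) -> bool:
    """True when the machine key resolves to the title the curator recorded."""
    key = extract_id(row["CxG Link"])
    return key is not None and index.get(key) == row["Full name dataset"]


title = (
    "Single cell transcriptional and chromatin accessibility profiling "
    "redefine cellular heterogeneity in the adult human kidney"
)
row = {
    "CxG Link": "https://cellxgene.cziscience.com/e/13a027de-ea3e-432b-9a5e-6bc7048498fc.cxg/",
    "Full name dataset": title,
}
# Hypothetical index of already-loaded datasets: machine key -> readable title.
index = {"13a027de-ea3e-432b-9a5e-6bc7048498fc": title}

key = extract_id(row["CxG Link"])
ok = cross_check(row, index)
```

A mismatch between the two keys would flag a row for manual review instead of silently loading the wrong dataset.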

Not needed:

  • DOI, as it can be obtained from the CxG link
  • Study short name

Keep: CxG Dataset Collection (helpful in debugging)

The h5ad link should point to the latest version, which may not be the same as the one behind the CxG dataset link.

content

Suggestion:

  • Remove 'content' column and include only 'author cell type field' column. Having another column with only the entry 'cell type' would be pointless.
@ubyndr
Collaborator

ubyndr commented Mar 4, 2025

Hi @JABelfiore, could you please review this issue and let me know which fields are required from a development perspective? I plan to remove any unnecessary fields.

For development, I only require the following columns:

  • Content
  • CxG Link
  • Author Category Cell Type Field Name

@JABelfiore
Collaborator

Hi @ubyndr, I need the following: Dataset (individual dataset name), Full name dataset, author cell type field name, CxG dataset collection, "is the dataset normal?", and stage. Thanks!
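
As a rough way to see which of the seven identification fields nobody in this thread has asked to keep, the two requests can be compared against the field list. This is only a sketch: the column names are typed in from the comments above, and matching them case-insensitively is an assumption about how the sheet headers line up.

```python
# The seven identification fields listed in the issue description.
IDENTIFICATION_FIELDS = [
    "Dataset",
    "Full name dataset",
    "CxG Link",
    "h5ad link",
    "Reference_DOI",
    "Study Short Name",
    "CxG Dataset Collection",
]

# Columns requested in the two replies above, keyed by commenter.
REQUESTS = {
    "ubyndr": ["Content", "CxG Link", "Author Category Cell Type Field Name"],
    "JABelfiore": [
        "Dataset",
        "Full name dataset",
        "author cell type field name",
        "CxG dataset collection",
        "is the dataset normal?",
        "stage",
    ],
}


def normalise(name: str) -> str:
    """Case-insensitive, whitespace-insensitive form of a column name."""
    return " ".join(name.casefold().split())


requested = {normalise(col) for cols in REQUESTS.values() for col in cols}
unrequested = [f for f in IDENTIFICATION_FIELDS if normalise(f) not in requested]
# unrequested -> ["h5ad link", "Reference_DOI", "Study Short Name"]
```

The DOI and study short name were already listed as not needed earlier in the issue; the h5ad link is the one remaining field that neither reply asks for.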
