Review curation SoP #40

dosumis · 2024-11-08T14:52:06Z

Ugur needs only:

Something to identify the dataset (currently the h5ad link)
Author cell type fields

In future:
Author cell type field present: T/F (update the SOP: there should be a row with blank author cell type field(s))
To deal with version changes, need CxG Link

Editors need:

At least one human readable title for dataset.

Details: https://github.com/Cellular-Semantics/CL_KG/blob/main/docs/dataset_curation_guidelines.md

1. DataSet identification:

We have 7 fields:

Dataset (individual datasets within larger group):
Description: The specific name of the dataset being curated within a larger dataset group.
Example: "Single cell transcriptional and chromatin accessibility profiling redefine cellular heterogeneity in the adult human kidney - ATACseq"

Full name dataset (top of page):
Description: The full descriptive name of the dataset that should be used for documentation and display.
Example: "Single cell transcriptional and chromatin accessibility profiling redefine cellular heterogeneity in the adult human kidney"

CxG Link:
Description: The CellxGene link to access the dataset.
Example: "https://cellxgene.cziscience.com/e/13a027de-ea3e-432b-9a5e-6bc7048498fc.cxg/"

h5ad link:
Description: The direct link to the .h5ad data file of the dataset.
Example: "https://datasets.cellxgene.cziscience.com/dabd979f-cc50-4526-81f3-8bc6c673ca36.h5ad"

Reference_DOI:
Description: The DOI reference for the associated publication(s) for the dataset.
Example: "DOI: 10.1038/s41467-021-22368-w"

Study Short Name:
Description: The shortened name or acronym of the study associated with the dataset.
Example: "Muto et al. (2021) Nat Commun"

CxG Dataset Collection X:
Description: The CellxGene link to the collection where the dataset is stored.
Example: "https://cellxgene.cziscience.com/collections/9b02383a-9358-4f0f-9795-a891ec523bcc"
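For discussion, the seven fields can be sketched as a single record, with a helper that pulls the UUID out of a CellxGene URL. This is only an illustration: the class name `DatasetRecord`, its attribute names, and the helper `extract_uuid` are hypothetical and not part of the existing SOP or any pipeline code. The values are the examples listed above.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches a standard 8-4-4-4-12 hexadecimal UUID anywhere in a string.
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


@dataclass
class DatasetRecord:
    """Hypothetical container for the seven identification fields."""
    dataset: str
    full_name_dataset: str
    cxg_link: str
    h5ad_link: str
    reference_doi: str
    study_short_name: str
    cxg_dataset_collection: str


def extract_uuid(url: str) -> Optional[str]:
    """Return the first UUID found in the URL, or None if there is none."""
    match = UUID_RE.search(url.lower())
    return match.group(0) if match else None


# Example values copied from the field descriptions above.
record = DatasetRecord(
    dataset=(
        "Single cell transcriptional and chromatin accessibility profiling "
        "redefine cellular heterogeneity in the adult human kidney - ATACseq"
    ),
    full_name_dataset=(
        "Single cell transcriptional and chromatin accessibility profiling "
        "redefine cellular heterogeneity in the adult human kidney"
    ),
    cxg_link="https://cellxgene.cziscience.com/e/13a027de-ea3e-432b-9a5e-6bc7048498fc.cxg/",
    h5ad_link="https://datasets.cellxgene.cziscience.com/dabd979f-cc50-4526-81f3-8bc6c673ca36.h5ad",
    reference_doi="DOI: 10.1038/s41467-021-22368-w",
    study_short_name="Muto et al. (2021) Nat Commun",
    cxg_dataset_collection="https://cellxgene.cziscience.com/collections/9b02383a-9358-4f0f-9795-a891ec523bcc",
)

cxg_id = extract_uuid(record.cxg_link)
h5ad_id = extract_uuid(record.h5ad_link)
collection_id = extract_uuid(record.cxg_dataset_collection)
```

Note that in the example values the UUID in the CxG link differs from the one in the h5ad link, so the two links cannot be cross-checked simply by comparing IDs.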

Do we need them all?

Use cases:

Curators need something readable to work with to know what they've curated
Ugur needs a specific key to look up. It may also be useful to have two keys, one human-readable and one not, to cross-check. Right now he uses only the h5ad link, which is version-sensitive and might change.
A record of what's been curated (although this can also be generated from reports)
Loading dataset where there is no author cell type category but there is CL annotation
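The use cases above could be turned into a small per-row check. The sketch below is an assumption about how such a check might look, not existing code: the function names and the column name "Author cell type field" are hypothetical, and rows are modelled as plain dictionaries keyed by column header.

```python
from typing import Dict, List


def _filled(row: Dict[str, str], column: str) -> bool:
    """True if the column exists in the row and is not blank."""
    return bool(row.get(column, "").strip())


def check_row(row: Dict[str, str]) -> List[str]:
    """Return a list of problems found in one curation row (empty if none)."""
    problems = []
    # Curators and editors need at least one human-readable title.
    if not (_filled(row, "Dataset") or _filled(row, "Full name dataset")):
        problems.append("no human-readable title")
    # The loader needs a lookup key; the CxG Link guards against h5ad version changes.
    if not _filled(row, "CxG Link"):
        problems.append("no CxG Link")
    if not _filled(row, "h5ad link"):
        problems.append("no h5ad link")
    return problems


def has_author_cell_type_field(row: Dict[str, str]) -> bool:
    """T/F flag: a blank field is allowed (dataset has CL annotation only)."""
    return _filled(row, "Author cell type field")


# A row with no author cell type category is still valid.
example_row = {
    "Dataset": "Example dataset - ATACseq",
    "CxG Link": "https://cellxgene.cziscience.com/e/13a027de-ea3e-432b-9a5e-6bc7048498fc.cxg/",
    "h5ad link": "https://datasets.cellxgene.cziscience.com/dabd979f-cc50-4526-81f3-8bc6c673ca36.h5ad",
    "Author cell type field": "",
}
example_problems = check_row(example_row)
example_flag = has_author_cell_type_field(example_row)
```

A blank author cell type field is deliberately reported through the flag rather than as a problem, so datasets with CL annotation only can still be loaded.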

Not needed:

DOI not needed as can get it from CxG link
study short name
CxG Dataset Collection <-- keep because helpful in debugging

The h5ad link should point to the latest version, which may not be the same as the CxG dataset link.

content

Suggestion:

Remove the 'content' column and include only the 'author cell type field' column. Having another column with only the entry 'cell type' would be pointless.

ubyndr · 2025-03-04T10:44:39Z

Hi @JABelfiore, could you please review this issue and let me know which fields are required from a development perspective? I plan to remove any unnecessary fields.

For development, I only require the following columns:

Content
CxG Link
Author Category Cell Type Field Name

JABelfiore · 2025-03-06T14:16:04Z

Hi @ubyndr, I need the following: Dataset (individual dataset name), Full name dataset, author cell type field name, CxG dataset collection, "is the dataset normal?", and stage. Thanks!
