
Experience using the pipeline for the 1st time #41

Closed
SarahOuologuem opened this issue May 3, 2023 · 2 comments
Labels
enhancement New feature or request good first issue Good for newcomers


@SarahOuologuem
Collaborator

Hi,

here are some things I noticed while using the QC + preprocessing workflows for the first time with an RNA+ATAC multiome dataset (filtered_feature_bc_matrix.h5 file):

  • Sample submission file: it was unclear to me what is meant by the cellranger "outs" folder with regard to the keys "cellranger" and "cellranger_multi". Which files are expected to be in the outs folder (e.g. barcodes.tsv, genes.tsv and matrix.mtx)?

    • I was unsure whether the folder containing the .h5 file (or the cellranger outputs) needs to be named "outs".
  • Regarding the QC_mm gene lists: before running the pipeline, I didn't know that providing a list is required rather than optional, since the documentation of the gene list formats states "...,the user can provide custom gene lists..."

  • Regarding the QC pipeline.yml file:

    • I wasn't sure how to specify the "score_genes" parameter or what "MarkersNeutro" is (I assume MarkersNeutro is a group of genes in the provided gene list, right?).
    • ATAC QC: I didn't know how to specify the "partner_rna" parameter for the multiome (RNA+ATAC) dataset, e.g. whether to set it to "True"/"False". It was not clear to me that this parameter needs to be left empty in my case, and the pipeline threw an error when I tried to set "partner_rna" to the .h5 file of the RNA+ATAC data.
  • Regarding the output of the QC:

    • The "n_genes_by_counts x doublet_scores" scatter plot was too small; I couldn't see the distribution clearly (see attached).
    • Filtering in the "Preprocessing" step of the pipeline: when I wanted to filter genes by the number of cells they are expressed in (n_cells_by_counts) and by their total_counts, I couldn't decide on cutoffs because the QC produced no plots of these two metrics.
    • Violin plots of "n_genes_by_counts" and of the number of molecules per cell (total_counts) would help users decide on cutoffs. Many people I know use these violin plots (including me), and Seurat's tutorial (https://satijalab.org/seurat/articles/pbmc3k_tutorial.html) also uses them.
    • I ran the QC several times on the same dataset. For some reason, I only got the suggested RNA thresholds (as .tsv files) the first time; on later runs this output was missing.

[Attached image: scatter_sample_id_rna-genes_rna-doublet_scores_rna-numi]
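To make the per-cell and per-gene metrics discussed above concrete, here is a minimal plain-Python sketch, assuming a tiny dense cell × gene count matrix. The toy data and all function names are hypothetical and not part of panpipes; the metric names follow scanpy's `sc.pp.calculate_qc_metrics` conventions (`n_genes_by_counts`/`total_counts` per cell, `n_cells_by_counts`/`total_counts` per gene). In a real analysis these columns would be plotted (e.g. with `sc.pl.violin`) rather than summarized as text.

```python
"""Hypothetical sketch: per-cell and per-gene QC metrics on a toy count matrix."""
from statistics import median
from typing import Dict, List, Tuple

Matrix = List[List[int]]  # rows = cells, columns = genes (raw counts)


def cell_qc_metrics(counts: Matrix) -> List[Dict[str, int]]:
    """Per cell: number of genes with a non-zero count, and total counts."""
    return [
        {"n_genes_by_counts": sum(1 for c in row if c > 0), "total_counts": sum(row)}
        for row in counts
    ]


def gene_qc_metrics(counts: Matrix) -> List[Dict[str, int]]:
    """Per gene: number of cells with a non-zero count, and total counts."""
    metrics = []
    for column in zip(*counts):  # zip(*counts) iterates over genes (matrix columns)
        metrics.append(
            {"n_cells_by_counts": sum(1 for c in column if c > 0),
             "total_counts": sum(column)}
        )
    return metrics


def genes_passing(metrics: List[Dict[str, int]], min_cells: int, min_counts: int) -> List[int]:
    """Indices of genes meeting both (hypothetical) cutoffs."""
    return [
        j for j, m in enumerate(metrics)
        if m["n_cells_by_counts"] >= min_cells and m["total_counts"] >= min_counts
    ]


def summarize(values: List[int]) -> Tuple[int, float, int]:
    """Min, median and max: a crude text stand-in for a violin plot."""
    return min(values), median(values), max(values)


# Toy data: 3 cells x 4 genes (made up for illustration only).
counts = [
    [0, 3, 1, 0],
    [2, 0, 1, 0],
    [0, 5, 0, 1],
]
genes = gene_qc_metrics(counts)
print(summarize([g["total_counts"] for g in genes]))  # → (1, 2.0, 8)
print(genes_passing(genes, min_cells=2, min_counts=3))  # → [1]
```

Here gene 1 is the only gene detected in at least two cells with at least three total counts; seeing the full distribution of each metric first is what makes such cutoffs defensible.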

@crichgriffin
Contributor

This is so great @SarahOuologuem, we're grateful for your efforts! We'll get working on these issues in the near future.

@crichgriffin crichgriffin added the enhancement New feature or request label May 3, 2023
@bio-la bio-la added the good first issue Good for newcomers label Oct 20, 2023
@bio-la bio-la self-assigned this Oct 20, 2023
@bio-la
Collaborator

bio-la commented Nov 24, 2023

These are all sorted in version 0.4! Thank you @SarahOuologuem!

@bio-la bio-la closed this as completed Nov 24, 2023
3 participants