Experience using the pipeline for the 1st time #41

SarahOuologuem · 2023-05-03T08:55:59Z

Hi,

there are a few things I noticed while using QC+Preprocessing for the first time with an RNA+ATAC multiome dataset (filtered_feature_bc_matrix.h5 file):

Sample submission file: it was unclear to me what is meant by the cellranger "outs" folder with regard to the keys "cellranger" and "cellranger_multi". Which files are expected to be in the outs folder? (For example barcodes.tsv, genes.tsv and matrix.mtx?)
- I was unsure whether the folder containing the .h5 file (or the cellranger outputs) needs to be named "outs"
Regarding the QC_mm gene lists: I didn't know before running the pipeline that providing a gene list is required rather than optional, because the documentation of the gene list formats states "...,the user can provide custom gene lists..."
Regarding the QC pipeline.yml file:
- I wasn't sure how to specify the "score_genes" parameter or what "MarkersNeutro" is (-> MarkersNeutro is a group of genes in the provided gene list, right?)
- ATAC QC: I did not know how to specify the "partner_rna" parameter for the multiome (RNA+ATAC) dataset, e.g. whether to set it to "True"/"False". It was not clear to me that this parameter needs to be left empty in my case, and the pipeline threw an error when I tried to set "partner_rna" to the .h5 file of the RNA+ATAC data
Regarding the output of the QC:
- The scatter plot of "n_genes_by_counts x doublet_scores" was too small, so I couldn't see the distribution clearly (see attached)
- Filtering in the "Preprocessing" step of the pipeline: when I wanted to filter genes by the number of cells they are expressed in (i.e. n_cells_by_counts) and by the genes' total_counts, I couldn't decide on a cutoff because the QC produced no plots of these two metrics
- Violin plots of "n_genes_by_counts" and of the number of molecules in each cell (total_counts) would help users decide on cutoffs. Many people I know (including me) use those violin plots, and Seurat's tutorial also uses them: https://satijalab.org/seurat/articles/pbmc3k_tutorial.html
- I ran the QC multiple times on the same dataset. I only got the suggested RNA thresholds as tsv files the first time I ran the QC; on the later runs this output was missing
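
To make the two per-gene metrics from the filtering point concrete, here is a minimal sketch in plain Python of how n_cells_by_counts and total_counts can be computed from a cells x genes count matrix and then used for a cutoff. This is not the pipeline's code: the function names `per_gene_qc` and `genes_passing`, the toy matrix, and the cutoff values are all made up for illustration, and only the metric names come from the text above.

```python
def per_gene_qc(counts):
    """Compute per-gene QC metrics from a cells x genes matrix (list of rows).

    Returns two lists indexed by gene:
      n_cells_by_counts: number of cells with a non-zero count for the gene
      total_counts: sum of the gene's counts over all cells
    """
    n_genes = len(counts[0]) if counts else 0
    n_cells_by_counts = [0] * n_genes
    total_counts = [0] * n_genes
    for row in counts:
        for gene_idx, value in enumerate(row):
            if value > 0:
                n_cells_by_counts[gene_idx] += 1
            total_counts[gene_idx] += value
    return n_cells_by_counts, total_counts


def genes_passing(n_cells_by_counts, total_counts, min_cells, min_counts):
    """Return indices of genes that meet both (hypothetical) cutoffs."""
    return [
        gene_idx
        for gene_idx, (n_cells, total) in enumerate(
            zip(n_cells_by_counts, total_counts)
        )
        if n_cells >= min_cells and total >= min_counts
    ]


# Toy data: 4 cells (rows) x 3 genes (columns), values are raw counts.
toy_counts = [
    [0, 3, 1],
    [0, 5, 0],
    [1, 2, 0],
    [0, 4, 0],
]

n_cells, totals = per_gene_qc(toy_counts)
# Illustrative cutoffs only; real values depend on the dataset.
kept = genes_passing(n_cells, totals, min_cells=2, min_counts=3)
print(n_cells, totals, kept)
```

Plotting the distribution of both lists (for example as histograms or violin plots) before choosing `min_cells` and `min_counts` is what the request above is about; the sketch only shows what the numbers behind such plots would be.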

crichgriffin · 2023-05-03T09:53:04Z

This is so great @SarahOuologuem, we're grateful for your efforts! We'll get working on these issues in the near future.

bio-la · 2023-11-24T18:38:42Z

These are all sorted in version 0.4! Thank you @SarahOuologuem!

crichgriffin added the "enhancement" label (New feature or request) May 3, 2023

bio-la added the "good first issue" label (Good for newcomers) Oct 20, 2023

bio-la self-assigned this Oct 20, 2023

bio-la closed this as completed Nov 24, 2023
