Fc namescheck #115

Merged
merged 29 commits into from
Nov 10, 2023
Changes from 24 commits
20 changes: 10 additions & 10 deletions docs/setup_for_ingest.md
@@ -3,13 +3,13 @@

For ingest the minimum required columns are

-sample_id | gex_path | gex_filetype
+sample_id | rna_path | rna_filetype
----------|----------|-------------


If you want to analyse other modalities, add columns to the input file

-- adt_path/adt_filetype
+- prot_path/prot_filetype
- atac_path/atac_filetype
- tcr_path/tcr_filetype
- bcr_path/bcr_filetype
@@ -20,7 +20,7 @@ example at `resources/sample_file_ingest.txt`

If giving a cellranger path, give the path folder containing all the cellranger outputs. Otherwise path should be the complete path to the file.

-If you have cellranger outputs which have gex and adt within the same files, specify the same path in gex_path and adt_path
+If you have cellranger outputs which have rna and prot within the same files, specify the same path in rna_path and prot_path

To include sample level metadata, you can add additional columns to the submission file
e.g Tissue and Diagnoisis columns in `resources/sample_file_ingest.txt`
@@ -38,13 +38,13 @@ For each modality per sample, specify the value in the key column in the X_filet

modality |key |description
------------|----------|----------
-gex/adt/atac|cellranger| the "outs" folder produced by **cellranger count**
-gex/adt/atac|cellranger_multi| the "outs" folder produced by **cellranger multi**
-gex/adt/atac|10X_h5 | outs/filtered_feature_bc_matrix.h5 produced by cellranger
-gex/adt/atac|hd5 | Read a generic .h5 (hdf5) file.
-gex/adt/atac|h5ad | Anndata h5ad objects (one per sample)
-gex/adt/atac|txt_matrix | tab-delimited file (one per sample)
-gex/adt/atac|csv_matrix | comma-delimited file (one per sample)
+rna/prot/atac|cellranger| the "outs" folder produced by **cellranger count**
+rna/prot/atac|cellranger_multi| the "outs" folder produced by **cellranger multi**
+rna/prot/atac|10X_h5 | outs/filtered_feature_bc_matrix.h5 produced by cellranger
+rna/prot/atac|hd5 | Read a generic .h5 (hdf5) file.
+rna/prot/atac|h5ad | Anndata h5ad objects (one per sample)
+rna/prot/atac|txt_matrix | tab-delimited file (one per sample)
+rna/prot/atac|csv_matrix | comma-delimited file (one per sample)
tcr/bcr |cellranger_vdj| Path to filtered_contig_annotations.csv, all_contig_annotations.csv or all_contig_annotations.json. produced by **cellranger vdj** further [details](https://scverse.org/scirpy/latest/generated/scirpy.io.read_10x_vdj.html)
tcr/bcr |tracer| data from [TraCeR](https://github.com/Teichlab/tracer) further [details](https://scverse.org/scirpy/latest/generated/scirpy.io.read_tracer.html)
tcr/bcr |bracer| data from [BraCeR](https://github.com/Teichlab/bracer) further [details](https://scverse.org/scirpy/latest/generated/scirpy.io.read_bracer.html)
6 changes: 2 additions & 4 deletions docs/usage/sample_file_qc_mm.md
@@ -1,12 +1,10 @@
Example sample submission file
-----------------------------

-| sample_id | gex_path | gex_filetype | adt_path | adt_filetype | tissue | diagnosis |
+| sample_id | rna_path | rna_filetype | prot_path | prot_filetype | tissue | diagnosis |
|-----------|------------------------------------|--------------|-------------------------------------|--------------|--------|-----------|
-| Sample1 | Sample1_gex.csv | csv_matrix | Sample1_adt.csv | csv_matrix | pbmc | healthy |
+| Sample1 | Sample1_rna.csv | csv_matrix | Sample1_adt.csv | csv_matrix | pbmc | healthy |
| Sample2 | cellranger_count/Sample2_GEX/outs/ | cellranger | cellranger_count/Sample2_CITE/outs/ | cellranger | pbmc | diseased |


Download this file: [sample_file_qc_mm.txt](sample_file_qc_mm.txt)


2 changes: 1 addition & 1 deletion docs/usage/sample_file_qc_mm.txt
@@ -1,3 +1,3 @@
-sample_id gex_path gex_filetype adt_path adt_filetype tissue diagnosis
+sample_id rna_path rna_filetype prot_path prot_filetype tissue diagnosis
Sample1 Sample1_gex.csv csv_matrix Sample1_adt.csv csv_matrix pbmc healthy
Sample2 cellranger_count/Sample2_GEX/outs/ cellranger cellranger_count/Sample2_CITE/outs/ cellranger pbmc diseased
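
Existing sample submission files written against the old column names need their header updated to match this rename. A minimal sketch of that migration follows, assuming a tab-delimited header line. `RENAMES` and `migrate_header` are hypothetical names used for illustration and are not part of panpipes.

```python
# Old-to-new modality prefixes, taken from the column renames shown in this diff.
RENAMES = {"gex": "rna", "adt": "prot"}


def migrate_header(header_line, sep="\t"):
    """Rename gex_*/adt_* path and filetype columns to rna_*/prot_*.

    Hypothetical helper: assumes the header is a single line of column
    names separated by `sep` (tab by default).
    """
    new_cols = []
    for col in header_line.rstrip("\n").split(sep):
        prefix, _, suffix = col.partition("_")
        # Only touch the modality columns; metadata columns pass through unchanged.
        if prefix in RENAMES and suffix in ("path", "filetype"):
            col = f"{RENAMES[prefix]}_{suffix}"
        new_cols.append(col)
    return sep.join(new_cols)


if __name__ == "__main__":
    old = "sample_id\tgex_path\tgex_filetype\tadt_path\tadt_filetype\ttissue\tdiagnosis"
    print(migrate_header(old))
```

Only the header is rewritten here. File names inside the rows (for example `Sample1_gex.csv`) are paths on disk and are deliberately left alone.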
22 changes: 11 additions & 11 deletions docs/usage/setup_for_qc_mm.md
@@ -6,21 +6,21 @@ The multimodal QC pipeline (qc_mm) requires a sample submission file which it us

The minimum required columns are

-sample_id | gex_path | gex_filetype
+sample_id | rna_path | rna_filetype
----------|----------|-------------



If you want to analyse other modalities, add additional columns to the input file

-- adt_path/adt_filetype
+- prot_path/prot_filetype
- atac_path/atac_filetype
- tcr_path/tcr_filetype
- bcr_path/bcr_filetype

**sample id**: Each row must have a unique sample ID.

-**{X}_paths**: If giving a cellranger path, give the path folder containing all the cellranger outputs, known as the `outs` folder. Otherwise path should be the complete path to the file. If you have cellranger outputs which have gex and adt within the same files, specify the same path in gex_path and adt_path
+**{X}_paths**: If giving a cellranger path, give the path folder containing all the cellranger outputs, known as the `outs` folder. Otherwise path should be the complete path to the file. If you have cellranger outputs which have rna and prot within the same files, specify the same path in rna_path and prot_path

**{X}_filetype**: The "filetype" column tells panpipes how to read in the data. Panpipes supports a range of inputs. See the [supported input filetypes](#supported-input-filetypes) below to see the options for the {X}_filetype columns

@@ -35,7 +35,7 @@ You will also need to list which additional metadata columns you want to include
## Example sample submission file


-| sample_id | gex_path | gex_filetype | adt_path | adt_filetype | tissue | diagnosis |
+| sample_id | rna_path | rna_filetype | prot_path | prot_filetype | tissue | diagnosis |
|-----------|------------------------------------|--------------|-------------------------------------|--------------|--------|-----------|
| Sample1 | Sample1_gex.csv | csv_matrix | Sample1_adt.csv | csv_matrix | pbmc | healthy |
| Sample2 | cellranger_count/Sample2_GEX/outs/ | cellranger | cellranger_count/Sample2_CITE/outs/ | cellranger | pbmc | diseased |
@@ -58,13 +58,13 @@ For each modality per sample, specify the value in the key column in the X_filet

modality |key |description
------------|----------|----------
-gex/adt/atac|cellranger| the "outs" folder produced by **cellranger count**
-gex/adt/atac|cellranger_multi| the "outs" folder produced by **cellranger multi**
-gex/adt/atac|10X_h5 | outs/filtered_feature_bc_matrix.h5 produced by cellranger
-gex/adt/atac|hd5 | Read a generic .h5 (hdf5) file.
-gex/adt/atac|h5ad | Anndata h5ad objects (one per sample)
-gex/adt/atac|txt_matrix | tab-delimited file (one per sample)
-gex/adt/atac|csv_matrix | comma-delimited file (one per sample)
+rna/prot/atac|cellranger| the "outs" folder produced by **cellranger count**
+rna/prot/atac|cellranger_multi| the "outs" folder produced by **cellranger multi**
+rna/prot/atac|10X_h5 | outs/filtered_feature_bc_matrix.h5 produced by cellranger
+rna/prot/atac|hd5 | Read a generic .h5 (hdf5) file.
+rna/prot/atac|h5ad | Anndata h5ad objects (one per sample)
+rna/prot/atac|txt_matrix | tab-delimited file (one per sample)
+rna/prot/atac|csv_matrix | comma-delimited file (one per sample)
tcr/bcr |cellranger_vdj| Path to filtered_contig_annotations.csv, all_contig_annotations.csv or all_contig_annotations.json. produced by **cellranger vdj** further [details](https://scverse.org/scirpy/latest/generated/scirpy.io.read_10x_vdj.html)
tcr/bcr |tracer| data from [TraCeR](https://github.com/Teichlab/tracer) further [details](https://scverse.org/scirpy/latest/generated/scirpy.io.read_tracer.html)
tcr/bcr |bracer| data from [BraCeR](https://github.com/Teichlab/bracer) further [details](https://scverse.org/scirpy/latest/generated/scirpy.io.read_bracer.html)
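
The column and filetype rules described in this file can be checked before a run. The sketch below is illustrative only: `check_sample_sheet` is a hypothetical helper, not a panpipes function. It assumes a tab-delimited sheet and only knows the filetype keys visible in the table above, so the full table may list more.

```python
import csv
import io

# Keys copied from the rows of the table shown in this hunk (new names only).
# Grouping them into two sets is an assumption of this sketch.
MATRIX_KEYS = {"cellranger", "cellranger_multi", "10X_h5", "hd5",
               "h5ad", "txt_matrix", "csv_matrix"}
REPERTOIRE_KEYS = {"cellranger_vdj", "tracer", "bracer"}
ALLOWED = {
    "rna": MATRIX_KEYS,
    "prot": MATRIX_KEYS,
    "atac": MATRIX_KEYS,
    "tcr": REPERTOIRE_KEYS,
    "bcr": REPERTOIRE_KEYS,
}
REQUIRED = ("sample_id", "rna_path", "rna_filetype")


def check_sample_sheet(text):
    """Return a list of human-readable problems found in a sample sheet."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    header = reader.fieldnames or []
    problems = [f"missing required column: {c}" for c in REQUIRED if c not in header]
    seen = set()
    for row in reader:
        sid = row.get("sample_id")
        # Each row must have a unique sample ID.
        if sid in seen:
            problems.append(f"duplicate sample_id: {sid}")
        seen.add(sid)
        # Any filetype value that is present must be a known key for its modality.
        for modality, allowed in ALLOWED.items():
            key = row.get(f"{modality}_filetype")
            if key and key not in allowed:
                problems.append(f"{sid}: unsupported {modality}_filetype {key!r}")
    return problems


if __name__ == "__main__":
    sheet = (
        "sample_id\trna_path\trna_filetype\n"
        "Sample1\tSample1_rna.csv\tcsv_matrix\n"
    )
    print(check_sample_sheet(sheet))
```

A sheet that still uses the old `gex_path`/`gex_filetype` header is reported as missing the two `rna_*` columns, which is the most likely failure after this rename.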
2 changes: 1 addition & 1 deletion docs/workflows/integration.md
@@ -5,7 +5,7 @@ The panpipes integration pipeline implements a variety of tools to batch correct

![integration_flowchart](../img/integration_coloured.drawio.png)

-The flowchart indicates which tools are available for each modality out of GEX (also referred to as RNA), ADT (also referred to as PROT) and ATAC. You can run as many of these tools as you choose and then you can run `panpipes integration make merge_batch_correction` to create a final object containing one reduced dimension representation and one nearest neighbor graph per modality. This can be used as input to the clustering pipeline.
+The flowchart indicates which tools are available for each modality out of RNA (also referred to as GEX), PROT (also referred to as ADT) and ATAC. You can run as many of these tools as you choose and then you can run `panpipes integration make merge_batch_correction` to create a final object containing one reduced dimension representation and one nearest neighbor graph per modality. This can be used as input to the clustering pipeline.


## Steps to run:
14 changes: 7 additions & 7 deletions docs/workflows/pipeline_preprocess_preprint.yml
@@ -157,11 +157,11 @@ plotqc:
grouping_var: sample_id,orig.ident
# use these continuous variables to plot gradients and distributions
rna_metrics: pct_counts_mt,pct_counts_rp,pct_counts_hb,doublet_scores
-prot_metrics: total_counts,log1p_total_counts,n_adt_by_counts
+prot_metrics: total_counts,log1p_total_counts,n_prot_by_counts
atac_metrics: total_counts
rep_metrics:

-
+# --------------------------------------------------------------------------------------------------------
# RNA Normalisation
# --------------------------------------------------------------------------------------------------------
# hvg_flavour options include "seurat", "cell_ranger", "seurat_v3", default; "seurat"
@@ -213,8 +213,8 @@ pca:
scree_n_pcs: 50
color_by: sample_id

-
-# Protein (ADT) normalisation
+# --------------------------------------------------------------------------------------------------------
+# Protein (PROT) normalisation
# --------------------------------------------------------------------------------------------------------
prot:
# comma separated string of normalisation options
@@ -229,7 +229,7 @@ prot:

# CLR parameters:
# margin determines whether you normalise per cell (as you would RNA norm),
-# or by feature (recommended, due to the variable nature of adts).
+# or by feature (recommended, due to the variable nature of prot assays).
# CLR margin 0 is recommended for informative qc plots in this pipeline
# 0 = normalise colwise (per feature)
# 1 = normalise rowwise (per cell)
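
To make the margin setting concrete, here is a small sketch of a centred log-ratio (CLR) transform on a cells x features matrix. It assumes the textbook CLR with a pseudocount of 1, that is, log1p(x) minus the mean of log1p(x) along the chosen margin. The `clr` function is hypothetical and the pipeline's own implementation may differ in detail.

```python
import math


def clr(counts, margin=0):
    """Centred log-ratio transform of a cells x features count matrix.

    counts: list of rows, one row per cell, one column per feature.
    margin=0: normalise colwise (per feature, across cells).
    margin=1: normalise rowwise (per cell, across features).
    Assumption of this sketch: a pseudocount of 1 is applied via log1p.
    """
    if margin not in (0, 1):
        raise ValueError("margin must be 0 (per feature) or 1 (per cell)")
    logged = [[math.log1p(v) for v in row] for row in counts]
    if margin == 1:
        # Subtract each cell's mean log-count from that cell's values.
        return [[v - sum(row) / len(row) for v in row] for row in logged]
    # Subtract each feature's mean log-count (taken over cells) from that column.
    n_cells = len(logged)
    col_means = [sum(col) / n_cells for col in zip(*logged)]
    return [[v - m for v, m in zip(row, col_means)] for row in logged]


if __name__ == "__main__":
    counts = [[1, 20], [3, 40], [5, 60]]  # 3 cells x 2 features
    for row in clr(counts, margin=0):
        print(row)
```

With `margin=0` every feature column is centred on zero, so an antibody with uniformly high counts no longer dominates. With `margin=1` every cell is centred instead.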
@@ -250,9 +250,9 @@ prot:

# do you want to save the prot normalised assay additionally as a txt file:
save_norm_prot_mtx: False
-
+# --------------------------------------------------------------------------------------------------------
# ATAC preprocessing and normalisation
-# --------------------------
+# --------------------------------------------------------------------------------------------------------
atac:
binarize: False
normalize: TFIDF
4 changes: 2 additions & 2 deletions docs/workflows/qc.md
@@ -22,7 +22,7 @@ Then qc metrics are computed using [scanpy.pp.calculate_qc_metrics](https://sca
- Protein metadata such as isotype status or shorter names incorporated into the object. Inputs for this are described [here]
- Per cell QC metrics computed as described above, including pct_isotype where isotype information is available.
- Per-protein metrics such as total_counts are computed in order to compare the binding of different antibodies (applicable when your assay is CITE-seq based). These are defined in the yml:
-`prot_metrics_per_adt: total_counts,log1p_total_counts,n_cells_by_counts,mean_counts`
+`plot_metrics_per_prot: total_counts,log1p_total_counts,n_cells_by_counts,mean_counts`
- A rudimentary check for cells that are 'isotype' outliers, i.e. the cells where the isotype content is in the top 10% quantile for more than 2 isotypes. (these parameters are customisable in the`pipeline.yml`). See the function [here](https://github.com/DendrouLab/panpipes/blob/main/panpipes/funcs/scmethods.py#L328).
```
# isotype outliers: one way to determine which cells are very sticky is to work out which cells have the most isotype UMIs
@@ -63,7 +63,7 @@ There is an additional optional `assess_background` step, if the raw data (inclu
[Inputs to Multimodal QC pipeline](../setup_for_qc_mm)
2. Generate qc genelists as described in
[Gene list format](../gene_list_format)
-3. For adt assay - generate the protein metadata file
+3. For prot assay - generate the protein metadata file
[example](https://github.com/DendrouLab/panpipes/blob/main/resources/protein_metadata_w_iso.md).
This file is integrated into the mdata\['prot'\].var slot.
4. Generate config file (`panpipes ingest config`)