DendrouLab · crichgriffin · Nov 10, 2023 · Sep 25, 2023
diff --git a/docs/setup_for_ingest.md b/docs/setup_for_ingest.md
@@ -3,13 +3,13 @@
 
 For ingest the minimum required columns are
 
-sample_id | gex_path | gex_filetype  
+sample_id | rna_path | rna_filetype  
 ----------|----------|-------------
 
 
 If you want to analyse other modalities, add columns to the input file
 
-- adt_path/adt_filetype
+- prot_path/prot_filetype
 - atac_path/atac_filetype
 - tcr_path/tcr_filetype
 - bcr_path/bcr_filetype
@@ -20,7 +20,7 @@ example at `resources/sample_file_ingest.txt`
 
 If giving a cellranger path, give the path folder containing all the cellranger outputs. Otherwise path should be the complete path to the file. 
 
-If you have cellranger outputs which have gex and adt within the same files, specify the same path in gex_path and adt_path
+If you have cellranger outputs which have rna and prot within the same files, specify the same path in rna_path and prot_path
 
 To include sample level metadata, you can add additional columns to the submission file
 e.g. Tissue and Diagnosis columns in `resources/sample_file_ingest.txt`
@@ -38,13 +38,13 @@ For each modality per sample, specify the value in the key column in the X_filet
 
 modality    |key       |description
 ------------|----------|----------
-gex/adt/atac|cellranger| the "outs" folder produced by **cellranger count**
-gex/adt/atac|cellranger_multi| the "outs" folder produced by **cellranger multi**
-gex/adt/atac|10X_h5   | outs/filtered_feature_bc_matrix.h5 produced by cellranger
-gex/adt/atac|hd5 | Read a generic .h5 (hdf5) file.
-gex/adt/atac|h5ad  | Anndata h5ad objects (one per sample)
-gex/adt/atac|txt_matrix  | tab-delimited file (one per sample)
-gex/adt/atac|csv_matrix  | comma-delimited file (one per sample)
+rna/prot/atac|cellranger| the "outs" folder produced by **cellranger count**
+rna/prot/atac|cellranger_multi| the "outs" folder produced by **cellranger multi**
+rna/prot/atac|10X_h5   | outs/filtered_feature_bc_matrix.h5 produced by cellranger
+rna/prot/atac|hd5 | Read a generic .h5 (hdf5) file.
+rna/prot/atac|h5ad  | Anndata h5ad objects (one per sample)
+rna/prot/atac|txt_matrix  | tab-delimited file (one per sample)
+rna/prot/atac|csv_matrix  | comma-delimited file (one per sample)
 tcr/bcr     |cellranger_vdj| Path to filtered_contig_annotations.csv, all_contig_annotations.csv or all_contig_annotations.json.  produced by **cellranger vdj** further [details](https://scverse.org/scirpy/latest/generated/scirpy.io.read_10x_vdj.html)
 tcr/bcr     |tracer| data from [TraCeR](https://github.com/Teichlab/tracer) further [details](https://scverse.org/scirpy/latest/generated/scirpy.io.read_tracer.html)
 tcr/bcr     |bracer| data from [BraCeR](https://github.com/Teichlab/bracer) further [details](https://scverse.org/scirpy/latest/generated/scirpy.io.read_bracer.html)
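
Existing submission files written with the old `gex_`/`adt_` column names would need their header updated to match this rename. The following is a minimal sketch of one way to do that, not part of panpipes itself: the helper names `rename_header` and `migrate_submission_text` are hypothetical, and the sketch assumes a tab-delimited file whose first row is the header. Only the header is rewritten; file names inside the cells (such as `Sample1_gex.csv`) are left untouched.

```python
import csv
import io

# Old-to-new column prefixes, taken from the rename in this diff.
PREFIX_RENAMES = {"gex": "rna", "adt": "prot"}
# Only columns of the form <prefix>_path / <prefix>_filetype are renamed.
SUFFIXES = ("path", "filetype")


def rename_header(columns):
    """Return a copy of the header with gex_/adt_ columns renamed."""
    renamed = []
    for col in columns:
        prefix, _, suffix = col.partition("_")
        if prefix in PREFIX_RENAMES and suffix in SUFFIXES:
            col = f"{PREFIX_RENAMES[prefix]}_{suffix}"
        renamed.append(col)
    return renamed


def migrate_submission_text(text):
    """Rewrite the header row of a tab-delimited submission file."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows:
        return text
    rows[0] = rename_header(rows[0])
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue()


old_text = (
    "sample_id\tgex_path\tgex_filetype\tadt_path\tadt_filetype\ttissue\tdiagnosis\n"
    "Sample1\tSample1_gex.csv\tcsv_matrix\tSample1_adt.csv\tcsv_matrix\tpbmc\thealthy\n"
)
new_text = migrate_submission_text(old_text)
print(new_text.splitlines()[0])
```

Sample-level metadata columns such as `tissue` and `diagnosis` pass through unchanged because they do not match a renamed prefix.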

diff --git a/docs/usage/sample_file_qc_mm.md b/docs/usage/sample_file_qc_mm.md
@@ -1,12 +1,10 @@
 Example sample submission file
 -----------------------------
 
-| sample_id | gex_path                           | gex_filetype | adt_path                            | adt_filetype | tissue | diagnosis |
+| sample_id | rna_path                           | rna_filetype | prot_path                            | prot_filetype | tissue | diagnosis |
 |-----------|------------------------------------|--------------|-------------------------------------|--------------|--------|-----------|
-| Sample1   | Sample1_gex.csv                    | csv_matrix   | Sample1_adt.csv                     | csv_matrix   | pbmc   | healthy   |
+| Sample1   | Sample1_rna.csv                    | csv_matrix   | Sample1_adt.csv                     | csv_matrix   | pbmc   | healthy   |
 | Sample2   | cellranger_count/Sample2_GEX/outs/ | cellranger   | cellranger_count/Sample2_CITE/outs/ | cellranger   | pbmc   | diseased  |
 
 
 Download this file: [sample_file_qc_mm.txt](sample_file_qc_mm.txt)
-
-
diff --git a/docs/usage/sample_file_qc_mm.txt b/docs/usage/sample_file_qc_mm.txt
@@ -1,3 +1,3 @@
-sample_id	gex_path	gex_filetype	adt_path	adt_filetype	tissue	diagnosis
+sample_id	rna_path	rna_filetype	prot_path	prot_filetype	tissue	diagnosis
 Sample1	Sample1_gex.csv	csv_matrix	Sample1_adt.csv	csv_matrix	pbmc	healthy
 Sample2	cellranger_count/Sample2_GEX/outs/	cellranger	cellranger_count/Sample2_CITE/outs/	cellranger	pbmc	diseased
diff --git a/docs/usage/setup_for_qc_mm.md b/docs/usage/setup_for_qc_mm.md
@@ -6,21 +6,21 @@ The multimodal QC pipeline (qc_mm) requires a sample submission file which it us
 
 The minimum required columns are
 
-sample_id | gex_path | gex_filetype  
+sample_id | rna_path | rna_filetype  
 ----------|----------|-------------
 
 
 
 If you want to analyse other modalities, add additional columns to the input file
 
-- adt_path/adt_filetype
+- prot_path/prot_filetype
 - atac_path/atac_filetype
 - tcr_path/tcr_filetype
 - bcr_path/bcr_filetype
 
 **sample id**: Each row must have a unique sample ID. 
 
-**{X}_paths**: If giving a cellranger path, give the path folder containing all the cellranger outputs, known as the `outs` folder. Otherwise path should be the complete path to the file. If you have cellranger outputs which have gex and adt within the same files, specify the same path in gex_path and adt_path
+**{X}_paths**: If giving a cellranger path, give the path folder containing all the cellranger outputs, known as the `outs` folder. Otherwise path should be the complete path to the file. If you have cellranger outputs which have rna and prot within the same files, specify the same path in rna_path and prot_path
 
 **{X}_filetype**: The "filetype" column tells panpipes how to read in the data. Panpipes supports a range of inputs. See the [supported input filetypes](#supported-input-filetypes) below to see the options for the {X}_filetype columns.
 
@@ -35,7 +35,7 @@ You will also need to list which additional metadata columns you want to include
 ## Example sample submission file
 
 
-| sample_id | gex_path                           | gex_filetype | adt_path                            | adt_filetype | tissue | diagnosis |
+| sample_id | rna_path                           | rna_filetype | prot_path                            | prot_filetype | tissue | diagnosis |
 |-----------|------------------------------------|--------------|-------------------------------------|--------------|--------|-----------|
 | Sample1   | Sample1_gex.csv                    | csv_matrix   | Sample1_adt.csv                     | csv_matrix   | pbmc   | healthy   |
 | Sample2   | cellranger_count/Sample2_GEX/outs/ | cellranger   | cellranger_count/Sample2_CITE/outs/ | cellranger   | pbmc   | diseased  |
@@ -58,13 +58,13 @@ For each modality per sample, specify the value in the key column in the X_filet
 
 modality    |key       |description
 ------------|----------|----------
-gex/adt/atac|cellranger| the "outs" folder produced by **cellranger count**
-gex/adt/atac|cellranger_multi| the "outs" folder produced by **cellranger multi**
-gex/adt/atac|10X_h5   | outs/filtered_feature_bc_matrix.h5 produced by cellranger
-gex/adt/atac|hd5 | Read a generic .h5 (hdf5) file.
-gex/adt/atac|h5ad  | Anndata h5ad objects (one per sample)
-gex/adt/atac|txt_matrix  | tab-delimited file (one per sample)
-gex/adt/atac|csv_matrix  | comma-delimited file (one per sample)
+rna/prot/atac|cellranger| the "outs" folder produced by **cellranger count**
+rna/prot/atac|cellranger_multi| the "outs" folder produced by **cellranger multi**
+rna/prot/atac|10X_h5   | outs/filtered_feature_bc_matrix.h5 produced by cellranger
+rna/prot/atac|hd5 | Read a generic .h5 (hdf5) file.
+rna/prot/atac|h5ad  | Anndata h5ad objects (one per sample)
+rna/prot/atac|txt_matrix  | tab-delimited file (one per sample)
+rna/prot/atac|csv_matrix  | comma-delimited file (one per sample)
 tcr/bcr     |cellranger_vdj| Path to filtered_contig_annotations.csv, all_contig_annotations.csv or all_contig_annotations.json.  produced by **cellranger vdj** further [details](https://scverse.org/scirpy/latest/generated/scirpy.io.read_10x_vdj.html)
 tcr/bcr     |tracer| data from [TraCeR](https://github.com/Teichlab/tracer) further [details](https://scverse.org/scirpy/latest/generated/scirpy.io.read_tracer.html)
 tcr/bcr     |bracer| data from [BraCeR](https://github.com/Teichlab/bracer) further [details](https://scverse.org/scirpy/latest/generated/scirpy.io.read_bracer.html)
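
The rules described in this file (required columns, unique sample IDs, paired `_path`/`_filetype` columns, and the supported filetype keys per modality) can be checked before running the pipeline. The sketch below is illustrative only and is not how panpipes validates its input: `check_submission` is a hypothetical helper, and the key sets are copied from the table in this diff.

```python
# Filetype keys copied from the "supported input filetypes" table.
MATRIX_KEYS = {
    "cellranger", "cellranger_multi", "10X_h5", "hd5",
    "h5ad", "txt_matrix", "csv_matrix",
}
VDJ_KEYS = {"cellranger_vdj", "tracer", "bracer"}
SUPPORTED = {
    "rna": MATRIX_KEYS,
    "prot": MATRIX_KEYS,
    "atac": MATRIX_KEYS,
    "tcr": VDJ_KEYS,
    "bcr": VDJ_KEYS,
}
REQUIRED = ("sample_id", "rna_path", "rna_filetype")


def check_submission(rows):
    """Return a list of problems found in rows (one dict per sample)."""
    problems = []
    seen_ids = set()
    for i, row in enumerate(rows, start=1):
        # The minimum required columns must be present and non-empty.
        for col in REQUIRED:
            if not row.get(col):
                problems.append(f"row {i}: missing {col}")
        # Each row must have a unique sample ID.
        sample_id = row.get("sample_id")
        if sample_id:
            if sample_id in seen_ids:
                problems.append(f"row {i}: duplicate sample_id '{sample_id}'")
            seen_ids.add(sample_id)
        for modality, keys in SUPPORTED.items():
            path = row.get(f"{modality}_path")
            filetype = row.get(f"{modality}_filetype")
            if modality != "rna" and bool(path) != bool(filetype):
                # Optional modalities need both columns or neither.
                problems.append(
                    f"row {i}: {modality}_path and {modality}_filetype must be given together"
                )
            elif filetype and filetype not in keys:
                problems.append(f"row {i}: unsupported {modality}_filetype '{filetype}'")
    return problems


good_rows = [
    {"sample_id": "Sample1", "rna_path": "Sample1_gex.csv", "rna_filetype": "csv_matrix",
     "prot_path": "Sample1_adt.csv", "prot_filetype": "csv_matrix"},
    {"sample_id": "Sample2", "rna_path": "cellranger_count/Sample2_GEX/outs/",
     "rna_filetype": "cellranger"},
]
bad_rows = [{"sample_id": "S1", "rna_path": "x.h5", "rna_filetype": "h5"}]
print(check_submission(good_rows))
print(check_submission(bad_rows))
```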

diff --git a/docs/workflows/integration.md b/docs/workflows/integration.md
@@ -5,7 +5,7 @@ The panpipes integration pipeline implements a variety of tools to batch correct
 
 ![integration_flowchart](../img/integration_coloured.drawio.png)
 
-The flowchart indicates which tools are available for each modality out of GEX (also referred to as RNA), ADT (also referred to as PROT) and ATAC. You can run as many of these tools as you choose and then you can run `panpipes integration make merge_batch_correction` to create a final object containing one reduced dimension representation and one nearest neighbor graph per modality. This can be used as input to the clustering pipeline.
+The flowchart indicates which tools are available for each modality out of RNA (also referred to as GEX), PROT (also referred to as ADT) and ATAC. You can run as many of these tools as you choose and then you can run `panpipes integration make merge_batch_correction` to create a final object containing one reduced dimension representation and one nearest neighbor graph per modality. This can be used as input to the clustering pipeline.
 
 
 ## Steps to run:

diff --git a/docs/workflows/pipeline_preprocess_preprint.yml b/docs/workflows/pipeline_preprocess_preprint.yml
@@ -157,11 +157,11 @@ plotqc:
   grouping_var: sample_id,orig.ident
   # use these continuous variables to plot gradients and distributions
   rna_metrics: pct_counts_mt,pct_counts_rp,pct_counts_hb,doublet_scores
-  prot_metrics: total_counts,log1p_total_counts,n_adt_by_counts
+  prot_metrics: total_counts,log1p_total_counts,n_prot_by_counts
   atac_metrics: total_counts
   rep_metrics: 
 
-
+# --------------------------------------------------------------------------------------------------------
 # RNA Normalisation
 # --------------------------------------------------------------------------------------------------------
 # hvg_flavour options include "seurat", "cell_ranger", "seurat_v3"; default: "seurat"
@@ -213,8 +213,8 @@ pca:
   scree_n_pcs: 50
   color_by: sample_id
 
-
-# Protein (ADT) normalisation
+# --------------------------------------------------------------------------------------------------------
+# Protein (PROT) normalisation
 # --------------------------------------------------------------------------------------------------------
 prot:
   # comma separated string of normalisation options
@@ -229,7 +229,7 @@ prot:
 
   # CLR parameters:
   # margin determines whether you normalise per cell (as you would RNA norm), 
-  # or by feature (recommended, due to the variable nature of adts). 
+  # or by feature (recommended, due to the variable nature of prot assays). 
   # CLR margin 0 is recommended for informative qc plots in this pipeline
   # 0 = normalise colwise (per feature)
   # 1 = normalise rowwise (per cell)
@@ -250,9 +250,9 @@ prot:
 
   # do you want to save the prot normalised assay additionally as a txt file:
   save_norm_prot_mtx: False
-
+# --------------------------------------------------------------------------------------------------------
 # ATAC preprocessing and normalisation
-# --------------------------
+# --------------------------------------------------------------------------------------------------------
 atac:
   binarize: False
   normalize: TFIDF
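
The CLR margin comment in the `prot` section above distinguishes per-feature (margin 0, column-wise) from per-cell (margin 1, row-wise) normalisation. The sketch below illustrates that difference on a small cells-by-features matrix. It uses one common pseudocount form of the centred log-ratio, `log1p(x / exp(mean(log1p(x))))`, as an assumption; the diff does not state the exact formula panpipes applies, and the function names are hypothetical.

```python
import math


def _clr_vector(values):
    """Apply a log1p-based centred log-ratio to one vector of counts."""
    values = list(values)
    if not values:
        return []
    # Sum log1p over positive entries only, but divide by the full length.
    log_sum = sum(math.log1p(v) for v in values if v > 0)
    scale = math.exp(log_sum / len(values))
    return [math.log1p(v / scale) for v in values]


def clr(counts, margin=0):
    """Normalise a cells-by-features list of lists.

    margin=0 normalises each column (per feature),
    margin=1 normalises each row (per cell).
    """
    if margin == 1:
        return [_clr_vector(row) for row in counts]
    if margin == 0:
        columns = [_clr_vector(col) for col in zip(*counts)]
        # Transpose back to cells-by-features.
        return [list(row) for row in zip(*columns)]
    raise ValueError("margin must be 0 or 1")


# Two cells (rows) by two proteins (columns).
counts = [[1, 10], [3, 30]]
per_feature = clr(counts, margin=0)
per_cell = clr(counts, margin=1)
print(per_feature)
print(per_cell)
```

With margin 0 each protein is scaled against its own distribution across cells, so a highly abundant antibody does not dominate the values of a weakly expressed one.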

diff --git a/docs/workflows/qc.md b/docs/workflows/qc.md
@@ -22,7 +22,7 @@ Then qc metrics are computed using [scanpy.pp.calculated_qc_metrics](https://sca
 - Protein metadata such as isotype status or shorter names incorporated into the object. Inputs for this are described [here]
 - Per cell QC metrics computed as described above, including pct_isotype where isotype information is available. 
 - Per-protein metrics, such as total_counts, are computed in order to compare the binding of different antibodies (applicable when your assay is CITE-seq based). These are defined in the yml:
-`prot_metrics_per_adt: total_counts,log1p_total_counts,n_cells_by_counts,mean_counts`
+`plot_metrics_per_prot: total_counts,log1p_total_counts,n_cells_by_counts,mean_counts`
 - A rudimentary check for cells that are 'isotype' outliers, i.e. the cells where the isotype content is in the top 10% quantile for more than 2 isotypes (these parameters are customisable in the `pipeline.yml`). See the function [here](https://github.com/DendrouLab/panpipes/blob/main/panpipes/funcs/scmethods.py#L328). 
 ```
 # isotype outliers: one way to determine which cells are very sticky is to work out which cells have the most isotype UMIs
@@ -63,7 +63,7 @@ There is an additional optional `assess_background` step, if the raw data (inclu
     [Inputs to Multimodal QC pipeline](../setup_for_qc_mm)
 2.  Generate qc genelists as described in
     [Gene list format](../gene_list_format)
-3.  For adt assay - generate the protein metadata file
+3.  For prot assay - generate the protein metadata file
     [example](https://github.com/DendrouLab/panpipes/blob/main/resources/protein_metadata_w_iso.md).
     This file is integrated into the mdata\['prot'\].var slot.
 4.  Generate config file (`panpipes ingest config`)