From ff19deabdbdd6806533928704e9066a352a00231 Mon Sep 17 00:00:00 2001
From: bio-la <fabiola.curion@gmail.com>
Date: Mon, 25 Sep 2023 11:46:04 +0200
Subject: [PATCH 01/28] checking gex on setup md files

---
 docs/setup_for_ingest.md        | 18 +++++++++---------
 docs/usage/sample_file_qc_mm.md |  6 ++----
 2 files changed, 11 insertions(+), 13 deletions(-)

diff --git a/docs/setup_for_ingest.md b/docs/setup_for_ingest.md
index 86db2324..030f7f91 100644
--- a/docs/setup_for_ingest.md
+++ b/docs/setup_for_ingest.md
@@ -3,7 +3,7 @@
 
 For ingest the minimum required columns are
 
-sample_id | gex_path | gex_filetype  
+sample_id | rna_path | rna_filetype  
 ----------|----------|-------------
 
 
@@ -20,7 +20,7 @@ example at `resources/sample_file_ingest.txt`
 
 If giving a cellranger path, give the path folder containing all the cellranger outputs. Otherwise path should be the complete path to the file. 
 
-If you have cellranger outputs which have gex and adt within the same files, specify the same path in gex_path and adt_path
+If you have cellranger outputs which have rna and adt within the same files, specify the same path in rna_path and adt_path
 
 To include sample level metadata, you can add additional columns to the submission file
 e.g Tissue and Diagnoisis columns in `resources/sample_file_ingest.txt`
@@ -38,13 +38,13 @@ For each modality per sample, specify the value in the key column in the X_filet
 
 modality    |key       |description
 ------------|----------|----------
-gex/adt/atac|cellranger| the "outs" folder produced by **cellranger count**
-gex/adt/atac|cellranger_multi| the "outs" folder produced by **cellranger multi**
-gex/adt/atac|10X_h5   | outs/filtered_feature_bc_matrix.h5 produced by cellranger
-gex/adt/atac|hd5 | Read a generic .h5 (hdf5) file.
-gex/adt/atac|h5ad  | Anndata h5ad objects (one per sample)
-gex/adt/atac|txt_matrix  | tab-delimited file (one per sample)
-gex/adt/atac|csv_matrix  | comma-delimited file (one per sample)
+rna/adt/atac|cellranger| the "outs" folder produced by **cellranger count**
+rna/adt/atac|cellranger_multi| the "outs" folder produced by **cellranger multi**
+rna/adt/atac|10X_h5   | outs/filtered_feature_bc_matrix.h5 produced by cellranger
+rna/adt/atac|hd5 | Read a generic .h5 (hdf5) file.
+rna/adt/atac|h5ad  | Anndata h5ad objects (one per sample)
+rna/adt/atac|txt_matrix  | tab-delimited file (one per sample)
+rna/adt/atac|csv_matrix  | comma-delimited file (one per sample)
 tcr/bcr     |cellranger_vdj| Path to filtered_contig_annotations.csv, all_contig_annotations.csv or all_contig_annotations.json.  produced by **cellranger vdj** further [details](https://scverse.org/scirpy/latest/generated/scirpy.io.read_10x_vdj.html)
 tcr/bcr     |tracer| data from [TraCeR](https://github.com/Teichlab/tracer) further [details](https://scverse.org/scirpy/latest/generated/scirpy.io.read_tracer.html)
 tcr/bcr     |bracer| data from [BraCeR](https://github.com/Teichlab/bracer) further [details](https://scverse.org/scirpy/latest/generated/scirpy.io.read_bracer.html)
diff --git a/docs/usage/sample_file_qc_mm.md b/docs/usage/sample_file_qc_mm.md
index 47ca20e7..cf4269bb 100644
--- a/docs/usage/sample_file_qc_mm.md
+++ b/docs/usage/sample_file_qc_mm.md
@@ -1,12 +1,10 @@
 Example sample submission file
 -----------------------------
 
-| sample_id | gex_path                           | gex_filetype | adt_path                            | adt_filetype | tissue | diagnosis |
+| sample_id | rna_path                           | rna_filetype | adt_path                            | adt_filetype | tissue | diagnosis |
 |-----------|------------------------------------|--------------|-------------------------------------|--------------|--------|-----------|
-| Sample1   | Sample1_gex.csv                    | csv_matrix   | Sample1_adt.csv                     | csv_matrix   | pbmc   | healthy   |
+| Sample1   | Sample1_rna.csv                    | csv_matrix   | Sample1_adt.csv                     | csv_matrix   | pbmc   | healthy   |
 | Sample2   | cellranger_count/Sample2_GEX/outs/ | cellranger   | cellranger_count/Sample2_CITE/outs/ | cellranger   | pbmc   | diseased  |
 
 
 Download this file: [sample_file_qc_mm.txt](sample_file_qc_mm.txt)
-
-

From 8fb3f31d25cec8b40572b2349f350a3076e7174e Mon Sep 17 00:00:00 2001
From: bio-la <fabiola.curion@gmail.com>
Date: Mon, 25 Sep 2023 11:47:04 +0200
Subject: [PATCH 02/28] sample qc mm fix

---
 docs/usage/sample_file_qc_mm.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/usage/sample_file_qc_mm.txt b/docs/usage/sample_file_qc_mm.txt
index 5cc13a1f..7e45675e 100644
--- a/docs/usage/sample_file_qc_mm.txt
+++ b/docs/usage/sample_file_qc_mm.txt
@@ -1,3 +1,3 @@
-sample_id	gex_path	gex_filetype	adt_path	adt_filetype	tissue	diagnosis
+sample_id	rna_path	rna_filetype	adt_path	adt_filetype	tissue	diagnosis
 Sample1	Sample1_gex.csv	csv_matrix	Sample1_adt.csv	csv_matrix	pbmc	healthy
 Sample2	cellranger_count/Sample2_GEX/outs/	cellranger	cellranger_count/Sample2_CITE/outs/	cellranger	pbmc	diseased
\ No newline at end of file

From 5998619e5773a7979b6712c7ee0bdaae85a70234 Mon Sep 17 00:00:00 2001
From: bio-la <fabiola.curion@gmail.com>
Date: Mon, 25 Sep 2023 11:48:33 +0200
Subject: [PATCH 03/28] gex setup qc mm

---
 docs/usage/setup_for_qc_mm.md | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/docs/usage/setup_for_qc_mm.md b/docs/usage/setup_for_qc_mm.md
index ee32ffce..1f6df9ec 100644
--- a/docs/usage/setup_for_qc_mm.md
+++ b/docs/usage/setup_for_qc_mm.md
@@ -6,7 +6,7 @@ The multimodal QC pipeline (qc_mm) requires a sample submission file which it us
 
 The minimum required columns are
 
-sample_id | gex_path | gex_filetype  
+sample_id | rna_path | rna_filetype  
 ----------|----------|-------------
 
 
@@ -20,7 +20,7 @@ If you want to analyse other modalities, add additional columns to the input fil
 
 **sample id**: Each row must have a unique sample ID. 
 
-**{X}_paths**: If giving a cellranger path, give the path folder containing all the cellranger outputs, known as the `outs` folder. Otherwise path should be the complete path to the file. If you have cellranger outputs which have gex and adt within the same files, specify the same path in gex_path and adt_path
+**{X}_paths**: If giving a cellranger path, give the path folder containing all the cellranger outputs, known as the `outs` folder. Otherwise path should be the complete path to the file. If you have cellranger outputs which have rna and adt within the same files, specify the same path in rna_path and adt_path
 
 **{X}_filetype**: The "filetype" column tells panpipe how to read in the data. Panpipes supports a range of inputs. See the [supported input filetypes](#supported-input-filetypes) below to see the options for the {X}_filetype columns
 
@@ -35,7 +35,7 @@ You will also need to list which additional metadata columns you want to include
 ## Example sample submission file
 
 
-| sample_id | gex_path                           | gex_filetype | adt_path                            | adt_filetype | tissue | diagnosis |
+| sample_id | rna_path                           | rna_filetype | adt_path                            | adt_filetype | tissue | diagnosis |
 |-----------|------------------------------------|--------------|-------------------------------------|--------------|--------|-----------|
 | Sample1   | Sample1_gex.csv                    | csv_matrix   | Sample1_adt.csv                     | csv_matrix   | pbmc   | healthy   |
 | Sample2   | cellranger_count/Sample2_GEX/outs/ | cellranger   | cellranger_count/Sample2_CITE/outs/ | cellranger   | pbmc   | diseased  |
@@ -58,13 +58,13 @@ For each modality per sample, specify the value in the key column in the X_filet
 
 modality    |key       |description
 ------------|----------|----------
-gex/adt/atac|cellranger| the "outs" folder produced by **cellranger count**
-gex/adt/atac|cellranger_multi| the "outs" folder produced by **cellranger multi**
-gex/adt/atac|10X_h5   | outs/filtered_feature_bc_matrix.h5 produced by cellranger
-gex/adt/atac|hd5 | Read a generic .h5 (hdf5) file.
-gex/adt/atac|h5ad  | Anndata h5ad objects (one per sample)
-gex/adt/atac|txt_matrix  | tab-delimited file (one per sample)
-gex/adt/atac|csv_matrix  | comma-delimited file (one per sample)
+rna/adt/atac|cellranger| the "outs" folder produced by **cellranger count**
+rna/adt/atac|cellranger_multi| the "outs" folder produced by **cellranger multi**
+rna/adt/atac|10X_h5   | outs/filtered_feature_bc_matrix.h5 produced by cellranger
+rna/adt/atac|hd5 | Read a generic .h5 (hdf5) file.
+rna/adt/atac|h5ad  | Anndata h5ad objects (one per sample)
+rna/adt/atac|txt_matrix  | tab-delimited file (one per sample)
+rna/adt/atac|csv_matrix  | comma-delimited file (one per sample)
 tcr/bcr     |cellranger_vdj| Path to filtered_contig_annotations.csv, all_contig_annotations.csv or all_contig_annotations.json.  produced by **cellranger vdj** further [details](https://scverse.org/scirpy/latest/generated/scirpy.io.read_10x_vdj.html)
 tcr/bcr     |tracer| data from [TraCeR](https://github.com/Teichlab/tracer) further [details](https://scverse.org/scirpy/latest/generated/scirpy.io.read_tracer.html)
 tcr/bcr     |bracer| data from [BraCeR](https://github.com/Teichlab/bracer) further [details](https://scverse.org/scirpy/latest/generated/scirpy.io.read_bracer.html)

From 1935529a063f47a6e1ab048b8397538967e17e9d Mon Sep 17 00:00:00 2001
From: bio-la <fabiola.curion@gmail.com>
Date: Mon, 25 Sep 2023 11:57:27 +0200
Subject: [PATCH 04/28] gex_only scanpy

---
 panpipes/funcs/io.py | 47 ++++++++++++++++++++++----------------------
 1 file changed, 24 insertions(+), 23 deletions(-)

diff --git a/panpipes/funcs/io.py b/panpipes/funcs/io.py
index 387713fd..e8777738 100644
--- a/panpipes/funcs/io.py
+++ b/panpipes/funcs/io.py
@@ -48,32 +48,32 @@ def gen_load_anndata_jobs(caf, load_raw=False, mode_dictionary = {}, load_prot_f
     Generate a load_adatas job for each line in submission.txt
     """
     for nn in range(0, caf.shape[0]):
-        if pd.isna(caf['gex_path'][nn]):
-                gex_path= None
-                gex_filetype=None
-        elif caf['gex_filetype'][nn]=="cellranger" and mode_dictionary["rna"]:
-            gex_path, gex_filetype = update_cellranger_col(caf['gex_path'][nn], raw=load_raw, method="count")
-        elif caf['gex_filetype'][nn]=="cellranger_multi" and mode_dictionary["rna"]:
-            gex_path, gex_filetype = update_cellranger_col(caf['gex_path'][nn], raw=load_raw, method="multi", 
+        if pd.isna(caf['rna_path'][nn]):
+                rna_path= None
+                rna_filetype=None
+        elif caf['rna_filetype'][nn]=="cellranger" and mode_dictionary["rna"]:
+            rna_path, rna_filetype = update_cellranger_col(caf['rna_path'][nn], raw=load_raw, method="count")
+        elif caf['rna_filetype'][nn]=="cellranger_multi" and mode_dictionary["rna"]:
+            rna_path, rna_filetype = update_cellranger_col(caf['rna_path'][nn], raw=load_raw, method="multi", 
                                                             sample_id=caf['sample_id'][nn])
         else:
-            gex_path, gex_filetype = caf[['gex_path', "gex_filetype"]].iloc[nn]
+            rna_path, rna_filetype = caf[['rna_path', "rna_filetype"]].iloc[nn]
             if load_raw:
-                gex_path = re.sub("filtered", "raw", gex_path)
+                rna_path = re.sub("filtered", "raw", rna_path)
         # manage the adt paths
         if ('adt_path' in caf.columns and mode_dictionary["prot"]):
             # check if its the same as the gex path (data in the same file)
             if pd.isna(caf['adt_path'][nn]):
                 adt_path= None
                 adt_filetype=None
-            elif caf['adt_path'][nn] == caf['gex_path'][nn]:
-                adt_path, adt_filetype = gex_path, gex_filetype
+            elif caf['adt_path'][nn] == caf['rna_path'][nn]:
+                adt_path, adt_filetype = rna_path, rna_filetype
             elif caf['adt_filetype'][nn]=="cellranger":
-                # we might want to load the raw here because we want to then subset by good gex barcodes, 
+                # we might want to load the raw here because we want to then subset by good gex (rna) barcodes, 
                 # this is why the load_prot_from_raw argument exists
                 adt_path, adt_filetype = update_cellranger_col(caf['adt_path'][nn], raw=load_prot_from_raw)
             elif caf['adt_filetype'][nn]=="cellranger_multi":
-                # celranger multi has the same prot and gex barcodes
+                # celranger multi has the same prot and gex (rna) barcodes
                 adt_path, adt_filetype = update_cellranger_col(caf['adt_path'][nn], raw=load_raw, method="multi", 
                                                                 sample_id=caf['sample_id'][nn])
             else:
@@ -131,9 +131,9 @@ def gen_load_anndata_jobs(caf, load_raw=False, mode_dictionary = {}, load_prot_f
         else:
             outfile = outfile + ".h5mu"
         sample_id = caf['sample_id'][nn]
-        yield gex_path, outfile, \
+        yield rna_path, outfile, \
               sample_id, \
-              gex_filetype,  \
+              rna_filetype,  \
               adt_path, adt_filetype, \
               tcr_path, tcr_filetype,  \
               bcr_path, bcr_filetype, \
@@ -259,9 +259,9 @@ def write_obs(mdata, output_prefix="./", output_suffix="_filtered_cell_metadata.
 def check_submission_file(caf):
     if "filetype" in caf.columns or "path" in caf.columns:
         raise ValueError("you appear to be using the old notation for the sample submission file, \
-        please update to use gex_path instead of path and gex_filetype instead of filetype")
+        please update to use rna_path instead of path and rna_filetype instead of filetype")
     # check for required cols
-    req_cols = ['sample_id', 'gex_path', 'gex_filetype']
+    req_cols = ['sample_id', 'rna_path', 'rna_filetype']
     for rc in req_cols:
         if rc not in caf.columns:
             raise ValueError("required column %s missing from submission file" % rc)
@@ -363,7 +363,7 @@ def scp_read_10x_mtx(filename: PathLike, library_keep=None, *args, **kwargs) ->
     expanded sc.read_10x_mtx to filter for the library 
     adapted from https://github.com/scverse/muon/blob/master/muon/_prot/io.py
     """
-    adata = read_10x_mtx(filename, gex_only=False, *args, **kwargs)
+    adata = read_10x_mtx(filename, gex_only=False, *args, **kwargs) #need to leave gex as this is scanpy's code
     logging.debug("filtering cellranger outputs to %s" % library_keep)
     if library_keep is not None:
         adata = adata[
@@ -482,20 +482,21 @@ def load_mdata_from_multiple_files(all_files_dict):
     Parameters
     ----------
     all_files_dict: dict
-        dictionary containing one key per assay out of [GEX, ADT, TCR, BCR, ATAC]
+        dictionary containing one key per assay out of [RNA, ADT, TCR, BCR, ATAC]
         and the values for each key is a list of file path and fie type
-        e.g.  {"GEX": [filepath, "filetype"],
+        e.g.  {"RNA": [filepath, "filetype"],
                "ADT: [file path, "filetype"]} 
-        Filetypes supported for gex/adt: ["cellranger", "h5ad", "csv_matrix", "txt_matrix", "10X_h5"],
+        Filetypes supported for RNA/adt: ["cellranger", "h5ad", "csv_matrix", "txt_matrix", "10X_h5"],
         Filetypes supported for atac (multiome preferred is 10X_h5) ["10X_h5","cellranger","h5ad"]
         Filetypes supported for rep: ["cellranger_vdj", "airr", "tracer", "bracer"  ] 
         See scirpy documentation for more information of repertoire input formats 
         https://scverse.org/scirpy/latest/api.html#module-scirpy.io
     """
     # convert names to match mudata conventions
-    # mudata_conventional_names={"GEX":"rna", "ADT":"prot", "TCR":"tcr", 
-    # "BCR":"bcr", "ATAC": "atac", "spatial":"spatial"}
+    # mudata_conventional_names
+    # {"GEX":"rna", "ADT":"prot", "TCR":"tcr", "BCR":"bcr", "ATAC": "atac", "spatial":"spatial"}
     # all_files_dict = {mudata_conventional_names[nm]: x  for (nm, x) in all_files_dict.items()}
+    # note: scanpy's default function use gex_only as param so we need to leave that in
     logging.debug(all_files_dict.keys())
     # load in separate anndata for each expected modality
     data_dict = {}

From f543504a0de468552e78c7c7def826a58ff7bbdf Mon Sep 17 00:00:00 2001
From: bio-la <fabiola.curion@gmail.com>
Date: Mon, 25 Sep 2023 11:58:15 +0200
Subject: [PATCH 05/28] io fix

---
 panpipes/funcs/io.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/panpipes/funcs/io.py b/panpipes/funcs/io.py
index e8777738..833409a6 100644
--- a/panpipes/funcs/io.py
+++ b/panpipes/funcs/io.py
@@ -62,7 +62,7 @@ def gen_load_anndata_jobs(caf, load_raw=False, mode_dictionary = {}, load_prot_f
                 rna_path = re.sub("filtered", "raw", rna_path)
         # manage the adt paths
         if ('adt_path' in caf.columns and mode_dictionary["prot"]):
-            # check if its the same as the gex path (data in the same file)
+            # check if its the same as the rna path (data in the same file)
             if pd.isna(caf['adt_path'][nn]):
                 adt_path= None
                 adt_filetype=None

From 79c8fad01cffe374f2e28e52b6ca68a036d46436 Mon Sep 17 00:00:00 2001
From: bio-la <fabiola.curion@gmail.com>
Date: Mon, 25 Sep 2023 12:00:18 +0200
Subject: [PATCH 06/28] rename

---
 .../python_scripts/aggregate_cellranger_summary_metrics.py  | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/panpipes/python_scripts/aggregate_cellranger_summary_metrics.py b/panpipes/python_scripts/aggregate_cellranger_summary_metrics.py
index 480ea6dd..47530a0a 100644
--- a/panpipes/python_scripts/aggregate_cellranger_summary_metrics.py
+++ b/panpipes/python_scripts/aggregate_cellranger_summary_metrics.py
@@ -93,7 +93,7 @@ def get_all_unique_paths(pipe_df):
     all_paths.columns = ['sample_id', 'path_type', 'path']
     all_paths = all_paths.drop_duplicates()
     # rename path_type to match cellranger terminolgy
-    recode_dict = {'gex_path': "Gene Expression", 
+    recode_dict = {'rna_path': "Gene Expression", 
                    'adt_path': "Antibody Capture", 
                    'tcr_path': 'VDJ T', 
                    'bcr_path': 'VDJ B'}
@@ -216,11 +216,11 @@ def parse_10x_cellranger_count(path_df, convert_df,  path_col='metrics_summary_p
 
 ## gene expression  extra plots -----------------------------
 
-gex_tenx_metrics = tenx_metrics_full[tenx_metrics_full['library_type'] == 'Gene Expression']
+rna_tenx_metrics = tenx_metrics_full[tenx_metrics_full['library_type'] == 'Gene Expression']
 
 # sequencing_saturaion plot scatter plot
 plot_metrics = ['Sequencing saturation', 'Mean reads per cell', 'Estimated number of cells', 'Number of reads', 'Median UMI counts per cell']
-plt_df = gex_tenx_metrics[gex_tenx_metrics.metric_name.isin(plot_metrics)]
+plt_df = rna_tenx_metrics[rna_tenx_metrics.metric_name.isin(plot_metrics)]
 plt_df = plt_df[['sample_id', 'metric_name', 'metric_value']]
 plt_tab = plt_df.pivot_table(index='sample_id', columns='metric_name', values='metric_value', aggfunc=sum)
 # sns.scatterplot(data=plt_tab, 

From 26156e0214a3b7302a926b979db5710f5c7c8d76 Mon Sep 17 00:00:00 2001
From: bio-la <fabiola.curion@gmail.com>
Date: Mon, 25 Sep 2023 12:01:08 +0200
Subject: [PATCH 07/28] small typo

---
 panpipes/python_scripts/assess_background.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/panpipes/python_scripts/assess_background.py b/panpipes/python_scripts/assess_background.py
index 106bafa9..086a8580 100644
--- a/panpipes/python_scripts/assess_background.py
+++ b/panpipes/python_scripts/assess_background.py
@@ -183,7 +183,7 @@
         plt.savefig(os.path.join(args.figpath,"barplot_background_" + args.channel_col + "_prot_top_expressed.png"))
 
 
-## QC for gex and protein fcomparing foreground and background
+## QC for rna and protein comparing foreground and background
 
 if 'rna' in mdata.mod.keys():
     pnp.pl.scatter_fg_vs_bg(mdata, mdata_bg,x="rna:log1p_n_genes_by_counts", y="rna:log1p_total_counts", facet_row=args.channel_col)

From 1fd759a74a76498345b851825334fe35541ea3ff Mon Sep 17 00:00:00 2001
From: bio-la <fabiola.curion@gmail.com>
Date: Mon, 25 Sep 2023 12:03:32 +0200
Subject: [PATCH 08/28] qc message

---
 panpipes/python_scripts/run_scanpyQC_rna.py | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/panpipes/python_scripts/run_scanpyQC_rna.py b/panpipes/python_scripts/run_scanpyQC_rna.py
index 9947e98d..d47972ba 100644
--- a/panpipes/python_scripts/run_scanpyQC_rna.py
+++ b/panpipes/python_scripts/run_scanpyQC_rna.py
@@ -1,7 +1,7 @@
 '''
-scanpy QC script GEX
+scanpy QC script RNA
 order of QC:
-- GEX
+- RNA
 - ADT
 - Repertoire
 - ATAC
@@ -62,7 +62,7 @@
 
 args, opt = parser.parse_known_args()
 
-L.info("Running scanpy gex qc pipeline")
+L.info("Running scanpy rna qc pipeline")
 
 sc.settings.verbosity = 3
 

From 8b28dfa9ba6b7c6e13249d0a86521134e2c4f08b Mon Sep 17 00:00:00 2001
From: bio-la <fabiola.curion@gmail.com>
Date: Mon, 25 Sep 2023 12:05:37 +0200
Subject: [PATCH 09/28] barplot_10x

---
 panpipes/R_scripts/produce_barplot_10xmetric.v3.R | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/panpipes/R_scripts/produce_barplot_10xmetric.v3.R b/panpipes/R_scripts/produce_barplot_10xmetric.v3.R
index e1dde13a..59cffae1 100644
--- a/panpipes/R_scripts/produce_barplot_10xmetric.v3.R
+++ b/panpipes/R_scripts/produce_barplot_10xmetric.v3.R
@@ -54,8 +54,8 @@ project <- opt$project
 
 if(!is.null(opt$csvpaths)) {
   caf <- read.table(opt$csvpaths, header=T, sep="\t")
-  if(!"gex_path" %in% colnames(caf)){
-    caf$gex_path=caf$path
+  if(!"rna_path" %in% colnames(caf)){
+    caf$rna_path=caf$path
     caf <- caf %>% filter(filetype=="cellranger")
     if(nrow(caf)==0){
       message("no cellranger inputs")
@@ -64,7 +64,7 @@ if(!is.null(opt$csvpaths)) {
 
   }
   # paths <- gsub("filtered_feature_bc_matrix","metrics_summary.csv",caf$gex_path)
-  paths <- file.path(caf$gex_path, "metrics_summary.csv")
+  paths <- file.path(caf$rna_path, "metrics_summary.csv")
   
   # paths <- gsub("\\/$","", paths)
 
@@ -146,7 +146,7 @@ if(ncol(tag.mat)>0){
 }
 
 if(nrow(mat)==0){mat <- tag.mat}
-#modify this to deal with AB reads and GEX reads, need to split the knee plotting in 2 if you want both GEX and ADT knees in 2 different graphs
+#modify this to deal with AB reads and rna reads, need to split the knee plotting in 2 if you want both rna and ADT knees in 2 different graphs
 if(opt$kneeplot){
   print("plotting kneeplot")
   gsub("metrics_summary.csv","raw_feature_bc_matrix" , paths) ->normpath
@@ -155,7 +155,7 @@ if(opt$kneeplot){
   for( a in 1:length(normpath)){
     print(normpath[a])
     mm<-Read10X(normpath[a])
-    if(class(mm)=="list") { mm<-do.call("rbind",mm)} #modify this to deal with AB reads and GEX reads, need to split the knee plotting in 2 if you want both GEX and ADT in 2 different graphs
+    if(class(mm)=="list") { mm<-do.call("rbind",mm)} #modify this to deal with prot reads and rna reads, need to split the knee plotting in 2 if you want both rna and prot in 2 different graphs
     colnames(mm) <- paste0(colnames(mm),"_",rownames(mat)[a])
     xx <- CreateSeuratObject(mm,assay = "RNA",  min.cells = 0, min.features = 0)
     xx@meta.data %>%
@@ -200,7 +200,7 @@ if(opt$kneeplot){
       theme(legend.key.size = unit(0.2,"cm"), axis.text=element_text(size=13,face="bold", color="black"))  ->gb }
   gb + fl$layers ->gb
 
-  ggsave(gb,height = 15 ,width = 17, dpi=300, file=paste0(opt$figdir,"10xMetrics_",project,"_gex_knee_plot.png"), type="cairo-png")
+  ggsave(gb,height = 15 ,width = 17, dpi=300, file=paste0(opt$figdir,"10xMetrics_",project,"_rna_knee_plot.png"), type="cairo-png")
 
 
 }

From 2ff238575c9d8a18bb7b312f63ad329157f31919 Mon Sep 17 00:00:00 2001
From: bio-la <fabiola.curion@gmail.com>
Date: Mon, 25 Sep 2023 12:08:17 +0200
Subject: [PATCH 10/28] caf files

---
 panpipes/resources/sample_file_qc_mm.txt          | 4 +++-
 panpipes/resources/sample_file_qc_mm_gex_only.txt | 4 +++-
 tests/sample_caf.tsv                              | 2 +-
 3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/panpipes/resources/sample_file_qc_mm.txt b/panpipes/resources/sample_file_qc_mm.txt
index beae3718..7e45675e 100644
--- a/panpipes/resources/sample_file_qc_mm.txt
+++ b/panpipes/resources/sample_file_qc_mm.txt
@@ -1 +1,3 @@
-sample_id	gex_path	gex_filetype	adt_path	adt_filetype	tissue	diagnosisSample1	Sample1_gex.csv	csv_matrix	Sample1_adt.csv	csv_matrix	pbmc	healthySample2	cellranger_count/Sample2_GEX/outs/	cellranger	cellranger_count/Sample2_CITE/outs/	cellranger	pbmc	diseased
\ No newline at end of file
+sample_id	rna_path	rna_filetype	adt_path	adt_filetype	tissue	diagnosis
+Sample1	Sample1_gex.csv	csv_matrix	Sample1_adt.csv	csv_matrix	pbmc	healthy
+Sample2	cellranger_count/Sample2_GEX/outs/	cellranger	cellranger_count/Sample2_CITE/outs/	cellranger	pbmc	diseased
\ No newline at end of file
diff --git a/panpipes/resources/sample_file_qc_mm_gex_only.txt b/panpipes/resources/sample_file_qc_mm_gex_only.txt
index beae3718..7e45675e 100644
--- a/panpipes/resources/sample_file_qc_mm_gex_only.txt
+++ b/panpipes/resources/sample_file_qc_mm_gex_only.txt
@@ -1 +1,3 @@
-sample_id	gex_path	gex_filetype	adt_path	adt_filetype	tissue	diagnosisSample1	Sample1_gex.csv	csv_matrix	Sample1_adt.csv	csv_matrix	pbmc	healthySample2	cellranger_count/Sample2_GEX/outs/	cellranger	cellranger_count/Sample2_CITE/outs/	cellranger	pbmc	diseased
\ No newline at end of file
+sample_id	rna_path	rna_filetype	adt_path	adt_filetype	tissue	diagnosis
+Sample1	Sample1_gex.csv	csv_matrix	Sample1_adt.csv	csv_matrix	pbmc	healthy
+Sample2	cellranger_count/Sample2_GEX/outs/	cellranger	cellranger_count/Sample2_CITE/outs/	cellranger	pbmc	diseased
\ No newline at end of file
diff --git a/tests/sample_caf.tsv b/tests/sample_caf.tsv
index fbece7be..cdcd773b 100644
--- a/tests/sample_caf.tsv
+++ b/tests/sample_caf.tsv
@@ -1,3 +1,3 @@
-sample_id	gex_path	gex_filetype	adt_path	adt_filetype
+sample_id	rna_path	rna_filetype	adt_path	adt_filetype
 sample1	../sample_1_rna	cellranger	../sample_1_prot	cellranger
 sample2	../sample_2_rna	cellranger	../sample_2_prot	cellranger

From 5dce3b506d12e7b838f3837453a0006f63bbf997 Mon Sep 17 00:00:00 2001
From: bio-la <fabiola.curion@gmail.com>
Date: Mon, 25 Sep 2023 12:10:09 +0200
Subject: [PATCH 11/28] search adt

---
 docs/setup_for_ingest.md | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/docs/setup_for_ingest.md b/docs/setup_for_ingest.md
index 030f7f91..b389b86b 100644
--- a/docs/setup_for_ingest.md
+++ b/docs/setup_for_ingest.md
@@ -9,7 +9,7 @@ sample_id | rna_path | rna_filetype
 
 If you want to analyse other modalities, add columns to the input file
 
-- adt_path/adt_filetype
+- prot_path/prot_filetype
 - atac_path/atac_filetype
 - tcr_path/tcr_filetype
 - bcr_path/bcr_filetype
@@ -20,7 +20,7 @@ example at `resources/sample_file_ingest.txt`
 
 If giving a cellranger path, give the path folder containing all the cellranger outputs. Otherwise path should be the complete path to the file. 
 
-If you have cellranger outputs which have rna and adt within the same files, specify the same path in rna_path and adt_path
+If you have cellranger outputs which have rna and prot within the same files, specify the same path in rna_path and prot_path
 
 To include sample level metadata, you can add additional columns to the submission file
 e.g Tissue and Diagnoisis columns in `resources/sample_file_ingest.txt`
@@ -38,13 +38,13 @@ For each modality per sample, specify the value in the key column in the X_filet
 
 modality    |key       |description
 ------------|----------|----------
-rna/adt/atac|cellranger| the "outs" folder produced by **cellranger count**
-rna/adt/atac|cellranger_multi| the "outs" folder produced by **cellranger multi**
-rna/adt/atac|10X_h5   | outs/filtered_feature_bc_matrix.h5 produced by cellranger
-rna/adt/atac|hd5 | Read a generic .h5 (hdf5) file.
-rna/adt/atac|h5ad  | Anndata h5ad objects (one per sample)
-rna/adt/atac|txt_matrix  | tab-delimited file (one per sample)
-rna/adt/atac|csv_matrix  | comma-delimited file (one per sample)
+rna/prot/atac|cellranger| the "outs" folder produced by **cellranger count**
+rna/prot/atac|cellranger_multi| the "outs" folder produced by **cellranger multi**
+rna/prot/atac|10X_h5   | outs/filtered_feature_bc_matrix.h5 produced by cellranger
+rna/prot/atac|hd5 | Read a generic .h5 (hdf5) file.
+rna/prot/atac|h5ad  | Anndata h5ad objects (one per sample)
+rna/prot/atac|txt_matrix  | tab-delimited file (one per sample)
+rna/prot/atac|csv_matrix  | comma-delimited file (one per sample)
 tcr/bcr     |cellranger_vdj| Path to filtered_contig_annotations.csv, all_contig_annotations.csv or all_contig_annotations.json.  produced by **cellranger vdj** further [details](https://scverse.org/scirpy/latest/generated/scirpy.io.read_10x_vdj.html)
 tcr/bcr     |tracer| data from [TraCeR](https://github.com/Teichlab/tracer) further [details](https://scverse.org/scirpy/latest/generated/scirpy.io.read_tracer.html)
 tcr/bcr     |bracer| data from [BraCeR](https://github.com/Teichlab/bracer) further [details](https://scverse.org/scirpy/latest/generated/scirpy.io.read_bracer.html)

From 365abfd2de0a84b4ed950804d6819013c797fb0a Mon Sep 17 00:00:00 2001
From: bio-la <fabiola.curion@gmail.com>
Date: Mon, 25 Sep 2023 12:14:10 +0200
Subject: [PATCH 12/28] changing adt

---
 docs/usage/sample_file_qc_mm.md  |  2 +-
 docs/usage/sample_file_qc_mm.txt |  2 +-
 docs/usage/setup_for_qc_mm.md    | 20 ++++++++++----------
 docs/workflows/qc.md             |  4 ++--
 4 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/docs/usage/sample_file_qc_mm.md b/docs/usage/sample_file_qc_mm.md
index cf4269bb..c6e5cc96 100644
--- a/docs/usage/sample_file_qc_mm.md
+++ b/docs/usage/sample_file_qc_mm.md
@@ -1,7 +1,7 @@
 Example sample submission file
 -----------------------------
 
-| sample_id | rna_path                           | rna_filetype | adt_path                            | adt_filetype | tissue | diagnosis |
+| sample_id | rna_path                           | rna_filetype | prot_path                            | prot_filetype | tissue | diagnosis |
 |-----------|------------------------------------|--------------|-------------------------------------|--------------|--------|-----------|
 | Sample1   | Sample1_rna.csv                    | csv_matrix   | Sample1_adt.csv                     | csv_matrix   | pbmc   | healthy   |
 | Sample2   | cellranger_count/Sample2_GEX/outs/ | cellranger   | cellranger_count/Sample2_CITE/outs/ | cellranger   | pbmc   | diseased  |
diff --git a/docs/usage/sample_file_qc_mm.txt b/docs/usage/sample_file_qc_mm.txt
index 7e45675e..77775ce8 100644
--- a/docs/usage/sample_file_qc_mm.txt
+++ b/docs/usage/sample_file_qc_mm.txt
@@ -1,3 +1,3 @@
-sample_id	rna_path	rna_filetype	adt_path	adt_filetype	tissue	diagnosis
+sample_id	rna_path	rna_filetype	prot_path	prot_filetype	tissue	diagnosis
 Sample1	Sample1_gex.csv	csv_matrix	Sample1_adt.csv	csv_matrix	pbmc	healthy
 Sample2	cellranger_count/Sample2_GEX/outs/	cellranger	cellranger_count/Sample2_CITE/outs/	cellranger	pbmc	diseased
\ No newline at end of file
diff --git a/docs/usage/setup_for_qc_mm.md b/docs/usage/setup_for_qc_mm.md
index 1f6df9ec..3f20695b 100644
--- a/docs/usage/setup_for_qc_mm.md
+++ b/docs/usage/setup_for_qc_mm.md
@@ -13,14 +13,14 @@ sample_id | rna_path | rna_filetype
 
 If you want to analyse other modalities, add additional columns to the input file
 
-- adt_path/adt_filetype
+- prot_path/prot_filetype
 - atac_path/atac_filetype
 - tcr_path/tcr_filetype
 - bcr_path/bcr_filetype
 
 **sample id**: Each row must have a unique sample ID. 
 
-**{X}_paths**: If giving a cellranger path, give the path folder containing all the cellranger outputs, known as the `outs` folder. Otherwise path should be the complete path to the file. If you have cellranger outputs which have rna and adt within the same files, specify the same path in rna_path and adt_path
+**{X}_paths**: If giving a cellranger path, give the path folder containing all the cellranger outputs, known as the `outs` folder. Otherwise path should be the complete path to the file. If you have cellranger outputs which have rna and prot within the same files, specify the same path in rna_path and prot_path
 
 **{X}_filetype**: The "filetype" column tells panpipe how to read in the data. Panpipes supports a range of inputs. See the [supported input filetypes](#supported-input-filetypes) below to see the options for the {X}_filetype columns
 
@@ -35,7 +35,7 @@ You will also need to list which additional metadata columns you want to include
 ## Example sample submission file
 
 
-| sample_id | rna_path                           | rna_filetype | adt_path                            | adt_filetype | tissue | diagnosis |
+| sample_id | rna_path                           | rna_filetype | prot_path                            | prot_filetype | tissue | diagnosis |
 |-----------|------------------------------------|--------------|-------------------------------------|--------------|--------|-----------|
 | Sample1   | Sample1_gex.csv                    | csv_matrix   | Sample1_adt.csv                     | csv_matrix   | pbmc   | healthy   |
 | Sample2   | cellranger_count/Sample2_GEX/outs/ | cellranger   | cellranger_count/Sample2_CITE/outs/ | cellranger   | pbmc   | diseased  |
@@ -58,13 +58,13 @@ For each modality per sample, specify the value in the key column in the X_filet
 
 modality    |key       |description
 ------------|----------|----------
-rna/adt/atac|cellranger| the "outs" folder produced by **cellranger count**
-rna/adt/atac|cellranger_multi| the "outs" folder produced by **cellranger multi**
-rna/adt/atac|10X_h5   | outs/filtered_feature_bc_matrix.h5 produced by cellranger
-rna/adt/atac|hd5 | Read a generic .h5 (hdf5) file.
-rna/adt/atac|h5ad  | Anndata h5ad objects (one per sample)
-rna/adt/atac|txt_matrix  | tab-delimited file (one per sample)
-rna/adt/atac|csv_matrix  | comma-delimited file (one per sample)
+rna/prot/atac|cellranger| the "outs" folder produced by **cellranger count**
+rna/prot/atac|cellranger_multi| the "outs" folder produced by **cellranger multi**
+rna/prot/atac|10X_h5   | outs/filtered_feature_bc_matrix.h5 produced by cellranger
+rna/prot/atac|hd5 | Read a generic .h5 (hdf5) file.
+rna/prot/atac|h5ad  | Anndata h5ad objects (one per sample)
+rna/prot/atac|txt_matrix  | tab-delimited file (one per sample)
+rna/prot/atac|csv_matrix  | comma-delimited file (one per sample)
 tcr/bcr     |cellranger_vdj| Path to filtered_contig_annotations.csv, all_contig_annotations.csv or all_contig_annotations.json.  produced by **cellranger vdj** further [details](https://scverse.org/scirpy/latest/generated/scirpy.io.read_10x_vdj.html)
 tcr/bcr     |tracer| data from [TraCeR](https://github.com/Teichlab/tracer) further [details](https://scverse.org/scirpy/latest/generated/scirpy.io.read_tracer.html)
 tcr/bcr     |bracer| data from [BraCeR](https://github.com/Teichlab/bracer) further [details](https://scverse.org/scirpy/latest/generated/scirpy.io.read_bracer.html)
diff --git a/docs/workflows/qc.md b/docs/workflows/qc.md
index cb298c4c..553e3efc 100644
--- a/docs/workflows/qc.md
+++ b/docs/workflows/qc.md
@@ -22,7 +22,7 @@ Then qc metrics are computed using [scanpy.pp.calculated_qc_metrics](https://sca
 - Protein metadata such as isotype status or shorter names incorporated into the object. Inputs for this are described [here]
 - Per cell QC metrics computed as described above, including pct_isotype where isotype information is available. 
 - Per Protein metrics, total_counts, and are computed, in order to compare the binding of different antibodies (applicable when your assay is CITE-seq based). These are defined in the yml:
-`prot_metrics_per_adt: total_counts,log1p_total_counts,n_cells_by_counts,mean_counts`
+`plot_metrics_per_prot: total_counts,log1p_total_counts,n_cells_by_counts,mean_counts`
 - A rudimentary check for cells that are 'isotype' outliers, i.e. the cells where the isotype content is in the top 10% quantile for more than 2 isotypes. (these parameters are customisable in the`pipeline.yml`). See the function [here](https://github.com/DendrouLab/panpipes/blob/main/panpipes/funcs/scmethods.py#L328). 
 ```
 # isotype outliers: one way to determine which cells are very sticky is to work out which cells have the most isotype UMIs
@@ -63,7 +63,7 @@ There is an additional optional `assess_background` step, if the raw data (inclu
     [Inputs to Multimodal QC pipeline](../setup_for_qc_mm)
 2.  Generate qc genelists as described in
     [Gene list format](../gene_list_format)
-3.  For adt assay - generate the protein metadata file
+3.  For prot assay - generate the protein metadata file
     [example]((https://github.com/DendrouLab/panpipes/blob/main/resources/protein_metadata_w_iso.md)).
     This file is integrated into the mdata\['prot'\].var slot.
 4.  Generate config file (`panpipes ingest config`)

From 5e35a93273b5a95ba01399f26fe6408daeb5754a Mon Sep 17 00:00:00 2001
From: bio-la <fabiola.curion@gmail.com>
Date: Mon, 25 Sep 2023 12:16:00 +0200
Subject: [PATCH 13/28] fixes adt in io.py

---
 panpipes/funcs/io.py | 38 +++++++++++++++++++-------------------
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/panpipes/funcs/io.py b/panpipes/funcs/io.py
index 833409a6..7db749b4 100644
--- a/panpipes/funcs/io.py
+++ b/panpipes/funcs/io.py
@@ -60,29 +60,29 @@ def gen_load_anndata_jobs(caf, load_raw=False, mode_dictionary = {}, load_prot_f
             rna_path, rna_filetype = caf[['rna_path', "rna_filetype"]].iloc[nn]
             if load_raw:
                 rna_path = re.sub("filtered", "raw", rna_path)
-        # manage the adt paths
-        if ('adt_path' in caf.columns and mode_dictionary["prot"]):
+        # manage the prot paths
+        if ('prot_path' in caf.columns and mode_dictionary["prot"]):
             # check if its the same as the rna path (data in the same file)
-            if pd.isna(caf['adt_path'][nn]):
-                adt_path= None
-                adt_filetype=None
-            elif caf['adt_path'][nn] == caf['rna_path'][nn]:
-                adt_path, adt_filetype = rna_path, rna_filetype
-            elif caf['adt_filetype'][nn]=="cellranger":
+            if pd.isna(caf['prot_path'][nn]):
+                prot_path= None
+                prot_filetype=None
+            elif caf['prot_path'][nn] == caf['rna_path'][nn]:
+                prot_path, prot_filetype = rna_path, rna_filetype
+            elif caf['prot_filetype'][nn]=="cellranger":
                 # we might want to load the raw here because we want to then subset by good gex (rna) barcodes, 
                 # this is why the load_prot_from_raw argument exists
-                adt_path, adt_filetype = update_cellranger_col(caf['adt_path'][nn], raw=load_prot_from_raw)
-            elif caf['adt_filetype'][nn]=="cellranger_multi":
+                prot_path, prot_filetype = update_cellranger_col(caf['prot_path'][nn], raw=load_prot_from_raw)
+            elif caf['prot_filetype'][nn]=="cellranger_multi":
                 # celranger multi has the same prot and gex (rna) barcodes
-                adt_path, adt_filetype = update_cellranger_col(caf['adt_path'][nn], raw=load_raw, method="multi", 
+                prot_path, prot_filetype = update_cellranger_col(caf['prot_path'][nn], raw=load_raw, method="multi", 
                                                                 sample_id=caf['sample_id'][nn])
             else:
-                adt_path, adt_filetype = caf[['adt_path', "adt_filetype"]].iloc[nn]
+                prot_path, prot_filetype = caf[['prot_path', "prot_filetype"]].iloc[nn]
                 if load_prot_from_raw or load_raw:
-                    adt_path = re.sub("filtered", "raw", adt_path)
+                    prot_path = re.sub("filtered", "raw", prot_path)
         else:
-            adt_path= None
-            adt_filetype=None
+            prot_path= None
+            prot_filetype=None
         # load tcr_path
         if 'tcr_path' in caf.columns and mode_dictionary["tcr"] and pd.notna(caf['tcr_path'][nn]):
             tcr_path = caf['tcr_path'][nn]
@@ -134,7 +134,7 @@ def gen_load_anndata_jobs(caf, load_raw=False, mode_dictionary = {}, load_prot_f
         yield rna_path, outfile, \
               sample_id, \
               rna_filetype,  \
-              adt_path, adt_filetype, \
+              prot_path, prot_filetype, \
               tcr_path, tcr_filetype,  \
               bcr_path, bcr_filetype, \
               atac_path, atac_filetype, \
@@ -486,7 +486,7 @@ def load_mdata_from_multiple_files(all_files_dict):
         and the values for each key is a list of file path and fie type
         e.g.  {"RNA": [filepath, "filetype"],
                "ADT: [file path, "filetype"]} 
-        Filetypes supported for RNA/adt: ["cellranger", "h5ad", "csv_matrix", "txt_matrix", "10X_h5"],
+        Filetypes supported for RNA/prot: ["cellranger", "h5ad", "csv_matrix", "txt_matrix", "10X_h5"],
         Filetypes supported for atac (multiome preferred is 10X_h5) ["10X_h5","cellranger","h5ad"]
         Filetypes supported for rep: ["cellranger_vdj", "airr", "tracer", "bracer"  ] 
         See scirpy documentation for more information of repertoire input formats 
@@ -579,8 +579,8 @@ def write_10x_counts(adata, path, layer=None):
         features = adata.var.reset_index().rename(columns={"index": "gene_symbols"})
     elif "gene_symbols" in adata.var.columns:
         features = adata.var.reset_index().rename(columns={"index": "gene_ids"})
-        if "adt_id" in features.columns:
-            features = features.drop(columns=['gene_symbols']).rename(columns={"adt_id" : "gene_symbols"})
+        if "prot_id" in features.columns:
+            features = features.drop(columns=['gene_symbols']).rename(columns={"prot_id" : "gene_symbols"})
     features = features[['gene_ids','gene_symbols', 'feature_types']] 
     features.to_csv(os.path.join(path, "features.tsv.gz"), sep='\t', index=None, header=None)
 

From 5cb560205f1489eadbe6dc71f8b1626656fc1e37 Mon Sep 17 00:00:00 2001
From: bio-la <fabiola.curion@gmail.com>
Date: Mon, 25 Sep 2023 12:17:47 +0200
Subject: [PATCH 14/28] fixes adt

---
 panpipes/funcs/processing.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/panpipes/funcs/processing.py b/panpipes/funcs/processing.py
index b1781a92..906dbc70 100644
--- a/panpipes/funcs/processing.py
+++ b/panpipes/funcs/processing.py
@@ -253,7 +253,7 @@ def merge_with_adata_obs(adata, df, on_col="sample_id", inplace=False):
         return new_df
 
 
-def add_var_mtd(prot, df, left_on="adt_id", right_on="adt_id"):
+def add_var_mtd(prot, df, left_on="prot_id", right_on="prot_id"):
     if not isinstance(prot, AnnData):
         raise TypeError("prot is not an AnnData object")
     if not isinstance(df, pd.DataFrame):
@@ -276,7 +276,7 @@ def add_var_mtd(prot, df, left_on="adt_id", right_on="adt_id"):
             prot.var.index = orig_index
             logging.debug(prot.var.head())
         else:
-            warnings.warn(UserWarning("prot.var is at risk of getting disrodered, not merging the df"))
+            warnings.warn(UserWarning("prot.var is at risk of getting disordered, not merging the df"))
     
 
From ecfb26f96df96b994d971e27e5a8b10015c3f477 Mon Sep 17 00:00:00 2001
From: bio-la <fabiola.curion@gmail.com>
Date: Mon, 25 Sep 2023 12:23:44 +0200
Subject: [PATCH 15/28] adt check flow

---
 panpipes/funcs/scmethods.py                    |  2 +-
 panpipes/panpipes/pipeline_ingest.py           |  4 ++--
 panpipes/panpipes/pipeline_ingest/pipeline.yml | 16 ++++++++--------
 panpipes/python_scripts/run_dsb_clr.py         | 18 +++++++++---------
 4 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/panpipes/funcs/scmethods.py b/panpipes/funcs/scmethods.py
index d7213ad3..c9d1932e 100644
--- a/panpipes/funcs/scmethods.py
+++ b/panpipes/funcs/scmethods.py
@@ -413,7 +413,7 @@ def X_is_raw(adata):
 # run clr 
     
 
-def run_adt_normalise(mdata, 
+def run_prot_normalise(mdata, 
                       mdata_bg, 
                       method: Literal['dsb', 'clr'] = 'clr',
                       clr_margin: Literal[0, 1] = '0',
diff --git a/panpipes/panpipes/pipeline_ingest.py b/panpipes/panpipes/pipeline_ingest.py
index 2e714c4f..430c5b42 100644
--- a/panpipes/panpipes/pipeline_ingest.py
+++ b/panpipes/panpipes/pipeline_ingest.py
@@ -427,8 +427,8 @@ def run_scanpy_prot_qc(log_file, outfile, unfilt_file):
         cmd += " --channel_col sample_id"
     if PARAMS['prot_plotqc_metrics']:
         cmd += " --per_cell_metrics %(prot_plotqc_metrics)s"
-    if PARAMS['prot_metrics_per_adt']:
-        cmd += " --per_adt_metrics %(prot_metrics_per_adt)s"
+    if PARAMS['prot_metrics_per_prot']:
+        cmd += " --per_prot_metrics %(prot_metrics_per_prot)s"
     if PARAMS['identify_isotype_outliers']:
         cmd += " --identify_isotype_outliers %(identify_isotype_outliers)s"
         cmd += " --isotype_upper_quantile %(isotype_upper_quantile)s"
diff --git a/panpipes/panpipes/pipeline_ingest/pipeline.yml b/panpipes/panpipes/pipeline_ingest/pipeline.yml
index 221b0b26..e9ce97d7 100644
--- a/panpipes/panpipes/pipeline_ingest/pipeline.yml
+++ b/panpipes/panpipes/pipeline_ingest/pipeline.yml
@@ -87,20 +87,20 @@ barcode_mtd:
   metadatacols:  
 
 #---------------------------------------
-# Loading adt data - additional options
+# Loading prot data - additional options
 #---------------------------------------
 # As default the ingest will choose the first column of the cellranger features.tsv.gz
 # To merge extra information about the antibodies e.g. whether they are hashing antibodies or isotypes,
 # Create a table with the first column equivalent to the first column of cellrangers features.tsv.gz
 # and specify it below (save as txt file)
 protein_metadata_table: 
-# Make sure there are unique entries in this column for each adt.
+# Make sure there are unique entries in this column for each prot.
 # include a column called isotype (containing True or False) if you want to qc isotypes
 # include a column called hashing_ab (containing True or False) if your dataset contains hashing antibodies which you wish to split into a separate modality
 
 # If you want to update the mudata index with a column from protein_metadata_table, specify here
 # If there are overlaps with the rna gene symbols, then you might have trouble down the line
-# It is recommended to add something to make the labels unique e.g. "adt_CD4"
+# It is recommended to add something to make the labels unique e.g. "prot_CD4"
 index_col_choice: 
 
 # If providing separate prot and rna 10X outputs, then the pipeline will load the filtered rna 10X outputs and the raw prot 10X counts
@@ -200,16 +200,16 @@ plotqc_rna_metrics: doublet_scores,pct_counts_mt,pct_counts_rp,pct_counts_hb,pct
 ###  ADT QC metrics:
 # as standard the following metrics are calculated for prot data
 # per cell metrics:
-# total_counts,log1p_total_counts,n_adt_by_counts,log1p_n_adt_by_counts
+# total_counts,log1p_total_counts,n_prot_by_counts,log1p_n_prot_by_counts
 # if isotypes can be detected then the following are calculated also:
 # total_counts_isotype,pct_counts_isotype
 # choose which ones you want to plot here
-plotqc_prot_metrics: total_counts,log1p_total_counts,n_adt_by_counts,pct_counts_isotype
+plotqc_prot_metrics: total_counts,log1p_total_counts,n_prot_by_counts,pct_counts_isotype
 
 # per antibody metrics
 # n_cells_by_counts,mean_counts,log1p_mean_counts,pct_dropout_by_counts,total_counts,log1p_total_counts
 # choose which ones you want to plot here:
-prot_metrics_per_adt: total_counts,log1p_total_counts,n_cells_by_counts,mean_counts
+plot_metrics_per_prot: total_counts,log1p_total_counts,n_cells_by_counts,mean_counts
 
 # isotype outliers: one way to determine which cells are very sticky is to work out which cells have the most isotype UMIs
 # associated to them, to label a cell as an isotype outlier, it must meet or exceed the following crietria:
@@ -228,7 +228,7 @@ isotype_n_pass: 2
 # i.e. the 10X channel. 
 # this is usually the sample_id column (otherwise leave the next parameter blank)
 channel_col: sample_id
-# It is important to note that in ingest the per channel normalised adt scores are not saved in the mudata object
+# It is important to note that in ingest the per channel normalised prot scores are not saved in the mudata object
 #  this is because if you perform feature normalisation (clr normalisation margin 0 or dsb normalisation), 
 #  on subsets of cells then the normalised values cannot be simply concatenated. 
 # The ADT normalisation is rerun pn the complete object in the preprocess pipeline 
@@ -246,7 +246,7 @@ normalisation_methods: clr
 
 # CLR parameters:
 # margin determines whether you normalise per cell (as you would RNA norm), 
-# or by feature (recommended, due to the variable nature of adts). 
+# or by feature (recommended, due to the variable nature of prot assays). 
 # CLR margin 0 is recommended for informative qc plots in this pipeline
 # 0 = normalise colwise (per feature)
 # 1 = normalise rowwise (per cell)
diff --git a/panpipes/python_scripts/run_dsb_clr.py b/panpipes/python_scripts/run_dsb_clr.py
index 7fb0ffab..201231e1 100644
--- a/panpipes/python_scripts/run_dsb_clr.py
+++ b/panpipes/python_scripts/run_dsb_clr.py
@@ -35,7 +35,7 @@
                     default=False,
                     help="")    
 parser.add_argument("--figpath",
-                    default="./adt_figures",
+                    default="./prot_figures",
                     help="")  
 parser.add_argument("--save_mtx",
                     default=False,
@@ -106,7 +106,7 @@
         if 'clr' in norm_methods:
             # make sure to start from raw counts
             mdata["prot"].X = mdata["prot"].layers["raw_counts"]
-            pnp.scmethods.run_adt_normalise(mdata=mdata, mdata_raw=None,
+            pnp.scmethods.run_prot_normalise(mdata=mdata, mdata_raw=None,
                     method="clr",
                     clr_margin=int(args.clr_margin))
             L.info("saving ridgeplot")
@@ -117,7 +117,7 @@
                 plt.savefig(os.path.join(args.figpath, si + "_clr_ridgeplot_isotypes.png"))
             # save out the data in what format?
             if save_mtx:
-                pnp.io.write_10x_counts(mdata["prot"], os.path.join("adt_clr" , si ), layer="raw_counts")
+                pnp.io.write_10x_counts(mdata["prot"], os.path.join("prot_clr" , si ), layer="raw_counts")
         # then run dsb
         if 'dsb' in norm_methods:
             mdata_raw = mdata_raw_per_sample[si]
@@ -129,7 +129,7 @@
             # mu.pl.histogram(mdata_raw["rna"], ["log10umi"], bins=50)
             # plt.savefig(os.path.join(args.figpath, si + "_log10umi.png"))
             # run dsb
-            pnp.scmethods.run_adt_normalise(mdata=mdata, 
+            pnp.scmethods.run_prot_normalise(mdata=mdata, 
                                             mdata_raw=mdata_raw,
                                             method="dsb", 
                                             isotypes=isotypes)
@@ -139,7 +139,7 @@
                 pnp.plotting.ridgeplot(mdata["prot"], features=isotypes, layer="dsb",  splitplot=1)
                 plt.savefig(os.path.join(args.figpath, si + "_dsb_ridgeplot_isotypes.png"))
             if save_mtx:
-                pnp.io.write_10x_counts(mdata["prot"], os.path.join("adt_dsb" , si), layer="raw_counts")
+                pnp.io.write_10x_counts(mdata["prot"], os.path.join("prot_dsb" , si), layer="raw_counts")
 else:
     # run on all the data (not on a channel basis)
     # first run clr
@@ -151,7 +151,7 @@
         # make sure to start from raw counts
         all_mdata["prot"].X = all_mdata["prot"].layers["raw_counts"]
         # run normalise
-        pnp.scmethods.run_adt_normalise(mdata=all_mdata, 
+        pnp.scmethods.run_prot_normalise(mdata=all_mdata, 
                 mdata_raw=None,
                 method="clr",
                 clr_margin=int(args.clr_margin))
@@ -162,7 +162,7 @@
             plt.savefig(os.path.join(args.figpath, "clr_ridgeplot_isotypes.png"))
         # save out the data in what format?
         if save_mtx:
-            pnp.io.write_10x_counts(all_mdata["prot"], os.path.join("adt_clr"), layer="clr")
+            pnp.io.write_10x_counts(all_mdata["prot"], os.path.join("prot_clr"), layer="clr")
     if 'dsb' in norm_methods:
         # make sure to start from raw counts
         all_mdata["prot"].X = all_mdata["prot"].layers["raw_counts"]
@@ -171,7 +171,7 @@
         all_mdata_raw["rna"].obs["log10umi"] = np.array(np.log10(all_mdata_raw["rna"].X.sum(axis=1) + 1)).reshape(-1)
         mu.pl.histogram(all_mdata_raw["rna"], ["log10umi"], bins=50)
         plt.savefig(os.path.join(args.figpath, "all_log10umi.png"))
-        pnp.scmethods.run_adt_normalise(mdata=all_mdata, 
+        pnp.scmethods.run_prot_normalise(mdata=all_mdata, 
                 mdata_raw=all_mdata_raw, 
                 method="dsb",
                 isotypes=isotypes) 
@@ -187,7 +187,7 @@
             plt.savefig(os.path.join(args.figpath, "dsb_ridgeplot_isotypes.png"))
         # save out the data in what format?
         if save_mtx:
-            pnp.io.write_10x_counts(all_mdata["prot"], os.path.join("adt_dsb"), layer="dsb")
+            pnp.io.write_10x_counts(all_mdata["prot"], os.path.join("prot_dsb"), layer="dsb")
 
     # run pca on dsb normalised (if dsb was run otherwise clr is in X)
     sc.tl.pca(all_mdata['prot'], n_comps=50, svd_solver='arpack', random_state=0) 

From 2de137c48579696ee4cbaa79ed87485e2c63e5ea Mon Sep 17 00:00:00 2001
From: bio-la <fabiola.curion@gmail.com>
Date: Mon, 25 Sep 2023 12:33:23 +0200
Subject: [PATCH 16/28] multiple files adt prot

---
 .../pipeline_preprocess_preprint.yml          |  4 ++--
 panpipes/R_scripts/plotQC.R                   | 14 ++++++------
 .../pipeline_integration/pipeline.yml         |  4 ++--
 .../panpipes/pipeline_preprocess/pipeline.yml |  4 ++--
 .../aggregate_cellranger_summary_metrics.py   |  2 +-
 .../python_scripts/batch_correct_totalvi.py   | 12 +++++-----
 .../python_scripts/run_preprocess_prot.py     | 22 ++++++++++---------
 panpipes/python_scripts/run_scanpyQC_prot.py  | 22 +++++++++----------
 panpipes/resources/sample_file_qc_mm.txt      |  2 +-
 .../resources/sample_file_qc_mm_gex_only.txt  |  2 +-
 tests/sample_caf.tsv                          |  2 +-
 11 files changed, 46 insertions(+), 44 deletions(-)

diff --git a/docs/workflows/pipeline_preprocess_preprint.yml b/docs/workflows/pipeline_preprocess_preprint.yml
index d985d7d1..8b1488da 100644
--- a/docs/workflows/pipeline_preprocess_preprint.yml
+++ b/docs/workflows/pipeline_preprocess_preprint.yml
@@ -157,7 +157,7 @@ plotqc:
   grouping_var: sample_id,orig.ident
   # use these continuous variables to plot gradients and distributions
   rna_metrics: pct_counts_mt,pct_counts_rp,pct_counts_hb,doublet_scores
-  prot_metrics: total_counts,log1p_total_counts,n_adt_by_counts
+  prot_metrics: total_counts,log1p_total_counts,n_prot_by_counts
   atac_metrics: total_counts
   rep_metrics: 
 
@@ -229,7 +229,7 @@ prot:
 
   # CLR parameters:
   # margin determines whether you normalise per cell (as you would RNA norm), 
-  # or by feature (recommended, due to the variable nature of adts). 
+  # or by feature (recommended, due to the variable nature of prot assays). 
   # CLR margin 0 is recommended for informative qc plots in this pipeline
   # 0 = normalise colwise (per feature)
   # 1 = normalise rowwise (per cell)
diff --git a/panpipes/R_scripts/plotQC.R b/panpipes/R_scripts/plotQC.R
index c29e3494..58d61665 100644
--- a/panpipes/R_scripts/plotQC.R
+++ b/panpipes/R_scripts/plotQC.R
@@ -238,7 +238,7 @@ if(!is.null(opt$prot_qc_metrics)){
   }
   
   # do the following plots:
-  # - prot:total_counts vs prot_nadt_by_counts
+  # - prot:total_counts vs prot_nprot_by_counts
   # - prot:total_counts vs prot:isotype_counts
   # - prot:log1p_total_counts vs prot:log1p_isotype_counts
   # - prot:total_counts vs prot:pct_isotype_counts
@@ -255,14 +255,14 @@ if(!is.null(opt$prot_qc_metrics)){
     }
     # plot once per source facet
     message("protein scatter plots")  
-    if (all(c("total_counts","n_adt_by_counts")%in% colnames(prot_data_plot))){
-      g <- do_scatter_plot(prot_data_plot,x="total_counts",y="n_adt_by_counts", facet=sc)
-      ggsave(g, filename=file.path(outpath, paste0("scatter_", sc, "_prot-nUMI_v_prot-adt.png")), type="cairo",
+    if (all(c("total_counts","n_prot_by_counts")%in% colnames(prot_data_plot))){
+      g <- do_scatter_plot(prot_data_plot,x="total_counts",y="n_prot_by_counts", facet=sc)
+      ggsave(g, filename=file.path(outpath, paste0("scatter_", sc, "_prot-nUMI_v_prot-prot.png")), type="cairo",
              width= 2*ncols, height=2*nrows, dpi=120)
     }
-    if (all(c("log1p_total_counts","log1p_n_adt_by_counts")%in% colnames(prot_data_plot))){
-      g <- do_scatter_plot(prot_data_plot,x="log1p_total_counts",y="log1p_n_adt_by_counts", facet=sc)
-      ggsave(g, filename=file.path(outpath, paste0("scatter_", sc, "_prot-log1p_nUMI_v_prot-log1p_adt.png")), type="cairo",
+    if (all(c("log1p_total_counts","log1p_n_prot_by_counts")%in% colnames(prot_data_plot))){
+      g <- do_scatter_plot(prot_data_plot,x="log1p_total_counts",y="log1p_n_prot_by_counts", facet=sc)
+      ggsave(g, filename=file.path(outpath, paste0("scatter_", sc, "_prot-log1p_nUMI_v_prot-log1p_prot.png")), type="cairo",
              width= 2*ncols, height=2*nrows, dpi=120)
     }
     if (all(c("total_counts","pct_counts_isotype")%in% colnames(prot_data_plot))){
diff --git a/panpipes/panpipes/pipeline_integration/pipeline.yml b/panpipes/panpipes/pipeline_integration/pipeline.yml
index a51b5728..cebaff87 100644
--- a/panpipes/panpipes/pipeline_integration/pipeline.yml
+++ b/panpipes/panpipes/pipeline_integration/pipeline.yml
@@ -223,9 +223,9 @@ multimodal:
     modalities: rna,prot
     exclude_mt_genes: True
     mt_column: mt
-    # to filter outliers manually create a column called adt_outliers in mdata['prot'].obs
+    # to filter outliers manually create a column called prot_outliers in mdata['prot'].obs
     filter_by_hvg: True
-    filter_adt_outliers: False
+    filter_prot_outliers: False
     model_args: 
       latent_distribution: "normal"
     training_args:
diff --git a/panpipes/panpipes/pipeline_preprocess/pipeline.yml b/panpipes/panpipes/pipeline_preprocess/pipeline.yml
index db229e66..e36a9f3e 100644
--- a/panpipes/panpipes/pipeline_preprocess/pipeline.yml
+++ b/panpipes/panpipes/pipeline_preprocess/pipeline.yml
@@ -167,7 +167,7 @@ plotqc:
   grouping_var: sample_id
   # specify metrics in the metadata to plot:
   rna_metrics: pct_counts_mt,pct_counts_rp,pct_counts_hb,pct_counts_ig,doublet_scores
-  prot_metrics: total_counts,log1p_total_counts,n_adt_by_counts,pct_counts_isotype
+  prot_metrics: total_counts,log1p_total_counts,n_prot_by_counts,pct_counts_isotype
   atac_metrics: 
   rep_metrics:
  # spatialT_metrics: total_counts
@@ -238,7 +238,7 @@ prot:
 
   # CLR parameters:
   # margin determines whether you normalise per cell (as you would RNA norm), 
-  # or by feature (recommended, due to the variable nature of adts). 
+  # or by feature (recommended, due to the variable nature of prot assays). 
   # CLR margin 0 is recommended for informative qc plots in this pipeline
   # 0 = normalise colwise (per feature)
   # 1 = normalise rowwise (per cell)
diff --git a/panpipes/python_scripts/aggregate_cellranger_summary_metrics.py b/panpipes/python_scripts/aggregate_cellranger_summary_metrics.py
index 47530a0a..fc297932 100644
--- a/panpipes/python_scripts/aggregate_cellranger_summary_metrics.py
+++ b/panpipes/python_scripts/aggregate_cellranger_summary_metrics.py
@@ -94,7 +94,7 @@ def get_all_unique_paths(pipe_df):
     all_paths = all_paths.drop_duplicates()
     # rename path_type to match cellranger terminolgy
     recode_dict = {'rna_path': "Gene Expression", 
-                   'adt_path': "Antibody Capture", 
+                   'prot_path': "Antibody Capture", 
                    'tcr_path': 'VDJ T', 
                    'bcr_path': 'VDJ B'}
     all_paths['path_type'] = all_paths['path_type'].replace(recode_dict)
diff --git a/panpipes/python_scripts/batch_correct_totalvi.py b/panpipes/python_scripts/batch_correct_totalvi.py
index 8fde16fb..b7214484 100644
--- a/panpipes/python_scripts/batch_correct_totalvi.py
+++ b/panpipes/python_scripts/batch_correct_totalvi.py
@@ -123,14 +123,14 @@
     sc_raw = sc_raw[sc_raw.obs_names.isin(rna.obs_names),: ]
     rna.layers["counts"] = sc_raw.X.copy()
 
-# filter adt outliers 
-if params['multimodal']['totalvi']['filter_adt_outliers']:
-    # for this to work the user needs to (manually) make a column called adt_outliers
+# filter prot outliers 
+if params['multimodal']['totalvi']['filter_prot_outliers']:
+    # for this to work the user needs to (manually) make a column called prot_outliers
     # actually there is a thing in the qc pipe that calculates outliers, I don't like it very much though
-    if "adt_outliers" in mdata['prot'].columns:
-        mu.pp.filter_obs(mdata, "adt_outliers")
+    if "prot_outliers" in mdata['prot'].columns:
+        mu.pp.filter_obs(mdata, "prot_outliers")
     else:
-        raise ValueError("adt_outliers column not found in mdata['prot'].obs")
+        raise ValueError("prot_outliers column not found in mdata['prot'].obs")
 
 # exluding isotypes
 if 'isotype' in prot.var.columns:
diff --git a/panpipes/python_scripts/run_preprocess_prot.py b/panpipes/python_scripts/run_preprocess_prot.py
index 7563869f..f06665c9 100644
--- a/panpipes/python_scripts/run_preprocess_prot.py
+++ b/panpipes/python_scripts/run_preprocess_prot.py
@@ -44,7 +44,7 @@
                     default=None,
                     help="")   
 parser.add_argument("--figpath",
-                    default="./adt_figures",
+                    default="./prot_figures",
                     help="")  
 parser.add_argument("--save_mtx",
                     default=False,
@@ -55,7 +55,9 @@
               
 
 args, opt = parser.parse_known_args()
-# args = argparse.Namespace(filtered_mudata='test.h5mu', bg_mudata='/well/cartography/users/zsj686/non_cart_projects/005-multimodal_scpipelines/ingest/test_raw.h5mu', channel_col=None, normalisation_methods='clr,dsb', clr_margin='0', quantile_clipping='True', figpath='./figures/adt', save_mtx=False, save_mudata_path='test.h5mu')
+# args = argparse.Namespace(filtered_mudata='test.h5mu', 
+# bg_mudata='/well/cartography/users/zsj686/non_cart_projects/005-multimodal_scpipelines/ingest/test_raw.h5mu', 
+# channel_col=None, normalisation_methods='clr,dsb', clr_margin='0', quantile_clipping='True', figpath='./figures/prot', save_mtx=False, save_mudata_path='test.h5mu')
 save_mtx=pnp.pp.check_for_bool(args.save_mtx)
 
 norm_methods = args.normalisation_methods.split(',')
@@ -137,7 +139,7 @@
         if 'clr' in norm_methods:
             # make sure to start from raw counts
             mdata["prot"].X = mdata["prot"].layers["raw_counts"].copy()
-            pnp.scmethods.run_adt_normalise(mdata=mdata, mdata_bg=None,
+            pnp.scmethods.run_prot_normalise(mdata=mdata, mdata_bg=None,
                     method="clr",
                     clr_margin=int(args.clr_margin))
             L.info("saving ridgeplot")
@@ -148,7 +150,7 @@
                 plt.savefig(os.path.join(args.figpath, str(si) + "_clr_ridgeplot_isotypes.png"))
             # save out the data in what format?
             if save_mtx:
-                pnp.io.write_10x_counts(mdata["prot"], os.path.join("adt_clr" , str(si) ), layer="raw_counts")
+                pnp.io.write_10x_counts(mdata["prot"], os.path.join("prot_clr" , str(si) ), layer="raw_counts")
         # then run dsb
         if 'dsb' in norm_methods:
             mdata_bg = mdata_bg_per_sample[si]
@@ -161,7 +163,7 @@
             # plt.savefig(os.path.join(args.figpath, si + "_log10umi.png"))
             # run dsb
             # this stores a layer named after the method as well as overwriting X
-            pnp.scmethods.run_adt_normalise(mdata=mdata, 
+            pnp.scmethods.run_prot_normalise(mdata=mdata, 
                                             mdata_bg=mdata_bg,
                                             method="dsb", 
                                             isotypes=isotypes)
@@ -171,7 +173,7 @@
                 pnp.plotting.ridgeplot(mdata["prot"], features=isotypes, layer="dsb",  splitplot=1)
                 plt.savefig(os.path.join(args.figpath, str(si) + "_dsb_ridgeplot_isotypes.png"))
             if save_mtx:
-                pnp.io.write_10x_counts(mdata["prot"], os.path.join("adt_dsb" , str(si)), layer="raw_counts")
+                pnp.io.write_10x_counts(mdata["prot"], os.path.join("prot_dsb" , str(si)), layer="raw_counts")
 else:
     # run on all the data (not on a channel basis)
     # first run clr
@@ -184,7 +186,7 @@
         all_mdata["prot"].X = all_mdata["prot"].layers["raw_counts"].copy()
         # run normalise
         # this stores a layer named after the method as well as overwriting X
-        pnp.scmethods.run_adt_normalise(mdata=all_mdata, 
+        pnp.scmethods.run_prot_normalise(mdata=all_mdata, 
                 mdata_bg=None,
                 method="clr",
                 clr_margin=int(args.clr_margin))
@@ -195,7 +197,7 @@
             plt.savefig(os.path.join(args.figpath, "clr_ridgeplot_isotypes.png"))
         # save out the data in what format?
         if save_mtx:
-            pnp.io.write_10x_counts(all_mdata["prot"], os.path.join("adt_clr"), layer="clr")
+            pnp.io.write_10x_counts(all_mdata["prot"], os.path.join("prot_clr"), layer="clr")
     if 'dsb' in norm_methods:
         # make sure to start from raw counts
         all_mdata["prot"].X = all_mdata["prot"].layers["raw_counts"].copy()
@@ -205,7 +207,7 @@
         mu.pl.histogram(all_mdata_bg["rna"], ["log10umi"], bins=50)
         plt.savefig(os.path.join(args.figpath, "all_log10umi.png"))
         # this stores a layer named after the method as well as overwriting X
-        pnp.scmethods.run_adt_normalise(mdata=all_mdata, 
+        pnp.scmethods.run_prot_normalise(mdata=all_mdata, 
                 mdata_bg=all_mdata_bg, 
                 method="dsb",
                 isotypes=isotypes) 
@@ -221,7 +223,7 @@
             plt.savefig(os.path.join(args.figpath, "dsb_ridgeplot_isotypes.png"))
         # save out the data in what format?
         if save_mtx:
-            pnp.io.write_10x_counts(all_mdata["prot"], os.path.join("adt_dsb"), layer="dsb")
+            pnp.io.write_10x_counts(all_mdata["prot"], os.path.join("prot_dsb"), layer="dsb")
     
     if args.store_as_x is not None:
         all_mdata["prot"].X = all_mdata["prot"].layers[args.store_as_x]
diff --git a/panpipes/python_scripts/run_scanpyQC_prot.py b/panpipes/python_scripts/run_scanpyQC_prot.py
index abcaac84..86630a2c 100644
--- a/panpipes/python_scripts/run_scanpyQC_prot.py
+++ b/panpipes/python_scripts/run_scanpyQC_prot.py
@@ -64,7 +64,7 @@
 parser.add_argument("--per_cell_metrics",
                     default=None,
                     help="")
-parser.add_argument("--per_adt_metrics",
+parser.add_argument("--per_prot_metrics",
                     default=None,
                     help="")
 
@@ -118,7 +118,7 @@
 # if isotypes are found then include them in the caluclate qc metrics call.
 # this calculates standard metrics 
 sc.pp.calculate_qc_metrics(prot,
-                        var_type="adt",  
+                        var_type="prot",  
                         qc_vars=qc_vars, 
                         percent_top=None,log1p=True, inplace=True)
 
@@ -161,7 +161,7 @@
 for si in prot.obs[channel_col].unique():
     L.info(si)
     obs_df, var_df = sc.pp.calculate_qc_metrics(prot[prot.obs[channel_col]==si],
-                            var_type="adt", 
+                            var_type="prot", 
                             qc_vars=qc_vars, 
                             percent_top=None,log1p=True, inplace=False)
     # store the var df for the metrics we care about in the dictionary.
@@ -170,24 +170,24 @@
     out[si][channel_col] = si
 
 # concatenate all the var_dfs, so we can summarise the per channel data in boxplots.
-var_dat = pd.concat(out).reset_index().rename(columns={'level_0':channel_col, 'level_1': "adt_id"})
-adt_id_col=var_dat.columns[1]
+var_dat = pd.concat(out).reset_index().rename(columns={'level_0':channel_col, 'level_1': "prot_id"})
+prot_id_col=var_dat.columns[1]
 
-if args.per_adt_metrics is not None:
-    per_adt_metrics = args.per_adt_metrics.split(",")
-    per_adt_metrics = [a.strip() for a in per_adt_metrics]
+if args.per_prot_metrics is not None:
+    per_prot_metrics = args.per_prot_metrics.split(",")
+    per_prot_metrics = [a.strip() for a in per_prot_metrics]
 
     # let's make some boxplot figures.
-    for qcp in per_adt_metrics:
+    for qcp in per_prot_metrics:
         fig, ax = plt.subplots(nrows=1,figsize=(24,6))
-        sns.boxplot(data=var_dat, x=adt_id_col, y=qcp, ax=ax)
+        sns.boxplot(data=var_dat, x=prot_id_col, y=qcp, ax=ax)
         adjust_x_axis(ax)
         fig.tight_layout()
         plt.savefig(os.path.join(figdir, "boxplot_" + channel_col + "_" + qcp + ".png"))
     plt.close()
 
     # write out the per channel metrics in a separate csv.
-    var_dat.to_csv(args.sampleprefix + "_adt_qc_metrics_per_" + channel_col + ".csv")
+    var_dat.to_csv(args.sampleprefix + "_prot_qc_metrics_per_" + channel_col + ".csv")
 
 L.info("Done")
 
diff --git a/panpipes/resources/sample_file_qc_mm.txt b/panpipes/resources/sample_file_qc_mm.txt
index 7e45675e..77775ce8 100644
--- a/panpipes/resources/sample_file_qc_mm.txt
+++ b/panpipes/resources/sample_file_qc_mm.txt
@@ -1,3 +1,3 @@
-sample_id	rna_path	rna_filetype	adt_path	adt_filetype	tissue	diagnosis
+sample_id	rna_path	rna_filetype	prot_path	prot_filetype	tissue	diagnosis
 Sample1	Sample1_gex.csv	csv_matrix	Sample1_adt.csv	csv_matrix	pbmc	healthy
 Sample2	cellranger_count/Sample2_GEX/outs/	cellranger	cellranger_count/Sample2_CITE/outs/	cellranger	pbmc	diseased
\ No newline at end of file
diff --git a/panpipes/resources/sample_file_qc_mm_gex_only.txt b/panpipes/resources/sample_file_qc_mm_gex_only.txt
index 7e45675e..77775ce8 100644
--- a/panpipes/resources/sample_file_qc_mm_gex_only.txt
+++ b/panpipes/resources/sample_file_qc_mm_gex_only.txt
@@ -1,3 +1,3 @@
-sample_id	rna_path	rna_filetype	adt_path	adt_filetype	tissue	diagnosis
+sample_id	rna_path	rna_filetype	prot_path	prot_filetype	tissue	diagnosis
 Sample1	Sample1_gex.csv	csv_matrix	Sample1_adt.csv	csv_matrix	pbmc	healthy
 Sample2	cellranger_count/Sample2_GEX/outs/	cellranger	cellranger_count/Sample2_CITE/outs/	cellranger	pbmc	diseased
\ No newline at end of file
diff --git a/tests/sample_caf.tsv b/tests/sample_caf.tsv
index cdcd773b..d60886ed 100644
--- a/tests/sample_caf.tsv
+++ b/tests/sample_caf.tsv
@@ -1,3 +1,3 @@
-sample_id	rna_path	rna_filetype	adt_path	adt_filetype
+sample_id	rna_path	rna_filetype	prot_path	prot_filetype
 sample1	../sample_1_rna	cellranger	../sample_1_prot	cellranger
 sample2	../sample_2_rna	cellranger	../sample_2_prot	cellranger

From 0b790ae0a12f610bae000bfaa9884b10e16f3bc1 Mon Sep 17 00:00:00 2001
From: bio-la <fabiola.curion@gmail.com>
Date: Mon, 25 Sep 2023 12:35:57 +0200
Subject: [PATCH 17/28] typos

---
 panpipes/panpipes/pipeline_ingest.py         | 4 ++--
 panpipes/python_scripts/assess_background.py | 2 +-
 panpipes/python_scripts/concat_adata.py      | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/panpipes/panpipes/pipeline_ingest.py b/panpipes/panpipes/pipeline_ingest.py
index 430c5b42..acda4dbc 100644
--- a/panpipes/panpipes/pipeline_ingest.py
+++ b/panpipes/panpipes/pipeline_ingest.py
@@ -427,8 +427,8 @@ def run_scanpy_prot_qc(log_file, outfile, unfilt_file):
         cmd += " --channel_col sample_id"
     if PARAMS['prot_plotqc_metrics']:
         cmd += " --per_cell_metrics %(prot_plotqc_metrics)s"
-    if PARAMS['prot_metrics_per_prot']:
-        cmd += " --per_prot_metrics %(prot_metrics_per_prot)s"
+    if PARAMS['plot_metrics_per_prot']:
+        cmd += " --per_prot_metrics %(plot_metrics_per_prot)s"
     if PARAMS['identify_isotype_outliers']:
         cmd += " --identify_isotype_outliers %(identify_isotype_outliers)s"
         cmd += " --isotype_upper_quantile %(isotype_upper_quantile)s"
diff --git a/panpipes/python_scripts/assess_background.py b/panpipes/python_scripts/assess_background.py
index 086a8580..634278f6 100644
--- a/panpipes/python_scripts/assess_background.py
+++ b/panpipes/python_scripts/assess_background.py
@@ -147,7 +147,7 @@
 # quantifying the top background features
 ## Repeat for protein (if it exists)
 if 'prot' in mdata.mod.keys():
-    # this time we'll just use all the adts.
+    # this time we'll just use all the prot vars.
     top_genes = list(mdata_bg['prot'].var_names)
     bg_df = pnp.scmethods.get_mean_background_fraction(mdata_bg['prot'], top_background=top_genes, group_by=args.channel_col)
     if bg_df.shape[0] >1:
diff --git a/panpipes/python_scripts/concat_adata.py b/panpipes/python_scripts/concat_adata.py
index 02a0a468..6344b8eb 100644
--- a/panpipes/python_scripts/concat_adata.py
+++ b/panpipes/python_scripts/concat_adata.py
@@ -178,7 +178,7 @@
 
 L.debug(mdata.obs.columns)
 L.debug(mdata.obs.head())
-# update the protein variable to add in extra info like isotype and alternate name for adt
+# update the protein variable to add in extra info like isotype and alternate name for prot
 if args.protein_var_table is not None:
     try:
         df = pd.read_csv(args.protein_var_table, sep='\t', index_col=0)

From d153b4e292a0b1fee126505d3f11898b68b1922d Mon Sep 17 00:00:00 2001
From: bio-la <fabiola.curion@gmail.com>
Date: Fri, 29 Sep 2023 11:50:05 +0200
Subject: [PATCH 18/28] fixes to plotting params in yml and ingest

---
 panpipes/panpipes/pipeline_ingest.py          |  2 +-
 .../panpipes/pipeline_ingest/pipeline.yml     | 22 +++++++------------
 2 files changed, 9 insertions(+), 15 deletions(-)

diff --git a/panpipes/panpipes/pipeline_ingest.py b/panpipes/panpipes/pipeline_ingest.py
index acda4dbc..55693c12 100644
--- a/panpipes/panpipes/pipeline_ingest.py
+++ b/panpipes/panpipes/pipeline_ingest.py
@@ -426,7 +426,7 @@ def run_scanpy_prot_qc(log_file, outfile, unfilt_file):
     else:
         cmd += " --channel_col sample_id"
     if PARAMS['prot_plotqc_metrics']:
-        cmd += " --per_cell_metrics %(prot_plotqc_metrics)s"
+        cmd += " --per_cell_metrics %(plotqc_prot_metrics)s"
     if PARAMS['plot_metrics_per_prot']:
         cmd += " --per_prot_metrics %(plot_metrics_per_prot)s"
     if PARAMS['identify_isotype_outliers']:
diff --git a/panpipes/panpipes/pipeline_ingest/pipeline.yml b/panpipes/panpipes/pipeline_ingest/pipeline.yml
index e9ce97d7..a6fc1242 100644
--- a/panpipes/panpipes/pipeline_ingest/pipeline.yml
+++ b/panpipes/panpipes/pipeline_ingest/pipeline.yml
@@ -181,7 +181,7 @@ custom_genes_file: resources/qc_genelist_1.0.csv
 calc_proportions: hb,mt,rp
 score_genes: MarkersNeutro
 
-
+# ------------
 # Plot RNA QC metrics
 # ------------
 # all metrics should be inputted as a comma separated string e.g. a,b,c
@@ -191,13 +191,12 @@ plotqc_grouping_var: sample_id
 plotqc_rna_metrics: doublet_scores,pct_counts_mt,pct_counts_rp,pct_counts_hb,pct_counts_ig
 
 
-
 # ------------
-# ADT qc - requires prot_path to be included in the submission file
+# Plot PROT QC metrics 
 # ------------
+# requires prot_path to be included in the submission file
 # all metrics should be inputted as a comma separated string e.g. a,b,c
 
-###  ADT QC metrics:
 # as standard the following metrics are calculated for prot data
 # per cell metrics:
 # total_counts,log1p_total_counts,n_prot_by_counts,log1p_n_prot_by_counts
@@ -205,10 +204,10 @@ plotqc_rna_metrics: doublet_scores,pct_counts_mt,pct_counts_rp,pct_counts_hb,pct
 # total_counts_isotype,pct_counts_isotype
 # choose which ones you want to plot here
 plotqc_prot_metrics: total_counts,log1p_total_counts,n_prot_by_counts,pct_counts_isotype
-
-# per antibody metrics
+# Since the protein antibody panels usually count fewer features than the RNA, it may be interesting to
+# visualize breakdowns of single proteins plots describing their count distribution and other qc options.
+# choose which ones you want to plot here, for example
 # n_cells_by_counts,mean_counts,log1p_mean_counts,pct_dropout_by_counts,total_counts,log1p_total_counts
-# choose which ones you want to plot here:
 plot_metrics_per_prot: total_counts,log1p_total_counts,n_cells_by_counts,mean_counts
 
 # isotype outliers: one way to determine which cells are very sticky is to work out which cells have the most isotype UMIs
@@ -218,7 +217,6 @@ plot_metrics_per_prot: total_counts,log1p_total_counts,n_cells_by_counts,mean_co
 identify_isotype_outliers: True
 isotype_upper_quantile: 90
 isotype_n_pass: 2
-# TODO: work out how to plot this.
 
 ### Investigate per channel antibody staining:
 # This can help determine any inconsistencies in staining per channel and other QC concerns.
@@ -263,11 +261,8 @@ quantile_clipping: True
 # from both rna and protein assays, 
 # see details for how to make sure your files are compatible in the assess background section below
 
-
-
-
 # ------------
-# ATAC qc 
+# Plot ATAC QC metrics 
 # ------------
 
 # we require initializing one csv file per aggregated ATAC/multiome experiment.
@@ -287,9 +282,8 @@ features_tss: #resources/features_tss_hg19.tsv
 # all metrics should be inputted as a comma separated string e.g. a,b,c
 plotqc_atac_metrics: n_genes_by_counts,total_counts,pct_fragments_in_peaks,atac_peak_region_fragments,atac_mitochondrial_reads,atac_TSS_fragments
 
-
 # ------------
-# Repertoire QC
+# Plot Repertoire QC metrics
 # ------------
 # Repertoire data will be stored in one modality called "rep", if you provide both TCR and BCR data then this will be merged, 
 # but various functions will be fun on TCR and BCR separately

From 327eb723e0b9c0966b47a6b06ada13fcc702c2ee Mon Sep 17 00:00:00 2001
From: bio-la <fabiola.curion@gmail.com>
Date: Fri, 29 Sep 2023 11:58:33 +0200
Subject: [PATCH 19/28] fixes to names

---
 panpipes/panpipes/pipeline_ingest.py           | 2 +-
 panpipes/panpipes/pipeline_ingest/pipeline.yml | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/panpipes/panpipes/pipeline_ingest.py b/panpipes/panpipes/pipeline_ingest.py
index 55693c12..f1d1b89e 100644
--- a/panpipes/panpipes/pipeline_ingest.py
+++ b/panpipes/panpipes/pipeline_ingest.py
@@ -425,7 +425,7 @@ def run_scanpy_prot_qc(log_file, outfile, unfilt_file):
         cmd += " --channel_col %(channel_col)s"
     else:
         cmd += " --channel_col sample_id"
-    if PARAMS['prot_plotqc_metrics']:
+    if PARAMS['plotqc_prot_metrics']:
         cmd += " --per_cell_metrics %(plotqc_prot_metrics)s"
     if PARAMS['plot_metrics_per_prot']:
         cmd += " --per_prot_metrics %(plot_metrics_per_prot)s"
diff --git a/panpipes/panpipes/pipeline_ingest/pipeline.yml b/panpipes/panpipes/pipeline_ingest/pipeline.yml
index a6fc1242..e3e0366f 100644
--- a/panpipes/panpipes/pipeline_ingest/pipeline.yml
+++ b/panpipes/panpipes/pipeline_ingest/pipeline.yml
@@ -120,7 +120,7 @@ subset_prot_barcodes_to_rna: False
 plot_10X_metrics: True 
 kneeplot: False
 # ------------
-## Doublets on GEX - Scrublet
+## Doublets on RNA - Scrublet
 # ------------
 scr:
   run: True
@@ -155,7 +155,7 @@ scr:
   #if use_thr is True, this thr will be used to define doublets
 
 # ------------
-# GEX QC
+# RNA QC
 # ------------
 # this part of the pipeline allows to generate the QC parameters that will be used to 
 # evaluate inclusion/ exclusion criteria. Filtering of cells/genes happens in the following pipeline

From 556e66404a8a60720bcecd6c78a94ae531ca13ec Mon Sep 17 00:00:00 2001
From: bio-la <fabiola.curion@gmail.com>
Date: Fri, 29 Sep 2023 12:15:11 +0200
Subject: [PATCH 20/28] plotting fixes

---
 panpipes/R_scripts/plotQC.R | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/panpipes/R_scripts/plotQC.R b/panpipes/R_scripts/plotQC.R
index 58d61665..c8b1e7b0 100644
--- a/panpipes/R_scripts/plotQC.R
+++ b/panpipes/R_scripts/plotQC.R
@@ -17,7 +17,8 @@ do_scatter_plot <- function(df, x, y, facet=NULL, hue=NULL){
   }
   g <- g+ theme_bw()+
     theme(axis.text.x=element_text(angle=90, hjust=1, vjust=0.5),
-          strip.text.x = element_text(size=6))
+          strip.text.x = element_text(size=6),
+          legend.key.size = unit(0.2, 'cm')) 
   return(g)
 }
 
@@ -184,25 +185,25 @@ for (sc in rna_source_facet){
   if (all(c("total_counts","n_genes_by_counts")%in% colnames(rna_data_plot))){
     g <- do_scatter_plot(rna_data_plot,x="total_counts",y="n_genes_by_counts", facet=sc)
     ggsave(g, filename=file.path(outpath, paste0("scatter_", sc, "_rna-nUMI_v_rna-genes.png")), type="cairo",
-           width= 2*ncols, height=2*nrows, dpi=120)
+           width= 3*ncols, height=3*nrows, dpi=200)
   }
   
   if (all(c("log1p_total_counts","log1p_n_genes_by_counts")%in% colnames(rna_data_plot))){
     
     g <- do_scatter_plot(rna_data_plot,x="log1p_total_counts",y="log1p_n_genes_by_counts", facet=sc)
     ggsave(g, filename=file.path(outpath, paste0("scatter_", sc, "_rna-log1p_nUMI_v_rna-log1p_genes.png")), type="cairo",
-           width= 2*ncols, height=2*nrows, dpi=120)
+           width= 3*ncols, height=3*nrows, dpi=200)
   }
   if (all(c("total_counts","pct_counts_mt")%in% colnames(rna_data_plot))){
     g <- do_scatter_plot(rna_data_plot,x="total_counts",y="pct_counts_mt", facet=sc)
     ggsave(g, filename=file.path(outpath, paste0("scatter_", sc, "_rna-nUMI_v_rna-pct_mt.png")), type="cairo",
-           width= 2*ncols, height=2*nrows, dpi=120)
+           width= 3*ncols, height=3*nrows, dpi=200)
   }
   if (all(c("n_genes_by_counts","doublet_scores","total_counts")%in% colnames(rna_data_plot))){
     g <- do_scatter_plot(rna_data_plot,x="n_genes_by_counts",y="doublet_scores",
                          hue="total_counts", facet=sc)
     ggsave(g, filename=file.path(outpath, paste0("scatter_", sc, "_rna-genes_rna-doublet_scores_rna-numi.png")),    
-           type="cairo", width= 2*ncols, height=2*nrows, dpi=120)
+           type="cairo", width= 3*ncols, height=3*nrows, dpi=200)
   }
   
 }
@@ -258,27 +259,27 @@ if(!is.null(opt$prot_qc_metrics)){
     if (all(c("total_counts","n_prot_by_counts")%in% colnames(prot_data_plot))){
       g <- do_scatter_plot(prot_data_plot,x="total_counts",y="n_prot_by_counts", facet=sc)
       ggsave(g, filename=file.path(outpath, paste0("scatter_", sc, "_prot-nUMI_v_prot-prot.png")), type="cairo",
-             width= 2*ncols, height=2*nrows, dpi=120)
+             width= 3*ncols, height=3*nrows, dpi=200)
     }
     if (all(c("log1p_total_counts","log1p_n_prot_by_counts")%in% colnames(prot_data_plot))){
       g <- do_scatter_plot(prot_data_plot,x="log1p_total_counts",y="log1p_n_prot_by_counts", facet=sc)
       ggsave(g, filename=file.path(outpath, paste0("scatter_", sc, "_prot-log1p_nUMI_v_prot-log1p_prot.png")), type="cairo",
-             width= 2*ncols, height=2*nrows, dpi=120)
+             width= 3*ncols, height=3*nrows, dpi=200)
     }
     if (all(c("total_counts","pct_counts_isotype")%in% colnames(prot_data_plot))){
       g <- do_scatter_plot(prot_data_plot,x="total_counts",y="pct_counts_isotype", facet=sc)
       ggsave(g, filename=file.path(outpath, paste0("scatter_", sc, "_prot-nUMI_v_prot-pct_isotype.png")), type="cairo",
-             width= 2*ncols, height=2*nrows, dpi=120)
+             width= 3*ncols, height=3*nrows, dpi=200)
     }
     if (all(c("total_counts","total_counts_isotype")%in% colnames(prot_data_plot))){
       g <- do_scatter_plot(prot_data_plot,x="total_counts",y="total_counts_isotype", facet=sc)
       ggsave(g, filename=file.path(outpath, paste0("scatter_", sc, "_prot-nUMI_v_prot-total_counts_isotype.png")), type="cairo",
-             width= 2*ncols, height=2*nrows, dpi=120)
+             width= 3*ncols, height=3*nrows, dpi=200)
     }
     if (all(c("log1p_total_counts","log1p_total_counts_isotype", "isotype_exclude")%in% colnames(prot_data_plot))){
       g <- do_scatter_plot(prot_data_plot,x="log1p_total_counts",y="log1p_total_counts_isotype", hue="isotype_exclude", facet=sc)
       ggsave(g, filename=file.path(outpath, paste0("scatter_", sc, "_prot-log1p_nUMI_v_prot-log1p_total_counts_isotype.png")), type="cairo",
-             width= 2*ncols, height=2*nrows, dpi=120)
+             width= 3*ncols, height=3*nrows, dpi=200)
     }
   }
 
@@ -376,27 +377,27 @@ if(!is.null(opt$prot_qc_metrics)){
     if (all(c("rna.total_counts","prot.total_counts")%in% colnames(data_plot))){
       g <- do_scatter_plot(data_plot,x="rna.total_counts",y="prot.total_counts", facet=sc)
       ggsave(g, filename=file.path(outpath, paste0("scatter_", sc, "-nUMI_v_rna-nUMI.png")), type="cairo",
-              width= 2*ncols, height=2*nrows, dpi=120)
+              width= 3*ncols, height=3*nrows, dpi=200)
     }
     if (all(c("rna.log1p_total_counts","prot.log1p_total_counts")%in% colnames(data_plot))){
       g <- do_scatter_plot(data_plot,x="rna.log1p_total_counts",y="prot.log1p_total_counts", facet=sc)
       ggsave(g, filename=file.path(outpath, paste0("scatter_", sc, "-log1p_nUMI_v_rna-log1p_nUMI.png")), type="cairo",
-              width= 2*ncols, height=2*nrows, dpi=120)
+              width= 3*ncols, height=3*nrows, dpi=200)
     }
     if (all(c("rna.total_counts","prot.total_counts_isotype")%in% colnames(data_plot))){
       g <-  do_scatter_plot(data_plot,x="rna.total_counts",y="prot.total_counts_isotype", facet=sc)
       ggsave(g, filename=file.path(outpath, paste0("scatter_", sc, "-nUMI_v_prot-counts_isotype.png")), type="cairo",
-              width= 2*ncols, height=2*nrows, dpi=120)
+              width= 3*ncols, height=3*nrows, dpi=200)
     }    
     if (all(c("rna.log1p_total_counts","prot.log1p_total_counts_isotype") %in% colnames(data_plot))){
       g <-  do_scatter_plot(data_plot,x="rna.log1p_total_counts",y="prot.log1p_total_counts_isotype", facet=sc)
       ggsave(g, filename=file.path(outpath, paste0("scatter_", sc, "-log1p_nUMI_v_prot-log1p_counts_isotype.png")), type="cairo",
-              width= 2*ncols, height=2*nrows, dpi=120)
+              width= 3*ncols, height=3*nrows, dpi=200)
     }
     if (all(c("rna.doublet_scores","prot.log1p_total_counts") %in% colnames(data_plot))){
       g <-  do_scatter_plot(data_plot,x="rna.doublet_scores",y="prot.log1p_total_counts", facet=sc)
       ggsave(g, filename=file.path(outpath, paste0("scatter_", sc, "-doublet_scores_v_prot-log1p_nUMI.png")), type="cairo",
-              width= 2*ncols, height=2*nrows, dpi=120)
+              width= 3*ncols, height=3*nrows, dpi=200)
       
     }
     

From b10fe3b5e8c776f005263a003a43e792441ee62a Mon Sep 17 00:00:00 2001
From: bio-la <fabiola.curion@gmail.com>
Date: Fri, 29 Sep 2023 12:23:53 +0200
Subject: [PATCH 21/28] fix to names

---
 .../pipeline_preprocess_preprint.yml          | 10 ++++-----
 .../R_scripts/produce_barplot_10xmetric.v3.R  |  4 ++--
 panpipes/funcs/io.py                          |  4 ++--
 .../panpipes/pipeline_ingest/pipeline.yml     |  2 +-
 .../panpipes/pipeline_preprocess/pipeline.yml | 20 +++++++-----------
 .../panpipes/pipeline_refmap/pipeline.yml     | 21 -------------------
 panpipes/python_scripts/assess_background.py  |  4 ++--
 panpipes/python_scripts/run_scanpyQC_prot.py  |  8 +++----
 panpipes/python_scripts/run_scanpyQC_rep.py   |  6 +++---
 panpipes/python_scripts/run_scanpyQC_rna.py   |  2 +-
 10 files changed, 28 insertions(+), 53 deletions(-)

diff --git a/docs/workflows/pipeline_preprocess_preprint.yml b/docs/workflows/pipeline_preprocess_preprint.yml
index 8b1488da..c172b00e 100644
--- a/docs/workflows/pipeline_preprocess_preprint.yml
+++ b/docs/workflows/pipeline_preprocess_preprint.yml
@@ -161,7 +161,7 @@ plotqc:
   atac_metrics: total_counts
   rep_metrics: 
 
-
+# --------------------------------------------------------------------------------------------------------
 # RNA Normalisation
 # --------------------------------------------------------------------------------------------------------
 # hvg_flavour options include "seurat", "cell_ranger", "seurat_v3", default; "seurat"
@@ -213,8 +213,8 @@ pca:
   scree_n_pcs: 50
   color_by: sample_id
 
-
-# Protein (ADT) normalisation
+# --------------------------------------------------------------------------------------------------------
+# Protein (PROT) normalisation
 # --------------------------------------------------------------------------------------------------------
 prot:
   # comma separated string of normalisation options
@@ -250,9 +250,9 @@ prot:
 
   # do you want to save the prot normalised assay additionally as a txt file:
   save_norm_prot_mtx: False
-
+# --------------------------------------------------------------------------------------------------------
 # ATAC preprocessing and normalisation
-# --------------------------
+# --------------------------------------------------------------------------------------------------------
 atac:
   binarize: False
   normalize: TFIDF
diff --git a/panpipes/R_scripts/produce_barplot_10xmetric.v3.R b/panpipes/R_scripts/produce_barplot_10xmetric.v3.R
index 59cffae1..0a86fc3c 100644
--- a/panpipes/R_scripts/produce_barplot_10xmetric.v3.R
+++ b/panpipes/R_scripts/produce_barplot_10xmetric.v3.R
@@ -146,7 +146,7 @@ if(ncol(tag.mat)>0){
 }
 
 if(nrow(mat)==0){mat <- tag.mat}
-#modify this to deal with AB reads and rna reads, need to split the knee plotting in 2 if you want both rna and ADT knees in 2 different graphs
+#modify this to deal with AB reads and rna reads, need to split the knee plotting in 2 if you want both rna and PROT knees in 2 different graphs
 if(opt$kneeplot){
   print("plotting kneeplot")
   gsub("metrics_summary.csv","raw_feature_bc_matrix" , paths) ->normpath
@@ -205,6 +205,6 @@ if(opt$kneeplot){
 
 }
 
-# if ADT exists then repeat for ADT
+# if PROT exists then repeat for PROT
 
 message("finished pipeline")
diff --git a/panpipes/funcs/io.py b/panpipes/funcs/io.py
index 7db749b4..6d399925 100644
--- a/panpipes/funcs/io.py
+++ b/panpipes/funcs/io.py
@@ -482,10 +482,10 @@ def load_mdata_from_multiple_files(all_files_dict):
     Parameters
     ----------
     all_files_dict: dict
-        dictionary containing one key per assay out of [RNA, ADT, TCR, BCR, ATAC]
+        dictionary containing one key per assay out of [RNA, PROT, TCR, BCR, ATAC]
         and the values for each key is a list of file path and fie type
         e.g.  {"RNA": [filepath, "filetype"],
-               "ADT: [file path, "filetype"]} 
+               "PROT: [file path, "filetype"]} 
         Filetypes supported for RNA/prot: ["cellranger", "h5ad", "csv_matrix", "txt_matrix", "10X_h5"],
         Filetypes supported for atac (multiome preferred is 10X_h5) ["10X_h5","cellranger","h5ad"]
         Filetypes supported for rep: ["cellranger_vdj", "airr", "tracer", "bracer"  ] 
diff --git a/panpipes/panpipes/pipeline_ingest/pipeline.yml b/panpipes/panpipes/pipeline_ingest/pipeline.yml
index e3e0366f..581d54e8 100644
--- a/panpipes/panpipes/pipeline_ingest/pipeline.yml
+++ b/panpipes/panpipes/pipeline_ingest/pipeline.yml
@@ -229,7 +229,7 @@ channel_col: sample_id
 # It is important to note that in ingest the per channel normalised prot scores are not saved in the mudata object
 #  this is because if you perform feature normalisation (clr normalisation margin 0 or dsb normalisation), 
 #  on subsets of cells then the normalised values cannot be simply concatenated. 
-# The ADT normalisation is rerun pn the complete object in the preprocess pipeline 
+# The PROT normalisation is rerun pn the complete object in the preprocess pipeline 
 # (or you can run this pipeline with channel_col set as None)
 # it is important to note, if you choose to run the clr on a per channel basis, then it is not stored in the h5mu file.
 # if you want to save the per channel normalised values set the following to True:
diff --git a/panpipes/panpipes/pipeline_preprocess/pipeline.yml b/panpipes/panpipes/pipeline_preprocess/pipeline.yml
index e36a9f3e..7f71f9fe 100644
--- a/panpipes/panpipes/pipeline_preprocess/pipeline.yml
+++ b/panpipes/panpipes/pipeline_preprocess/pipeline.yml
@@ -16,13 +16,10 @@ condaenv:
 
 # allows for tweaking where the jobs get submitted, in case there is a special queue for long jobs or you have access to a gpu
 # leave as is if you do not want to use the alternative queues
-# also rescomp users need to request gpu access.
 
+# --------------------------
 # Start
 # --------------------------
-# either one that exists already with
-
-
 sample_prefix: 
 unfiltered_obj: 
 # if running this on prefiltered datat then
@@ -31,8 +28,6 @@ unfiltered_obj:
 #3. put renamed file in the same folder as this yml.
 #4. set filtering run: to False below.
 
-#TO DO this needs to go i think it doesn't make sense to save this file.
-
 # leave empty if you don't want to save an intermediate muData with normalized expression 
 # stored in X for the RNA object
 output_logged_mudata:
@@ -43,13 +38,14 @@ modalities:
   rep: False
   atac: False
   #spatialT: True
-
+# --------------------------------------------------------------------------------------------------------
 # Filtering
 # --------------------------
 # the filtering process in panpipes is sequential as it goes through the filtering dictionary.
 # for each modality, starting with rna, it will first filter on obs and then vars.
 # each modality has a dictionary in the following format. This is fully customisable to any
 # columns in the mudata.obs or var object.
+# To avoid being prescriptive with the covariates names, we allow to provide custom names.
 # When specifying a column name, make sure it exactly matches the column name in the h5mu object.
 # rna:
   # obs:
@@ -174,7 +170,7 @@ plotqc:
 
 
-
+# --------------------------------------------------------------------------------------------------------
 # RNA Normalisation
 # --------------------------------------------------------------------------------------------------------
 # hvg_flavour options include "seurat", "cell_ranger", "seurat_v3", default; "seurat"
@@ -222,8 +218,8 @@ pca:
   scree_n_pcs: 50
   color_by: sample_id
 
-
-# Protein (ADT) normalisation
+# --------------------------------------------------------------------------------------------------------
+# Protein (PROT) normalisation
 # --------------------------------------------------------------------------------------------------------
 prot:
   # comma separated string of normalisation options
@@ -259,9 +255,9 @@ prot:
 
   # do you want to save the prot normalised assay additionally as a txt file:
   save_norm_prot_mtx: False
-
+# --------------------------------------------------------------------------------------------------------
 # ATAC preprocessing and normalisation
-# --------------------------
+# --------------------------------------------------------------------------------------------------------
 atac:
   binarize: False
   normalize: TFIDF
diff --git a/panpipes/panpipes/pipeline_refmap/pipeline.yml b/panpipes/panpipes/pipeline_refmap/pipeline.yml
index 55fc8c9f..abe82ce7 100644
--- a/panpipes/panpipes/pipeline_refmap/pipeline.yml
+++ b/panpipes/panpipes/pipeline_refmap/pipeline.yml
@@ -94,26 +94,5 @@ neighbors:
     # scanpy | hnsw (from scvelo)
     method: scanpy
 
-#----------------------
-# wnn seurat params
-#----------------------
-
-wnn: path_to_wnn_h5
-# instead to a path to a model, in wnn you have to provide a path to a seurat or mudata/anndata object with wnn integration precomputed
-wnn_mapping_normalization: SCTransform, LogNormalize
-# specify normalization to use for query data, same used to build reference wnn.
-normalize: False
-# if RNA is pre-normalized specify method above and set normalize to False
-# provide here a dictionary of the columns or assays you want to predict as defined here
-# https://satijalab.org/seurat/reference/transferdata
-# for the param refdata ( https://satijalab.org/seurat/reference/transferdata#:~:text=(%0A%20%20anchorset%2C-,refdata,-%2C%0A%20%20reference%20%3D%20NULL)
-# if multiple lines are specified these will be supplied as list.
-# write as the example that follows [column_or_data_to_create_in_query]:[name_of_data_in_reference]
-refdata:
-  celltype.l1 : celltype.l1
-  celltype.l2 : celltype.l2
-  predicted_ADT : ADT
-
-
 
   
diff --git a/panpipes/python_scripts/assess_background.py b/panpipes/python_scripts/assess_background.py
index 634278f6..21fb3829 100644
--- a/panpipes/python_scripts/assess_background.py
+++ b/panpipes/python_scripts/assess_background.py
@@ -157,7 +157,7 @@
             fig, ax = plt.subplots(nrows=2,ncols=1,figsize=(24,10), facecolor="white")
             sns.heatmap(bg_df.iloc[:,1:split_int], ax=ax[0])
             sns.heatmap(bg_df.iloc[:,split_int:len(top_genes)], ax=ax[1])
-            fig.suptitle("mean exprs (raw counts) of ADT in empty drops")  
+            fig.suptitle("mean exprs (raw counts) of PROT in empty drops")  
             fig.tight_layout()
         else:
             fig, ax = plt.subplots(figsize=(12,10), facecolor="white")
@@ -173,7 +173,7 @@
             pnp.plotting.adjust_x_axis(ax[0])
             sns.heatmap(plt_df.iloc[split_int:len(top_genes),:], ax=ax[1])
             pnp.plotting.adjust_x_axis(ax[1])
-            fig.suptitle("mean exprs (raw counts) of ADT in empty drops")  
+            fig.suptitle("mean exprs (raw counts) of PROT in empty drops")  
             fig.tight_layout()
         else:
             fig, ax= plt.subplots(figsize=(12,8))
diff --git a/panpipes/python_scripts/run_scanpyQC_prot.py b/panpipes/python_scripts/run_scanpyQC_prot.py
index 86630a2c..64a61879 100644
--- a/panpipes/python_scripts/run_scanpyQC_prot.py
+++ b/panpipes/python_scripts/run_scanpyQC_prot.py
@@ -1,8 +1,8 @@
 '''
-scanpy QC script ADT
+scanpy QC script PROT
 order of QC:
-- GEX
-- ADT
+- RNA
+- PROT
 - Repertoire
 - ATAC
 '''
@@ -151,7 +151,7 @@
 # values on a per channel basis 
 # IS this cause you need the pandas per channel to plot? YES
 
-# the rest of the script is concerned with Per ADT metrics instead of per cell. 
+# the rest of the script is concerned with Per PROT metrics instead of per cell. 
 # per cell metrics are plotted in plotqc.R
 
 
diff --git a/panpipes/python_scripts/run_scanpyQC_rep.py b/panpipes/python_scripts/run_scanpyQC_rep.py
index 8cf53eb1..e2b12c0a 100644
--- a/panpipes/python_scripts/run_scanpyQC_rep.py
+++ b/panpipes/python_scripts/run_scanpyQC_rep.py
@@ -1,8 +1,8 @@
 '''
-scanpy QC script GEX
+scanpy QC script RNA
 order of QC:
-- GEX
-- ADT
+- RNA
+- PROT
 - Repertoire
 - ATAC
 '''
diff --git a/panpipes/python_scripts/run_scanpyQC_rna.py b/panpipes/python_scripts/run_scanpyQC_rna.py
index d47972ba..70f37ce8 100644
--- a/panpipes/python_scripts/run_scanpyQC_rna.py
+++ b/panpipes/python_scripts/run_scanpyQC_rna.py
@@ -2,7 +2,7 @@
 scanpy QC script RNA
 order of QC:
 - RNA
-- ADT
+- PROT
 - Repertoire
 - ATAC
 '''

From d35dadf317cf29cb661d9a9dfd1d305ea621b6ad Mon Sep 17 00:00:00 2001
From: bio-la <fabiola.curion@gmail.com>
Date: Fri, 29 Sep 2023 12:24:58 +0200
Subject: [PATCH 22/28] adt

---
 panpipes/python_scripts/concat_adata.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/panpipes/python_scripts/concat_adata.py b/panpipes/python_scripts/concat_adata.py
index 6344b8eb..8d4650ec 100644
--- a/panpipes/python_scripts/concat_adata.py
+++ b/panpipes/python_scripts/concat_adata.py
@@ -94,7 +94,7 @@
         mdata=mdatas[0]
         del temp
     elif 'prot' in temp.mod.keys() or 'rna' in temp.mod.keys():
-        ## IF GEX and ADT is ok to concatenate ----------
+        ## IF GEX and PROT is ok to concatenate ----------
         mdata = concat_mdatas(mdatas,
             batch_key="sample_id", 
             join_type=args.join_type)

From 583957a53ff0caf6dc5fdacf466825125aa0781f Mon Sep 17 00:00:00 2001
From: bio-la <fabiola.curion@gmail.com>
Date: Fri, 29 Sep 2023 12:27:59 +0200
Subject: [PATCH 23/28] final name

---
 docs/workflows/integration.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/workflows/integration.md b/docs/workflows/integration.md
index eee7396b..aaa206bc 100644
--- a/docs/workflows/integration.md
+++ b/docs/workflows/integration.md
@@ -5,7 +5,7 @@ The panpipes integration pipeline implements a variety of tools to batch correct
 
 ![integration_flowchart](../img/integration_coloured.drawio.png)
 
-The flowchart indicates which tools are available for each modality out of GEX (also referred to as RNA), ADT (also referred to as PROT) and ATAC. You can run as many of these tools as you choose and then you can run `panpipes integration make merge_batch_correction` to create a final object containing one reduced dimension representation and one nearest neighbor graph per modality. This can be used as input to the clustering pipeline.
+The flowchart indicates which tools are available for each modality out of GEX (also referred to as RNA), PROT (also referred to as ADT) and ATAC. You can run as many of these tools as you choose and then you can run `panpipes integration make merge_batch_correction` to create a final object containing one reduced dimension representation and one nearest neighbor graph per modality. This can be used as input to the clustering pipeline.
 
 
 ## Steps to run:

From 847559d07bc3142f1ae40a10e22d9cdbeac5d684 Mon Sep 17 00:00:00 2001
From: bio-la <fabiola.curion@gmail.com>
Date: Fri, 29 Sep 2023 12:30:18 +0200
Subject: [PATCH 24/28] rna to gex

---
 docs/workflows/integration.md                     | 2 +-
 panpipes/R_scripts/produce_barplot_10xmetric.v3.R | 2 +-
 panpipes/panpipes/pipeline_clustering.py          | 6 +++---
 panpipes/python_scripts/concat_adata.py           | 2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/docs/workflows/integration.md b/docs/workflows/integration.md
index aaa206bc..dc923aa1 100644
--- a/docs/workflows/integration.md
+++ b/docs/workflows/integration.md
@@ -5,7 +5,7 @@ The panpipes integration pipeline implements a variety of tools to batch correct
 
 ![integration_flowchart](../img/integration_coloured.drawio.png)
 
-The flowchart indicates which tools are available for each modality out of GEX (also referred to as RNA), PROT (also referred to as ADT) and ATAC. You can run as many of these tools as you choose and then you can run `panpipes integration make merge_batch_correction` to create a final object containing one reduced dimension representation and one nearest neighbor graph per modality. This can be used as input to the clustering pipeline.
+The flowchart indicates which tools are available for each modality out of RNA (also referred to as GEX), PROT (also referred to as ADT) and ATAC. You can run as many of these tools as you choose and then you can run `panpipes integration make merge_batch_correction` to create a final object containing one reduced dimension representation and one nearest neighbor graph per modality. This can be used as input to the clustering pipeline.
 
 
 ## Steps to run:
diff --git a/panpipes/R_scripts/produce_barplot_10xmetric.v3.R b/panpipes/R_scripts/produce_barplot_10xmetric.v3.R
index 0a86fc3c..e7ccf528 100644
--- a/panpipes/R_scripts/produce_barplot_10xmetric.v3.R
+++ b/panpipes/R_scripts/produce_barplot_10xmetric.v3.R
@@ -63,7 +63,7 @@ if(!is.null(opt$csvpaths)) {
     }
 
   }
-  # paths <- gsub("filtered_feature_bc_matrix","metrics_summary.csv",caf$gex_path)
+  # paths <- gsub("filtered_feature_bc_matrix","metrics_summary.csv",caf$rna_path)
   paths <- file.path(caf$rna_path, "metrics_summary.csv")
   
   # paths <- gsub("\\/$","", paths)
diff --git a/panpipes/panpipes/pipeline_clustering.py b/panpipes/panpipes/pipeline_clustering.py
index 886b597f..99837875 100644
--- a/panpipes/panpipes/pipeline_clustering.py
+++ b/panpipes/panpipes/pipeline_clustering.py
@@ -103,7 +103,7 @@ def calc_sm_umaps(infile, outfile, mod, mindist, log_file):
             --min_dist %(mindist)s 
             """
     if mod is not None and mod != "multimodal":
-        # if this is not specified it will use gex default
+        # if this is not specified it will use rna default
         cmd += " --modality %(mod)s"
     elif mod=="multimodal":
         if PARAMS['multimodal_integration_method'].lower() == "wnn":
@@ -146,7 +146,7 @@ def calc_cluster(infile, outfile,  mod, res, alg, log_file):
             --algorithm %(alg)s
     """ 
     if mod is not None and mod != "multimodal":
-        # if this is not specified it will use gex default
+        # if this is not specified it will use rna default
         cmd += " --modality %(mod)s"
     elif mod=="multimodal":
         if PARAMS['multimodal_integration_method'].lower() == "wnn":
@@ -412,7 +412,7 @@ def marker_analysis(fname):
 #     if PARAMS['use_muon']:
 #         cmd += " --use_muon True"
 #     if PARAMS['modality']:
-#         # if this is not specified it will use gex default
+#         # if this is not specified it will use rna default
 #         cmd += " --modality %(modality)s"
 #     print(cmd)
 #     P.run(cmd, job_threads=PARAMS['resources_threads_medium'])
diff --git a/panpipes/python_scripts/concat_adata.py b/panpipes/python_scripts/concat_adata.py
index 8d4650ec..9382541c 100644
--- a/panpipes/python_scripts/concat_adata.py
+++ b/panpipes/python_scripts/concat_adata.py
@@ -94,7 +94,7 @@
         mdata=mdatas[0]
         del temp
     elif 'prot' in temp.mod.keys() or 'rna' in temp.mod.keys():
-        ## IF GEX and PROT is ok to concatenate ----------
+        ## IF RNA and PROT is ok to concatenate ----------
         mdata = concat_mdatas(mdatas,
             batch_key="sample_id", 
             join_type=args.join_type)

From 483363d9523561e38b294db0dee759ec6a7624ac Mon Sep 17 00:00:00 2001
From: bio-la <fabiola.curion@gmail.com>
Date: Wed, 18 Oct 2023 16:07:02 +0200
Subject: [PATCH 25/28] changes added to changelog.md file

---
 CHANGELOG.md       |   4 +++-
 panpipes/.DS_Store | Bin 8196 -> 6148 bytes
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 0c2dd957..ce52af05 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,6 +4,8 @@
 ### added
 
 ### fixed
+- changed all instances of ADT into PROT
+- changed all instances of GEX to RNA
 
 ### dependencies
 
@@ -15,7 +17,7 @@
 - Spatial data analysis is now included in panpipes
     - panpipes qc_spatial
     - panpipes preprocess_spatial
-    - panpipes_deconvolution_spatials
+    - panpipes_deconvolution_spatial
 
 ### fixed
 - make sure columns from individual modalities that are not in the multimodal outer obs can be used to
diff --git a/panpipes/.DS_Store b/panpipes/.DS_Store
index bf2bd9b83852df2f18b5f9e3e8cbe08f50eab467..1d92038220f8b8453a857c10e6663b21a0d435db 100644
GIT binary patch
delta 145
zcmZp1XfcprU|?W$DortDU=RQ@Ie-{MGjdEU6q~50$jCY|U^gS{WFCQLDGr7LhD3%u
zAejlosSL$wDaFZ2`T04F8w=yu7PE732r>h;0)YTGkZ=W=va#?x^JIP*O9hZHBLf2y
Vlx6|ZKn}=0kP{d-$Mei#1^_qT7KZ=;

literal 8196
zcmeHMzfT)682wCBDN?J_7KAc@bZf;@`eW*HMWIqQq|8+aAq1pc?n(;tO`WR#0qm?S
zNVFRZe**smJ9O&M_t{QjCl_+5kc!~5WWU4y-r4W@lFMg^NUiQ~tq_%nsEEq={0W+j
z!sA>=%8cz<1S;@}y6x6VwGl_jNUU^-1LA-<AP$HF;=tW-0Pk!r#WC-FGb*h(APzi8
z2l)FTKxGUqrUvy_2O6CM01N2W1=q*}G$*haT1*XUD(-2r2W6?srWnf5F&=O@FtnH&
zG;~siPRh=#Y=)xr?BE4DoK$E~T5&)e7<GVi_s6tN`_!SvQ2wq?HvQ2kE|)jkG4~dK
zoLw#K9G-t1j@X(Ik?c4p1T*^6I!6iZqBf}w>;P+6OB-wI(pUVor+Zue*}~j1X3oiU
z*S+-p-Jj29-MQHVP1CbIz4RK3ruhuLtedsAptg>yU5csWD}M7~8m_DP=%TMV_uNrD
zQ2j4_Uw6p5v;0n=w+TP?ptpgm?$9$xua5IGUCoyV*S+U6=e(}s>Dj(sz0c1w_W27`
z)+nOL-`nslRNffxYWr1r-+P89a0Yvrm*eYL=Y5;a(80U54IOn#d?hb_PiDAUD&P8A
zCd}jRPkxrKCty8KGxQwb6E}2v`SSFpSXaZRpM4Eu@9nNWPkhF@dF@iff!lK6k-1Tc
z^Z&K+=l{1go!mzp5C`s{1FBH2mRGSbn_G`8k#p?;bpw?P$E5}}1r3LAzfQ-2{eKwZ
gI<RA*#nd21Q2xU~z}R=R|4T;R|9XR@xt9+71#=gEZvX%Q


From 72991e0c01ff50c258baba2b08d3b6ca47ac02d3 Mon Sep 17 00:00:00 2001
From: bio-la <fabiola.curion@gmail.com>
Date: Fri, 20 Oct 2023 16:53:08 +0200
Subject: [PATCH 26/28] typo

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index b30eb0bc..fb4ecc68 100644
--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@
 Created and Maintained by Charlotte Rich-Griffin and Fabiola Curion  
 Additional contributors: Sarah Ouologuem, Devika Agarwal and Tom Thomas 
 
-See out [documentation](https://panpipes-pipelines.readthedocs.io/en/latest/) and our [preprint](https://www.biorxiv.org/content/10.1101/2023.03.11.532085v1):  
+See our [documentation](https://panpipes-pipelines.readthedocs.io/en/latest/) and our [preprint](https://www.biorxiv.org/content/10.1101/2023.03.11.532085v1):  
 Panpipes: a pipeline for multiomic single-cell data analysis  
 Charlotte Rich-Griffin*, Fabiola Curion*, Tom Thomas, Devika Agarwal, Fabian J. Theis, Calliope A. Dendrou.  
 bioRxiv 2023.03.11.532085;  

From 368b34f4e37eaf8409d1de7a68274b8a533a21f6 Mon Sep 17 00:00:00 2001
From: bio-la <fabiola.curion@gmail.com>
Date: Fri, 20 Oct 2023 16:54:08 +0200
Subject: [PATCH 27/28] typos

---
 README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index fb4ecc68..d15eeb54 100644
--- a/README.md
+++ b/README.md
@@ -12,10 +12,10 @@ doi: https://doi.org/10.1101/2023.03.11.532085
 
 
 # Introduction
-These pipelines use cgat-core pipeline software
+These workflows use cgat-core pipeline software
 
-Available pipelines:
-1. "ingest" : for the ingestion of data and computation of QC metrics' 
+Available workflows:
+1. "ingest" : for the ingestion of data and computation of QC metrics 
 2. "preprocess" : for filtering and normalising of each modality
 3. "integration" : integrate and batch correction using  single and multimodal methods
 4. "clustering" : cell clustering on single modalities

From ed332efbf8d34d6161b4a104a187ff126358c072 Mon Sep 17 00:00:00 2001
From: bio-la <fabiola.curion@gmail.com>
Date: Fri, 20 Oct 2023 16:54:28 +0200
Subject: [PATCH 28/28] added fixes to changelog

---
 CHANGELOG.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index ce52af05..bbf5f582 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -6,6 +6,9 @@
 ### fixed
 - changed all instances of ADT into PROT
 - changed all instances of GEX to RNA
+- changed the params to fix plotting as mentioned in issue #41
+- typo in readme
+
 
 ### dependencies