Fc sparsedocsfix #146

Merged · 13 commits · Dec 1, 2023
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -33,6 +33,7 @@
- fixed lsi requirement for atac
- fixed top features for atac
- fixed filtering HVG for rna
- moved pynndescent to PyPI dependencies


### dependencies
86 changes: 40 additions & 46 deletions docs/install.md
@@ -1,17 +1,18 @@

# Installation of panpipes

### Create virtual environment

We recommend running panpipes within a virtual environment to maintain reproducibility.


### Option 1: create conda environment (Recommended)

To run panpipes, we install it in a conda environment with R and python.
Panpipes has a lot of dependencies, so you may want to consider the faster [`mamba`](https://mamba.readthedocs.io/en/latest/index.html) instead of `conda` for installation.

```
conda config --add channels conda-forge
conda config --set channel_priority strict
# you should remove the strict priority afterwards!
@@ -24,52 +25,29 @@ now we activate the environment
conda activate pipeline_env
```

This follows the suggestions made here: [https://www.biostars.org/p/498049/](https://www.biostars.org/p/498049/)
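
For reference, the environment-creation step that precedes `conda activate pipeline_env` might look like this (a minimal sketch — the environment name comes from the activate command above, but the pinned python version is an assumption; see the full docs for the recommended versions):

```
conda create --name pipeline_env python=3.10
```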

Let's first install the R packages:
```
conda install -c conda-forge r-tidyverse r-optparse r-ggforce r-ggraph r-xtable r-hdf5r r-clustree
```

Then we can install panpipes:

#### 1. Installing panpipes from PyPI

You can install `panpipes` directly from `PyPI` with:

```
pip install panpipes
```

If you intend to use panpipes for spatial analysis, instead install:
```
pip install 'panpipes[spatial]'
```
The `[spatial]` extra includes the squidpy and cell2location packages.



#### 2. Nightly versions of panpipes

If you would prefer to use the most recent dev version, install from GitHub:

@@ -79,9 +57,25 @@ cd panpipes
pip install -e .
```

------------

Panpipes requires the unix package `time`.
You can check whether it is installed with `dpkg-query -W time`. If `time` is not already installed, you can install it with

```
conda install time
```
or

```
apt-get install time
```



### Option 2: python venv environment

Navigate to where you want to create your virtual environment and follow the steps below to create a pip virtual environment

```
python3 -m venv --prompt=panpipes python3-venv-panpipes/
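# then activate it (a sketch; adjust the path if you created the venv elsewhere)
source python3-venv-panpipes/bin/activate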
@@ -98,19 +92,21 @@ As explained in the conda installation, you can install `panpipes` with:
```
pip install panpipes
```
or install a nightly version of panpipes by cloning the GitHub repo.

#### R packages installation in python venv

If you are using a venv virtual environment, the pipeline will call a local R installation, so make sure R is installed and install the required packages with the command we provide below.
(The `panpipes install_r_dependencies` command requires that you specify a CRAN mirror in your `.Rprofile`.)
For example, add this line to your `.Rprofile` to automatically fetch the preferred mirror:

*Remember to customise with your preferred [R mirror](https://cran.r-project.org/mirrors.html).*

```
options(repos = c(CRAN="https://cran.uni-muenster.de/"))
```

Now, to automatically install the R dependencies, run:

```
panpipes install_r_dependencies
@@ -131,13 +127,11 @@ A list of available pipelines should appear!


You're all set to run `panpipes` on your local machine.
If you want to configure it on an HPC server, follow the next instructions.

## Pipeline configuration for HPC clusters
(For SGE or SLURM clusters)
*Note: You only need this configuration step if you want to use an HPC to dispatch individual tasks as separate parallel jobs. You won't need this for a local installation of panpipes.*

Create a yml file for the cgat-core pipeline software to read:
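
As an illustration, a minimal `.cgat.yml` for a SLURM cluster might look like this (a sketch based on cgat-core's cluster configuration; the queue name is a placeholder for one on your cluster):

```
cluster:
    queue_manager: slurm
    queue: main
```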

@@ -189,7 +183,7 @@ echo "export DRMAA_LIBRARY_PATH=$PATH_TO/libdrmaa.so.1.0" >> ~/.bashrc
```

### Specifying Conda environments to run panpipes
If using conda environments, you can use one single big environment (the instructions provided do just that) or create one for each of the workflows in panpipes (i.e. one workflow = one environment).
The environment(s) should be specified in the `.cgat.yml` global configuration file or in each workflow's `pipeline.yml` configuration file, and will be picked up by the pipeline as the default environment.
Please note that a conda environment specified in a workflow's configuration file takes precedence when running that pipeline.
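
For example, a sketch of how the environment could be referenced (the `condaenv` key and path here are assumptions — check your pipeline.yml template for the exact field):

```
condaenv: /path/to/miniconda3/envs/pipeline_env
```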

1 change: 1 addition & 0 deletions docs/release_notes.md
@@ -1,2 +1,3 @@
Release Notes
==============

3 changes: 2 additions & 1 deletion docs/tutorials/index.md
@@ -1,7 +1,7 @@
Tutorials
==========

Check out the following tutorials, which take you through common single-cell multimodal analysis steps with Panpipes:


- [Ingest workflow](https://panpipes-tutorials.readthedocs.io/en/latest/ingesting_data/Ingesting_data_with_panpipes.html)
@@ -21,4 +21,5 @@ Spatial analysis:
Additional tutorials:

- [Ingesting multiome from cellranger outputs](https://panpipes-tutorials.readthedocs.io/en/latest/ingesting_multiome/ingesting_mome.html)
- [Ingesting mouse data](https://panpipes-tutorials.readthedocs.io/en/latest/ingesting_mouse/Ingesting_mouse_data_with_panpipes.html)

2 changes: 1 addition & 1 deletion docs/usage/general_principles.md
@@ -92,4 +92,4 @@ When it's completed, you will find a message informing you it's done, like this

## Final notes

All panpipes workflows follow these general principles, with specific custom parameters and input files for each workflow. See the [Workflows](https://panpipes-pipelines.readthedocs.io/en/latest/workflows/index.html) section for detailed info on each workflow and check out our [Tutorials](https://panpipes-pipelines.readthedocs.io/en/latest/tutorials/index.html) for more examples.
17 changes: 10 additions & 7 deletions docs/workflows/preprocess.md
@@ -4,18 +4,20 @@ Preprocessing

## Pipeline steps

The preprocess pipeline filters the data as defined in the [filtering dictionary](../usage/filter_dict_instructions.md) section of the `pipeline.yml`. The data can also be downsampled to a defined number of cells.
Then each modality is normalised and scaled. For RNA, this means normalising counts per cell with [scanpy.pp.normalize_total](https://scanpy.readthedocs.io/en/stable/generated/scanpy.pp.normalize_total.html) and, optionally, regressing out covariates and scaling the data using scanpy functions. Highly variable genes (HVGs) are also calculated, and a PCA is performed on those highly variable genes. There is an option to exclude specific genes from the HVGs, e.g. HLA genes or BCR/TCR genes. These are specified in the same way as all [gene lists](../usage/gene_list_format). In the example below, the "group" in the gene list file is "exclude".
```
hvg:
  exclude_file: resources/qc_genelist_1.0.csv
  exclude: "exclude"
```
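
For instance, the exclude file referenced above could contain rows like these (an illustrative sketch assuming the mod/feature/group column layout described in the gene list format docs):

```
mod,feature,group
rna,HLA-A,exclude
rna,HLA-B,exclude
rna,HLA-C,exclude
```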

For the protein assay, the data are normalised either by centred log-ratio (CLR) or by dsb, as described in the muon documentation [here](https://muon.readthedocs.io/en/latest/omics/citeseq.html). There is additional panpipes functionality to trim dsb outliers, as discussed on the dsb [github page](https://github.com/niaid/dsb/issues/9). Note that dsb can only be run if the input data contains raw counts (the cellranger outs folder).
PCA is then performed on the protein data; the number of components can be specified, and it is automatically adjusted to `n_vars - 1` when `n_pcs > n_vars`.
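
For orientation, the underlying muon calls look roughly like this (a sketch of the library API, not of how panpipes invokes it internally; `mdata_raw` stands for the unfiltered cellranger counts):

```
import muon as mu

# CLR normalisation acts on the protein modality alone
mu.prot.pp.clr(mdata["prot"])

# dsb needs the raw (unfiltered) counts alongside the filtered object
mu.prot.pp.dsb(mdata, mdata_raw)
```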


For the ATAC assay, the data are normalised either by standard normalisation or with one of the included TFIDF flavours (see [normalization](https://panpipes-pipelines.readthedocs.io/en/latest/usage/normalization_methods.html)).
Then, dimensionality reduction is computed, either LSI or PCA, with a custom-defined number of components.
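
Roughly, the corresponding muon operations are the following (a sketch; panpipes exposes these choices through the pipeline.yml rather than through direct calls):

```
from muon import atac as ac

ac.pp.tfidf(mdata["atac"])  # TFIDF normalisation
ac.tl.lsi(mdata["atac"])    # LSI dimensionality reduction
```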


## Steps to run
@@ -25,18 +27,19 @@
``panpipes preprocess config``
2. edit the pipeline.yml file

- The filtering options are dynamic depending on your `ingest` inputs. This is described [here](../usage/filter_dict_instructions.md)
- There are lots of options for normalisation explained in the
  pipeline.yml and in [normalization](https://panpipes-pipelines.readthedocs.io/en/latest/usage/normalization_methods.html);
  check the one that works for your data

3. Run the complete preprocess pipeline with
``panpipes preprocess make full``

The h5mu file output by ``preprocess`` is filtered and normalised, and
for rna and atac highly variable genes are computed.


## Expected structure of MuData object
The ideal way to run `panpipes preprocess` is to use the output mudata file from `panpipes ingest`, as this will make sure the MuData object has correctly named layers and slots.

The bare minimum MuData object requires raw data in the X slot of each modality, a sample_id column in the .obs slot of each modality, and in the common (outer) obs.
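
A minimal object satisfying this could be built as follows (an illustrative sketch with toy dimensions):

```
import numpy as np
import anndata as ad
import mudata as md

# raw counts in X and a sample_id column in .obs for each modality
rna = ad.AnnData(X=np.random.poisson(1.0, (100, 2000)).astype(np.float32))
prot = ad.AnnData(X=np.random.poisson(1.0, (100, 30)).astype(np.float32))
rna.obs["sample_id"] = "sample1"
prot.obs["sample_id"] = "sample1"

mdata = md.MuData({"rna": rna, "prot": prot})
mdata.obs["sample_id"] = "sample1"  # the common (outer) obs

mdata.write("unfiltered.h5mu")
```
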
Binary file modified panpipes/.DS_Store
Binary file not shown.
5 changes: 4 additions & 1 deletion panpipes/entry.py
@@ -36,7 +36,10 @@ def main(argv=None):
'3. "integration" : integrate and batch correction using single and multimodal methods',
'4. "clustering" : cell clustering on single modalities',
'5. "refmap" : transfer scvi-tools models from published data to your data',
'6. "vis" : visualise metrics from other pipelines in context of experiment metadata']
'6. "vis" : visualise metrics from other pipelines in context of experiment metadata',
'7. "qc_spatial" : for the ingestion of spatial transcriptomics (ST) data',
'8. "preprocess_spatial" : for filtering and normalizing ST data',
'9. "deconvolution_spatial" : for the cell type deconvolution of ST slides']
print(*pipelines_list, sep="\n")
return
command = argv[1]
2 changes: 1 addition & 1 deletion panpipes/panpipes/pipeline_preprocess/pipeline.yml
@@ -264,7 +264,7 @@ prot:
# note that this feature is in the default muon mu.pp.dsb code, but manually implemented in this code.
quantile_clipping: True

# which normalisation method to store in the X slot. If you run more than one normalisation method,
# specify which one to store in the X slot; if not specified, 'dsb' is the default when run.
store_as_X:

6 changes: 3 additions & 3 deletions panpipes/python_scripts/run_scanpyQC_prot.py
@@ -95,10 +95,10 @@
per_cell_metrics = args.per_cell_metrics.split(",")
per_cell_metrics = [a.strip() for a in per_cell_metrics]

# TODO: What happens if it is None?


# work out if we already have isotype column, if not try to infer from index.
if 'isotype' not in prot.var.columns:
# this means that isotype column was not included in the protein conversion table
# so we are going to have a whack at identifying them
@@ -123,7 +123,7 @@
percent_top=None, log1p=True, inplace=True)

## let's assess the isotype outlier cells.
#(Cells with an excessive amount of isotype indicating stickiness)
if (len(isotypes) > 0) & check_for_bool(args.identify_isotype_outliers):
L.info("identifying isotype outliers")
# this means we found some isotypes earlier
1 change: 1 addition & 0 deletions pyproject.toml
@@ -49,6 +49,7 @@ dependencies = [
"paramiko",
"pep8",
"pysam",
"pynndescent",
"pytest",
"pyyaml",
"ruffus",