The Jupyter notebooks below contain all the code required to reproduce the figures and results of the paper "A novel independence test for somatic alterations in cancer shows that biology drives mutual exclusivity but chance explains most co-occurrence".
To work with these files, Jupyter, IPython, and several Python packages must be installed. The easiest way to install these dependencies is with Miniconda or Anaconda. The following command creates a conda environment containing all the packages required to run the notebooks:
conda create -n discover-notebooks -c http://ccb.nki.nl/software/discover/repos/conda \
corclust==0.1 \
discover==0.9 \
matplotlib==1.5.1 \
networkx==1.11 \
numpy==1.10.4 \
pandas==0.17.1 \
pytables==3.2.2 \
scipy==0.17.0 \
statsmodels==0.6.1 \
notebook \
ipykernel
Only the Group test notebook needs a few additional packages, which can be installed with the following command:
conda install -n discover-notebooks -c http://ccb.nki.nl/software/discover/repos/conda -c r -c msys2 \
switching==0.1 \
ccomet-with-timeout==1.0.2 \
rpy2 \
ipyparallel
Next, activate the environment and start the Jupyter Notebook server with the following two commands, replacing <notebook-dir> with the directory that contains the .ipynb files after unzipping the downloaded archive.
source activate discover-notebooks
jupyter notebook --notebook-dir=<notebook-dir>
On Windows, the first command should be replaced by:
activate discover-notebooks
-
Pairwise analyses of simulated data
Compares the binomial test, Fisher's exact test, and the DISCOVER test on simulated data.
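As a rough illustration of the two baseline tests, the sketch below applies a one-sided Fisher's exact test and a binomial test for mutual exclusivity to a single gene pair. It assumes alterations are given as boolean arrays with one entry per tumour, and that the binomial test uses p_a * p_b as the per-tumour probability of a co-alteration under independence. The function name is hypothetical, the notebook's own implementation may differ, and the DISCOVER test itself is not reimplemented here.

```python
import numpy as np
from scipy.stats import binom, fisher_exact


def pairwise_exclusivity_tests(a, b):
    """Hypothetical helper: one-sided mutual exclusivity tests for a gene pair.

    a, b: 1-D boolean arrays, one entry per tumour (True = altered).
    Returns (p_fisher, p_binom) for the alternative "fewer co-alterations
    than expected under independence".
    """
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    n = a.size

    # 2x2 contingency table: rows = gene a altered / not, cols = gene b altered / not.
    both = int(np.sum(a & b))
    only_a = int(np.sum(a & ~b))
    only_b = int(np.sum(~a & b))
    neither = n - both - only_a - only_b

    # Fisher's exact test conditions on both margins; 'less' tests for an
    # odds ratio below 1, i.e. fewer co-alterations than expected.
    _, p_fisher = fisher_exact([[both, only_a], [only_b, neither]], alternative="less")

    # Binomial test (assumed form): under independence each tumour carries both
    # alterations with probability p_a * p_b, so the overlap is Binomial(n, p_a * p_b).
    p_both = a.mean() * b.mean()
    p_binom = binom.cdf(both, n, p_both)  # P(overlap <= observed)
    return p_fisher, p_binom
```

For two genes that are never altered in the same tumour, both tests return small p-values; for two genes with identical alteration patterns, both return p-values close to 1.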
-
Compares the DISCOVER group test to six alternative methods (CoMEt, MEGSA, MEMo, muex, mutex, and TiMEx) on simulated data.
-
Downloads the mutation and copy number data for the TCGA PANCAN12 studies.
-
Selects the genes for use in the pairwise analyses.
-
Performs pairwise co-occurrence and mutual exclusivity analyses.
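Because this analysis tests many gene pairs, the resulting p-values are typically corrected for multiple testing. The sketch below uses the plain Benjamini-Hochberg procedure as a stand-in; it is an assumption, not necessarily the false discovery rate procedure used in the notebook, and the function name is hypothetical.

```python
import numpy as np


def benjamini_hochberg(pvalues):
    """Hypothetical helper: Benjamini-Hochberg adjusted p-values (q-values)."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by m / i.
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity: each q-value is the minimum over all larger ranks.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(ranked, 1.0)  # restore the input order, cap at 1
    return q
```

Gene pairs whose adjusted value falls below the chosen false discovery rate (for example 0.05) would then be reported as significant.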
-
Within-chromosome co-occurrence analysis
Tests for co-occurrence between genes located on the same chromosome, which are often altered together by the same copy number event, to assess whether the DISCOVER test detects these 'positive controls'.
-
Determines the overlap of mutually exclusive gene pairs with the STRING functional interaction network.
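One simple way to quantify such an overlap is to count how many of the mutually exclusive pairs are directly connected in the network. The sketch below does this with networkx, assuming the STRING network has already been loaded into an undirected networkx Graph whose nodes are gene symbols. The function name and the toy gene names in any example are hypothetical, and the notebook may use a different statistic.

```python
import networkx as nx


def fraction_in_network(pairs, network):
    """Hypothetical helper: fraction of gene pairs joined by an edge in `network`.

    pairs:   list of (gene_u, gene_v) tuples.
    network: undirected nx.Graph; genes absent from it count as non-interacting.
    """
    if not pairs:
        return 0.0
    # Graph.has_edge returns False when either node is missing, so no extra check is needed.
    hits = sum(1 for u, v in pairs if network.has_edge(u, v))
    return hits / len(pairs)
```

Comparing this fraction for the significant pairs with the same fraction for all tested pairs gives a crude measure of enrichment.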
-
Identifies significantly mutually exclusive gene sets based on predefined gene sets extracted from MSigDB.
-
De novo gene set identification
Detects de novo mutually exclusive gene sets based on correlation clustering of pairwise mutual exclusivities.
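Correlation clustering partitions items so that positively related items tend to share a cluster while unrelated ones are separated. The sketch below uses the simple randomized pivot algorithm (KwikCluster) on an unweighted graph of significant mutually exclusive pairs. This is a swapped-in illustration of the general technique, not necessarily what the corclust package or the notebook does, and the function name is hypothetical.

```python
import random


def pivot_correlation_clustering(nodes, positive, seed=0):
    """Hypothetical helper: greedy pivot (KwikCluster) correlation clustering.

    nodes:    iterable of gene names.
    positive: set of frozensets {u, v} for pairs that should share a cluster
              (here assumed to be the significantly mutually exclusive pairs).
    Returns a list of sets, one per cluster.
    """
    rng = random.Random(seed)
    remaining = list(nodes)
    rng.shuffle(remaining)  # a random order makes each pivot a uniform random choice
    clusters = []
    while remaining:
        pivot = remaining[0]
        # The pivot's cluster is the pivot plus every remaining positive neighbour.
        cluster = {pivot} | {v for v in remaining[1:] if frozenset((pivot, v)) in positive}
        clusters.append(cluster)
        remaining = [v for v in remaining if v not in cluster]
    return clusters
```

Because pivots are chosen at random, different seeds can yield different partitions when the positive pairs are not cleanly separated; running several seeds and keeping the partition with the fewest disagreements is a common refinement.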