Downstreamer
🚧 Under construction 🚧
Downstreamer can be used to perform pathway enrichments and core-gene prioritizations using GWAS summary statistics. Downstreamer does not rely on known pathway assignments of genes but instead uses predicted gene-pathway assignments [1]. The core-gene prioritizations are based on co-regulation, a measure related to co-expression but less driven by tissue differences [2]. Both the gene-pathway assignments and co-expression scores are calculated using 31,499 public RNA-seq samples from many different tissues [1].
A manuscript is in preparation.
1️⃣ Getting started
2️⃣ Running Downstreamer
5️⃣ Most important output files
Download tool here: TODO
Download reference data here: TODO
The main function of Downstreamer is to use GWAS summary statistics to predict relevant pathways and genes. The commands needed to do this with the pathway databases we provide are listed below. Other modes of Downstreamer are listed here: DS additional modes
All Downstreamer commands share the following base form:
java -Xmx10g -jar Downstreamer.jar --mode [MODE] --output
You might also need to allocate more memory when running Downstreamer. If you get a heap space error, increase the value of -Xmx10g to raise the amount of memory that Downstreamer can use.
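When scripting several runs, it can help to assemble the base command in one place so that the heap size is easy to raise. The following is a minimal sketch that uses only the flags shown above. The helper name `build_command`, the default jar path, and the output name `my_gwas` are hypothetical and not part of Downstreamer.

```python
def build_command(mode, output, heap_gb=10, jar="Downstreamer.jar"):
    """Return the base Downstreamer command as an argument list.

    heap_gb sets the Java -Xmx value in gigabytes; raise it after a
    heap space error. The helper itself is illustrative only.
    """
    if heap_gb <= 0:
        raise ValueError("heap_gb must be positive")
    return [
        "java", f"-Xmx{heap_gb}g",
        "-jar", jar,
        "--mode", mode,
        "--output", output,
    ]


# Example: the same base command with 20 GB of heap instead of 10 GB.
command = build_command("CONVERT_TXT", "my_gwas", heap_gb=20)
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run` once any mode-specific arguments have been appended.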
First, the GWAS summary statistics should be saved in the following manner:
- TAB-separated text file
- The first column must contain the variant identifiers as used in the LD reference panel. If the reference panel uses RS identifiers, use those; if it uses chr:pos identifiers, use those instead.
- The following columns should contain the GWAS variant p-values. The header of each of these columns should be the name of the GWAS.
- There is no need to filter on variants present in the reference data; variants that are absent will automatically be excluded.
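The layout described above can be sketched in a few lines of Python. This is only an illustration of the file format: the function name `write_gwas_table`, the first-column header `variant`, the trait name `height`, and the variant identifiers are all made up for the example, and the header required for the first column is not specified here.

```python
import csv
import io


def write_gwas_table(handle, trait_names, pvalues_by_variant):
    """Write variant p-values as a TAB-separated table.

    The first column holds the variant identifier, followed by one
    p-value column per GWAS. The header text of the first column
    ("variant") is an assumption of this sketch.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["variant"] + list(trait_names))
    for variant_id, pvalues in pvalues_by_variant.items():
        if len(pvalues) != len(trait_names):
            raise ValueError(f"{variant_id}: expected {len(trait_names)} p-values")
        writer.writerow([variant_id] + [repr(p) for p in pvalues])


# Toy data with hypothetical identifiers; a real run would write to a file.
buffer = io.StringIO()
write_gwas_table(buffer, ["height"], {"rs123": [0.5], "rs456": [1e-8]})
gwas_text = buffer.getvalue()
print(gwas_text)
```

To write to disk, pass a handle opened with `open(path, "w", newline="")` instead of the in-memory buffer.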
Secondly, this text file needs to be converted to a binary file for quick access by Downstreamer. That can be done using the CONVERT_TXT mode with the following command:
java -Xmx10g -jar Downstreamer.jar --mode CONVERT_TXT --output [PATH_TO_OUTPUT_FILE] --pvalueToZscore --gwas [PATH_TO_GWAS_TEXT_FILE]
❗ While it is possible to combine multiple GWAS summary statistics in a single file, this is only recommended if they were performed on the same cohort and the same genotyping data.
You now get 3 files: 1 binary file with the summary statistics matrix and 2 text files describing the rows and columns. If all went well, the columns file will contain the names of the studies in the original text file.
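For intuition about what a p-value to z-score conversion involves, here is a minimal sketch using the standard normal quantile function. It assumes two-sided p-values and returns an unsigned z-score; whether Downstreamer's `--pvalueToZscore` option uses exactly this convention is not stated here, so treat it as an assumption. The function name `pvalue_to_zscore` is hypothetical.

```python
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def pvalue_to_zscore(pvalue):
    """Convert a two-sided p-value into an unsigned z-score.

    Uses -quantile(p / 2) rather than quantile(1 - p / 2) so that very
    small p-values do not round to 1.0 before the conversion.
    """
    if not 0.0 < pvalue <= 1.0:
        raise ValueError("p-value must be in (0, 1]")
    return abs(_STANDARD_NORMAL.inv_cdf(pvalue / 2.0))


for p in (1.0, 0.05, 1e-8):
    print(p, round(pvalue_to_zscore(p), 4))
```

Smaller p-values map to larger z-scores, and a p-value of 1 maps to a z-score of 0.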
The first part of Downstreamer is the most computationally and memory intensive. Here the GWAS SNP p-values are converted to GWAS gene p-values. Additionally, this step does a very basic pruning to identify independent top GWAS hits, although it is possible to provide your own list of top hits in STEP2. The gene p-values are the basis of the pathway enrichment analysis and gene prioritizations in STEP2. Because the gene p-values do not need to be recalculated when performing multiple enrichment analyses, this step only needs to be run once. It is possible to provide the STEP2 arguments when running STEP1; in this case Downstreamer will automatically continue with STEP2.
Arguments used by STEP1
Option | Value | Recommend | Description |
---|---|---|---|
String | |||
double | |||
Arguments used by STEP2
Option | Value | Recommend | Description |
---|---|---|---|
String | |||
double | |||
Among the different intermediate files, two files contain the primary output.
This file contains the results of the different enrichment analyses. Each sheet contains the results of a single enrichment source. By default we run enrichment on the following types of datasets, but in principle it can be done on any genes-by-X matrix.
The pathway enrichments are performed using the predicted pathway assignments from [1]. Here we test whether the predicted gene assignments to a pathway are correlated with the GWAS gene signal.
The sample enrichments (in the expression tab) indicate the correlation between expression levels of each of the 31,499 samples in our gene network and the GWAS gene p-values. We found that the top enriched samples are often very relevant for the studied trait.
The GTEx sheet is used to determine the enrichment of primary tissues by correlating the GWAS to the average expression profile of each tissue in GTEx.
As input we use a gene-by-gene matrix to express the relation between genes. We typically use co-regulation to do this (see introduction above). This means that for each gene we have a metric of how it relates to all other genes; these scores are then correlated to the GWAS gene scores. The idea is that genes that are co-regulated with many genes picked up by the GWAS are more important to the studied trait. Since this is done in a genome-wide manner, we can prioritize trans-acting genes outside of the GWAS loci.
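The idea of correlating a gene's co-regulation profile with the GWAS gene scores can be illustrated with a toy example. This is a sketch of the intuition only: it uses a plain Pearson correlation and excludes each gene's relation to itself, both of which are assumptions of this example rather than a description of Downstreamer's actual statistical model. The gene names and all numbers are invented.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def prioritize(coregulation, gwas_scores):
    """Rank genes by how well their co-regulation with all other genes
    tracks the GWAS gene scores of those other genes."""
    result = {}
    for gene, row in coregulation.items():
        others = [g for g in gwas_scores if g != gene and g in row]
        result[gene] = pearson([row[g] for g in others],
                               [gwas_scores[g] for g in others])
    return sorted(result.items(), key=lambda item: item[1], reverse=True)


# Hypothetical gene-by-gene co-regulation scores and GWAS gene z-scores.
coregulation = {
    "GENE_A": {"GENE_B": 0.9, "GENE_C": 0.1, "GENE_D": 0.0},
    "GENE_B": {"GENE_A": 0.9, "GENE_C": 0.2, "GENE_D": 0.1},
    "GENE_C": {"GENE_A": 0.1, "GENE_B": 0.2, "GENE_D": 0.8},
    "GENE_D": {"GENE_A": 0.0, "GENE_B": 0.1, "GENE_C": 0.8},
}
gwas_scores = {"GENE_A": 4.0, "GENE_B": 3.5, "GENE_C": 0.5, "GENE_D": 0.2}

ranking = prioritize(coregulation, gwas_scores)
for gene, score in ranking:
    print(gene, round(score, 3))
```

In this toy data, GENE_A and GENE_B are co-regulated with each other and both carry strong GWAS signal, so they end up at the top of the ranking, while GENE_C and GENE_D receive negative scores.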
The co-regulation scores can also be used to prioritize candidate genes within the loci identified by the GWAS. In this file we list all the genes within a window around the independent top variants, and we prioritize these genes based on the overall gene prioritization. These prioritization scores do not necessarily have to be very strong, since these genes might only have a small effect on the overall outcome of the trait. We do however suspect that the causal genes within a cis locus will, on average, have higher scores. We are currently investigating this further.