Downstreamer

🚧 Under construction 🚧

Downstreamer can be used to perform pathway enrichments and core-gene prioritizations using GWAS summary statistics. Downstreamer does not rely on known pathway assignments of genes but instead uses predicted gene-pathway assignments [1]. The core-gene prioritizations are based on co-regulation, a measure related to co-expression but less driven by tissue differences [2]. Both the gene-pathway assignments and co-expression scores are calculated using 31,499 public RNA-seq samples from many different tissues [1].

A manuscript is in preparation.

Content

1️⃣ Getting started
2️⃣ Running Downstreamer

5️⃣ Most important output files

1. Getting started

Download tool here: TODO

Download reference data here: TODO

2. Running Downstreamer

The main function of Downstreamer is to use GWAS summary statistics to predict relevant pathways and genes. The commands needed to do this with the pathway databases we provide are listed below. Other modes of Downstreamer are listed here: DS additional modes

All Downstreamer commands have the following basis:

java -Xmx10g -jar Downstreamer.jar --mode [MODE] --output

On top of this you might need to allocate more memory when running Downstreamer. If you get a heap space error, increase the value of -Xmx10g (for example, to -Xmx20g) to increase the amount of memory that Downstreamer can use.

2.a. Preparing GWAS summary statistics

First, the GWAS summary statistics should be saved in the following manner:

TAB-separated text file.
The first column must contain the variant identifiers as used in the LD reference panel. If the reference panel uses RS identifiers, use those; if it uses chr:pos identifiers, use those instead.
The following columns should contain the GWAS variant p-values. The header of each of these columns should be the name of the GWAS.
It is not necessary to filter on variants present in the reference data; variants that are absent from the reference data are automatically excluded.
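The layout described above can be sketched in a few lines of Python. This is only an illustration: the function name `write_gwas_table`, the header label `variant` for the first column, the trait name and the variant identifiers are all assumptions made for the example, not names required by Downstreamer.

```python
import csv
import io


def write_gwas_table(handle, trait_names, pvalues_by_variant):
    """Write variant IDs and per-GWAS p-values as a TAB-separated table."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    # Header row: the first column holds the variant identifiers, every
    # following column is named after its GWAS. The label "variant" is an
    # assumption for this sketch.
    writer.writerow(["variant"] + list(trait_names))
    for variant_id, pvalues in pvalues_by_variant.items():
        if len(pvalues) != len(trait_names):
            raise ValueError(f"{variant_id}: expected {len(trait_names)} p-values")
        writer.writerow([variant_id] + [repr(p) for p in pvalues])


# Hypothetical example data: two variants, one GWAS.
example = {"rs0000001": [0.03], "rs0000002": [5e-9]}
buffer = io.StringIO()
write_gwas_table(buffer, ["my_trait"], example)
gwas_text = buffer.getvalue()
print(gwas_text)
```

To write to disk instead of an in-memory buffer, pass a file opened with `open(path, "w", newline="")` as `handle`.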

❗ While it is possible to combine multiple GWAS summary statistics in a single file, this is only recommended if they were performed on the same cohort and the same genotyping data.

Second, this text file needs to be converted to a binary file for quick access by Downstreamer. This can be done using the CONVERT_TXT mode with the following command:

java -Xmx10g -jar Downstreamer.jar --mode CONVERT_TXT --output [PATH_TO_OUTPUT_FILE] --pvalueToZscore --gwas [PATH_TO_GWAS_TEXT_FILE]
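For intuition about what a p-value to z-score conversion involves, here is a minimal Python sketch of the standard two-sided transformation. It is an assumption that `--pvalueToZscore` behaves along these lines; how Downstreamer handles signs, p-values of exactly 0, or other edge cases is not described here. The function name `pvalue_to_abs_zscore` is hypothetical.

```python
from statistics import NormalDist


def pvalue_to_abs_zscore(p):
    """Convert a two-sided p-value to the corresponding absolute z-score."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p-value must be in the interval (0, 1]")
    # Half of a two-sided p-value lies in each tail; the lower-tail
    # quantile is negative, so flip its sign.
    return -NormalDist().inv_cdf(p / 2.0)


for p in (1.0, 0.05, 5e-8):
    print(p, round(pvalue_to_abs_zscore(p), 3))
```

A p-value of 0.05 maps to a z-score of about 1.96, and smaller p-values map to larger z-scores.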

You now get 3 files: 1 binary file with the summary statistics matrix and 2 text files describing the rows and columns. If all went well, the columns file will contain the names of the studies in the original text file.

2.b. Downstreamer STEP1

The first part of Downstreamer is the most computationally and memory intensive. Here the GWAS SNP p-values are converted to GWAS gene p-values. Additionally, this step does a very basic pruning to identify independent top GWAS hits, although it is possible to provide your own list of top hits in STEP2. The gene p-values are the basis of the pathway enrichment analysis and gene prioritizations in STEP2. Because the gene p-values do not need to be recalculated when performing multiple enrichment analyses, this step only needs to be run once. It is possible to provide the STEP2 arguments when running STEP1; in this case Downstreamer will automatically continue with STEP2.

Arguments used by STEP1

Option	Value	Recommend	Description
	String
	double

2.c. Downstreamer STEP2

Arguments used by STEP2

Option	Value	Recommend	Description
	String
	double

5. Most important output files

Among the different intermediate files, two files contain the primary output.

TraitName_enrichtments.xlsx

This file contains the results of the different enrichment analyses. Each sheet contains the results of a single enrichment source. By default we run enrichment on the following types of datasets, but in principle it can be done on any genes times X matrix.

Pathway enrichments

The pathway enrichments are performed using the predicted pathway assignments from [1]. Here we test if the predicted gene assignments to a pathway correlate with the GWAS gene signal.

Sample & tissue enrichments

The sample enrichments (in the expression tab) indicate the correlation between the expression levels of each of the 31,499 samples in our gene network and the GWAS gene p-values. We found that the top enriched samples are often very relevant for the studied trait.

The GTEx sheet is used to determine the enrichment of primary tissues by correlating the GWAS to the average expression profile of each tissue in GTEx.

Gene co-regulation or co-expression enrichments

As input we use a gene by gene matrix to express the relation between genes. We typically use co-regulation to do this (see introduction above). This means that for each gene we have a metric of how it relates to all other genes; these scores are then correlated to the GWAS gene scores. The idea is that genes that are co-regulated with many genes picked up by the GWAS are more important to the studied trait. Since this is done in a genome-wide manner, we can prioritize trans-acting genes outside of the GWAS loci.
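The idea of correlating each gene's co-regulation profile with the GWAS gene scores can be illustrated with a toy example. This is a deliberately simplified sketch using a plain Pearson correlation; it is not Downstreamer's actual statistical model, and all gene names, scores and function names (`pearson`, `prioritize_genes`) are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def prioritize_genes(coregulation, gwas_gene_scores):
    """Rank candidate genes by how well their co-regulation row tracks the GWAS scores."""
    genes = list(gwas_gene_scores)
    target = [gwas_gene_scores[g] for g in genes]
    scores = {
        candidate: pearson([row[g] for g in genes], target)
        for candidate, row in coregulation.items()
    }
    # Highest correlation first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical GWAS gene scores (higher means stronger GWAS signal).
gwas_gene_scores = {"A": 4.0, "B": 3.0, "C": 0.5, "D": 0.2}
# Hypothetical co-regulation of two candidate genes with genes A-D.
coregulation = {
    "GENE_X": {"A": 2.0, "B": 1.5, "C": 0.1, "D": 0.0},
    "GENE_Y": {"A": 0.0, "B": 0.2, "C": 1.8, "D": 2.0},
}
ranking = prioritize_genes(coregulation, gwas_gene_scores)
for gene, score in ranking:
    print(gene, round(score, 3))
```

In this toy data GENE_X is co-regulated with the genes that carry the GWAS signal and therefore ranks first, even though it would not need to lie in a GWAS locus itself.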

TraitName_cisPrio.xlsx

The co-regulation scores can also be used to prioritize candidate genes within the loci identified by the GWAS. In this file we list all the genes within a window around the independent top variants and we prioritize these genes based on the overall gene prioritization. These prioritization scores do not necessarily have to be very strong, since these genes might only have a small effect on the overall outcome of the trait. We do however suspect that the causal genes within a cis locus will, on average, have higher scores. We are currently investigating this further.
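The window-based listing can be sketched as follows. This is an illustration only: the window size of 250,000 bp, the gene coordinates, the scores and the function name `genes_near_variant` are all hypothetical and do not reflect Downstreamer's defaults or implementation.

```python
def genes_near_variant(genes, chrom, pos, window):
    """Return genes overlapping [pos - window, pos + window], best score first."""
    hits = [
        gene
        for gene in genes
        if gene["chrom"] == chrom
        and gene["start"] - window <= pos <= gene["end"] + window
    ]
    return sorted(hits, key=lambda gene: gene["score"], reverse=True)


# Hypothetical genes with a genome-wide prioritization score.
genes = [
    {"name": "GENE_A", "chrom": "1", "start": 1_000_000, "end": 1_050_000, "score": 3.1},
    {"name": "GENE_B", "chrom": "1", "start": 1_200_000, "end": 1_260_000, "score": 5.4},
    {"name": "GENE_C", "chrom": "1", "start": 5_000_000, "end": 5_020_000, "score": 7.0},
    {"name": "GENE_D", "chrom": "2", "start": 1_100_000, "end": 1_150_000, "score": 6.0},
]

# Hypothetical independent top variant at chr1:1,100,000 and a 250 kb window.
candidates = genes_near_variant(genes, chrom="1", pos=1_100_000, window=250_000)
for gene in candidates:
    print(gene["name"], gene["score"])
```

GENE_C and GENE_D have high scores but are excluded because they lie outside the window or on another chromosome; within the locus, GENE_B ranks above GENE_A.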

References

[1] https://www.nature.com/articles/s41467-019-10649-4

[2] https://www.nature.com/articles/ng.3173
