gersteinlab/rpgQTL

rpgQTL

Regions per gene (rpg) QTL

This package is based on TensorQTL, with modifications (cis-eQTL module only).
Instead of a single cis-window size parameter, rpgQTL lets the user specify a set of regions (possibly discontinuous) for each gene. Only SNPs within a gene's regions are used for cis-eQTL calling for that gene.

Install

Install directly from this repository:

$ git clone https://github.com/gersteinlab/rpgQTL.git
$ cd rpgQTL
$ pip install -r install/requirements.txt .

A detailed tutorial on setting up the environment from scratch can be found here. It is based on the Yale HPC system but should be applicable to Linux systems with NVIDIA GPUs.

Requirements

numpy
pandas
tensorqtl

Note that tensorqtl depends on pandas-plink, pytorch and other packages. For further details, see the installation instructions for tensorqtl or pytorch.

Example

See example.ipynb for the example script. The following is based on that example but applies to general usage of the rpgQTL package.

Input files

plink_prefix_path: path prefix of the genotype data in PLINK format
phenotype_df, phenotype_pos_df: expression data
covariates_df: covariates (e.g. PEER factors, sex, etc.)
For details about the above files, see the instructions in tensorqtl.

rpg_df: pandas.DataFrame or str

  • If pandas.DataFrame, this would be one large dataframe. Each row corresponds to one candidate genomic region for eQTL detection for one gene. Each gene can have multiple regions (rows). The first four columns are required and should be:
    • col1: chromosome
    • col2: start of the region
    • col3: end of the region
    • col4: gene name
  • If str, this would be the path to a directory. For each file in the directory, the file name is a gene name and the file contains the regions for only that gene. The format of each file is the same as the above pandas.DataFrame.
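
As a minimal sketch of the pandas.DataFrame form described above, the table could be built as follows. The gene names, coordinates, and column names are made up for illustration; the description above only fixes the order of the first four columns. The sketch also assumes that chromosome labels should match those used in the genotype data.

```python
import pandas as pd

# Column names are illustrative assumptions; only the column order
# (chromosome, start, end, gene name) comes from the description above.
rpg_df = pd.DataFrame(
    [
        ("chr1", 1_000_000, 1_005_000, "GENE_A"),
        ("chr1", 1_250_000, 1_260_000, "GENE_A"),  # second, discontinuous region
        ("chr2", 500_000, 520_000, "GENE_B"),
    ],
    columns=["chr", "start", "end", "gene"],
)

# Basic sanity check: every region should start at or before its end.
assert (rpg_df["start"] <= rpg_df["end"]).all()

# Number of candidate regions (rows) per gene.
regions_per_gene = rpg_df.groupby("gene").size()
print(regions_per_gene)
```

Here GENE_A has two separate regions and GENE_B has one, which is the "multiple rows per gene" case.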

rpgQTL functions

  • rpgQTL.run_nominal: similar to tensorqtl.cis.map_nominal. Conduct the nominal run to get nominal p-values.
  • rpgQTL.run_permutation: similar to tensorqtl.cis.map_cis. Conduct the permutation run to get beta-adjusted p-values.
  • rpgQTL.run_independent: similar to tensorqtl.cis.map_independent. Conduct the forward-backward run to get independent eQTLs.

rpgQTL Parameters

  • l_window (int; default 2000000):
    The maximum extent of the cis-window. Only regions within this cis-window are considered valid. Consider trans-eQTL calling methods if your SNPs of interest lie outside this window.
  • s_window (int; default 0):
    A fixed cis-window that is always included for all genes, regardless of what rpg_df contains. Setting s_window to 1000000 and rpg_df to an empty dataframe is equivalent to the original cis-eQTL calling in tensorqtl with window=1000000.
  • NonHiCType ('remove', 'l_window' or 's_window'):
    Different ways to deal with genes that have genotype and expression data, but no candidate regions.
    • 'remove': The genes will be completely removed from calculation.
    • 'l_window': All SNPs within the cis-window defined by 'l_window' will be used.
    • 's_window': All SNPs within the cis-window defined by 's_window' will be used.
  • eGene_list (list of str; default None): For run_independent only. When given, independent eQTLs are calculated only for the genes in this list. By default this is None, so all genes passing the FDR threshold are used.
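
To make the window parameters concrete, here is an illustrative sketch of which SNP positions would be tested for a single gene. It is not rpgQTL's implementation. The function name candidate_snps, the tss argument, and the inclusive boundaries are all assumptions. The sketch also assumes that windows are measured from a single reference position per gene and that the NonHiCType fallback applies whenever a gene has no rows in rpg_df; how that fallback interacts with a nonzero s_window is not specified above.

```python
from typing import Iterable, List, Optional, Tuple

Region = Tuple[int, int]  # (start, end); both ends treated as inclusive (assumption)


def candidate_snps(
    snp_positions: Iterable[int],
    tss: int,
    regions: List[Region],
    l_window: int = 2_000_000,
    s_window: int = 0,
    non_hic_type: str = "remove",
) -> Optional[List[int]]:
    """Return the SNP positions tested for one gene, or None if the gene is dropped.

    Hypothetical helper for illustration only; not part of rpgQTL.
    """
    if non_hic_type not in ("remove", "l_window", "s_window"):
        raise ValueError(f"unknown NonHiCType: {non_hic_type!r}")

    positions = list(snp_positions)

    if not regions:
        # Gene has no candidate regions: apply the NonHiCType fallback.
        if non_hic_type == "remove":
            return None
        fallback = l_window if non_hic_type == "l_window" else s_window
        return [p for p in positions if abs(p - tss) <= fallback]

    selected = []
    for p in positions:
        dist = abs(p - tss)
        if dist > l_window:
            continue  # outside the maximum cis-window, never tested
        in_region = any(start <= p <= end for start, end in regions)
        in_fixed_window = s_window > 0 and dist <= s_window
        if in_region or in_fixed_window:
            selected.append(p)
    return selected


snps = [900_000, 1_002_000, 1_255_000, 1_600_000, 4_000_000]
regions = [(1_000_000, 1_005_000), (1_250_000, 1_260_000)]

# Only SNPs inside the gene's regions (and within l_window) are kept.
print(candidate_snps(snps, tss=1_100_000, regions=regions))

# A nonzero s_window additionally keeps SNPs close to the reference position.
print(candidate_snps(snps, tss=1_100_000, regions=regions, s_window=250_000))

# No regions: behaviour depends on the NonHiCType setting.
print(candidate_snps(snps, tss=1_100_000, regions=[], non_hic_type="remove"))
print(candidate_snps(snps, tss=1_100_000, regions=[],
                     s_window=1_000_000, non_hic_type="s_window"))
```

The last call mirrors the case described for s_window: with no regions and a 1 Mb fixed window, selection reduces to an ordinary fixed-size cis-window.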

Other parameters of the rpgQTL functions are the same as those of the corresponding tensorqtl functions. Note that rpgQTL supports only the basic cis-eQTL calculations, so some tensorQTL parameters are not supported (e.g. none of the "interaction" parameters of tensorqtl.cis.map_nominal are supported in rpgQTL.run_nominal).
