Location at JHPCE: /dcl01/lieber/ajaffe/lab/brainseq_phase2/browser
This directory has all the files for creating the eQTL browser. As stated in the log file, the files and their line counts are:
396584 BrainSeqPhaseII_clean_expression_development_exon.txt
24653 BrainSeqPhaseII_clean_expression_development_gene.txt
297182 BrainSeqPhaseII_clean_expression_development_jxn.txt
92733 BrainSeqPhaseII_clean_expression_development_tx.txt
396584 BrainSeqPhaseII_clean_expression_eqtl_dlpfc_exon.txt
24653 BrainSeqPhaseII_clean_expression_eqtl_dlpfc_gene.txt
297182 BrainSeqPhaseII_clean_expression_eqtl_dlpfc_jxn.txt
92733 BrainSeqPhaseII_clean_expression_eqtl_dlpfc_tx.txt
396584 BrainSeqPhaseII_clean_expression_eqtl_hippo_exon.txt
24653 BrainSeqPhaseII_clean_expression_eqtl_hippo_gene.txt
297182 BrainSeqPhaseII_clean_expression_eqtl_hippo_jxn.txt
92733 BrainSeqPhaseII_clean_expression_eqtl_hippo_tx.txt
396584 BrainSeqPhaseII_clean_expression_eqtl_interaction_exon.txt
24653 BrainSeqPhaseII_clean_expression_eqtl_interaction_gene.txt
297182 BrainSeqPhaseII_clean_expression_eqtl_interaction_jxn.txt
92733 BrainSeqPhaseII_clean_expression_eqtl_interaction_tx.txt
396584 BrainSeqPhaseII_clean_expression_regionspecific_adult_exon.txt
24653 BrainSeqPhaseII_clean_expression_regionspecific_adult_gene.txt
297182 BrainSeqPhaseII_clean_expression_regionspecific_adult_jxn.txt
92733 BrainSeqPhaseII_clean_expression_regionspecific_adult_tx.txt
396584 BrainSeqPhaseII_clean_expression_regionspecific_prenatal_exon.txt
24653 BrainSeqPhaseII_clean_expression_regionspecific_prenatal_gene.txt
297182 BrainSeqPhaseII_clean_expression_regionspecific_prenatal_jxn.txt
92733 BrainSeqPhaseII_clean_expression_regionspecific_prenatal_tx.txt
396584 BrainSeqPhaseII_clean_expression_sczd_casecontrol_dlpfc_exon.txt
24653 BrainSeqPhaseII_clean_expression_sczd_casecontrol_dlpfc_gene.txt
297182 BrainSeqPhaseII_clean_expression_sczd_casecontrol_dlpfc_jxn.txt
92733 BrainSeqPhaseII_clean_expression_sczd_casecontrol_dlpfc_tx.txt
396584 BrainSeqPhaseII_clean_expression_sczd_casecontrol_hippo_exon.txt
24653 BrainSeqPhaseII_clean_expression_sczd_casecontrol_hippo_gene.txt
297182 BrainSeqPhaseII_clean_expression_sczd_casecontrol_hippo_jxn.txt
92733 BrainSeqPhaseII_clean_expression_sczd_casecontrol_hippo_tx.txt
396584 BrainSeqPhaseII_clean_expression_sczd_casecontrol_interaction_exon.txt
24653 BrainSeqPhaseII_clean_expression_sczd_casecontrol_interaction_gene.txt
297182 BrainSeqPhaseII_clean_expression_sczd_casecontrol_interaction_jxn.txt
92733 BrainSeqPhaseII_clean_expression_sczd_casecontrol_interaction_tx.txt
28863540 BrainSeqPhaseII_eQTL_dlpfc_full.txt
3698269 BrainSeqPhaseII_eQTL_dlpfc_raggr.txt
758634578 BrainSeqPhaseII_eQTL_dlpfc_replication_GTEx.txt
22603132 BrainSeqPhaseII_eQTL_hippo_full.txt
3698269 BrainSeqPhaseII_eQTL_hippo_raggr.txt
758634578 BrainSeqPhaseII_eQTL_hippo_replication_GTEx.txt
1776812 BrainSeqPhaseII_eQTL_interaction_full.txt
398556091 BrainSeqPhaseII_eQTL_interaction_replication_GTEx.txt
396584 BrainSeqPhaseII_feature_annotation_exon.txt
24653 BrainSeqPhaseII_feature_annotation_gene.txt
297182 BrainSeqPhaseII_feature_annotation_jxn.txt
92733 BrainSeqPhaseII_feature_annotation_tx.txt
902 BrainSeqPhaseII_sample_metadata.txt
7023861 BrainSeqPhaseII_snp_annotation.txt
7023861 BrainSeqPhaseII_snp_genotype.txt
902 BrainSeqPhaseII_sample_metadata.txt
Includes many columns, but some of the more useful ones are:
- SAMPLE_ID: actual file ID
- RNum: internal LIBD RNA sequencing ID
- BrNum: internal LIBD brain (subject) ID
- Region: either HIPPO or DLPFC
- Dx: either Control or SCZD
- Age: numeric age in years
- DLFPC_HIPPO_correlation_* (* can be geneRpkm, exonRpkm, jxnRp10m or txTpm): HIPPO vs DLPFC correlation across the corresponding feature (only 265 subjects). Values are stored only for the Region == DLPFC entries.
- analysis_regionspecific_adult: logical, TRUE if the sample was used for the HIPPO vs DLPFC region analysis with adult controls.
- analysis_regionspecific_prenatal: similar to the previous one, but for prenatal age.
- analysis_development: logical, TRUE if the sample was used for the analysis across development (age); these samples are all controls.
- analysis_sczd_casecontrol_dlpfc: logical, TRUE if used for the SCZD vs control DE analysis for the DLPFC region.
- analysis_sczd_casecontrol_hippo: logical, TRUE if used for the SCZD vs control DE analysis for the HIPPO region.
- analysis_sczd_casecontrol_interaction: logical, TRUE if used for the SCZD vs control and brain region interaction DE analysis across DLPFC and HIPPO.
- analysis_eqtl_dlpfc: logical, TRUE if used for the eQTL analysis with DLPFC samples.
- analysis_eqtl_hippo: logical, TRUE if used for the eQTL analysis with HIPPO samples.
- analysis_eqtl_interaction: logical, TRUE if used for the eQTL and brain region interaction analysis.
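The analysis_* columns can be used to pick the samples that belong to one analysis. Below is a minimal Python sketch of that selection. It assumes the metadata file is tab-delimited with a header row and that logical values are written as TRUE/FALSE; neither detail is stated above, so check the actual file first. The function name and the tiny demo table are hypothetical.

```python
import csv
import io


def samples_for_analysis(handle, flag_column, delimiter="\t"):
    """Return the RNum of every row whose flag_column is TRUE.

    Assumes a header row and TRUE/FALSE text values (an assumption,
    not something the README states).
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    return [
        row["RNum"]
        for row in reader
        if row[flag_column].strip().upper() == "TRUE"
    ]


# Made-up rows standing in for BrainSeqPhaseII_sample_metadata.txt
demo = io.StringIO(
    "RNum\tBrNum\tRegion\tanalysis_eqtl_dlpfc\n"
    "R0001\tBr0001\tDLPFC\tTRUE\n"
    "R0002\tBr0001\tHIPPO\tFALSE\n"
    "R0003\tBr0002\tDLPFC\tTRUE\n"
)
dlpfc_eqtl_samples = samples_for_analysis(demo, "analysis_eqtl_dlpfc")
print(dlpfc_eqtl_samples)
```

With the real file, the same call would take an open file handle instead of the in-memory demo table.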
7023861 BrainSeqPhaseII_snp_annotation.txt
Columns:
- snp: snp ID
- chr_hg38: chromosome in hg38 coordinates
- pos_hg38: position in hg38 coordinates
- chr_hg19: similar to chr_hg38 but in hg19 coordinates
- pos_hg19: hg19 position.
- cm
- counted: allele quantified
- alt: alternative allele
- type
- newref: new reference allele (use this); this is the non-effect (major) allele
- newcount: new counted allele (use this); this is the effect (minor) allele
- name: snp rs number (use this)
- rsnumguess: guessed snp name
This file is available for download here.
In the genotype table (BrainSeqPhaseII_snp_genotype.txt), columns are the subjects labeled by the BrNum from the sample metadata. Rows are the SNPs, with labels corresponding to the snp column from the SNP information table. Entries are 0, 1, 2 or NA.
396584 BrainSeqPhaseII_feature_annotation_exon.txt
24653 BrainSeqPhaseII_feature_annotation_gene.txt
297182 BrainSeqPhaseII_feature_annotation_jxn.txt
92733 BrainSeqPhaseII_feature_annotation_tx.txt
Each file includes the annotation for one feature type. Since each feature type is a bit different, each file has different columns. Some common ones are:
- seqnames: chromosome
- start: start position of the feature
- end: end position
- strand: strand of the feature
- feature_id: this matches the id used for the eQTL result tables.
- Symbol (gene_name for tx): symbol of the gene the feature corresponds to
- Class (except for tx): classification for the feature; this really only matters for jxn, where the exon-exon junction could be un-annotated (novel), in Gencode, etc.
- gencodeID (gene, exon), gencodeGeneID (jxn), gene_id (tx): Gencode gene ID.
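Because the gene symbol and Gencode gene ID live under different column names depending on the feature type, a small lookup table keeps downstream code uniform. The sketch below uses only the column names listed above; the helper name and the demo rows are hypothetical.

```python
# Column that holds the gene symbol, per feature type (from the list above)
SYMBOL_COLUMN = {
    "gene": "Symbol",
    "exon": "Symbol",
    "jxn": "Symbol",
    "tx": "gene_name",
}

# Column that holds the Gencode gene ID, per feature type
GENCODE_COLUMN = {
    "gene": "gencodeID",
    "exon": "gencodeID",
    "jxn": "gencodeGeneID",
    "tx": "gene_id",
}


def describe_feature(row, feature_type):
    """Return a uniform view of one annotation row (a dict keyed by column)."""
    return {
        "feature_id": row["feature_id"],
        "symbol": row[SYMBOL_COLUMN[feature_type]],
        "gencode_gene_id": row[GENCODE_COLUMN[feature_type]],
    }


# Made-up rows; real rows would come from the feature annotation files
gene_row = {"feature_id": "featG", "Symbol": "GENE1", "gencodeID": "idG"}
tx_row = {"feature_id": "featT", "gene_name": "GENE1", "gene_id": "idG"}

gene_view = describe_feature(gene_row, "gene")
tx_view = describe_feature(tx_row, "tx")
print(gene_view["symbol"] == tx_view["symbol"])
```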
28863540 BrainSeqPhaseII_eQTL_dlpfc_full.txt
3698269 BrainSeqPhaseII_eQTL_dlpfc_raggr.txt
22603132 BrainSeqPhaseII_eQTL_hippo_full.txt
3698269 BrainSeqPhaseII_eQTL_hippo_raggr.txt
1776812 BrainSeqPhaseII_eQTL_interaction_full.txt
Each of these tables has the following columns:
- snp: snp ID, matches the snp column from the SNP information file (BrainSeqPhaseII_snp_annotation.txt).
- feature_id: feature ID, matches the feature_id column of the expression feature files.
- statistic: eQTL t-statistic
- pvalue: nominal p-value
- FDR: FDR adjusted p-value
- beta: eQTL beta coefficient. If you want the SE, you can compute it using beta / statistic.
- Type: feature type in lower-case. This matches the expression feature files, for example gene for BrainSeqPhaseII_feature_annotation_gene.txt.
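The SE recipe follows from the definition of the t-statistic, statistic = beta / SE, so SE = beta / statistic. A minimal Python sketch is shown here; the function name is hypothetical, and returning None for a zero statistic is an illustrative choice rather than something the tables define.

```python
def standard_error(beta, statistic):
    """Recover the standard error from beta and the t-statistic.

    Since statistic = beta / SE, it follows that SE = beta / statistic.
    Returns None when the statistic is zero, where the ratio is undefined.
    """
    if statistic == 0:
        return None
    return beta / statistic


# Made-up values: beta and the t-statistic share a sign, so SE is positive
se_positive = standard_error(0.5, 5.0)
se_negative = standard_error(-0.3, -6.0)
print(se_positive, se_negative)
```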
The 5 eQTL tables correspond to the following types of eQTLs:
- DLPFC using all the genome (dlpfc_full), so the samples with TRUE under the analysis_eqtl_dlpfc column from the sample info table. Shows only results with a nominal p-value < 0.001.
- HIPPO using all the genome (hippo_full), so the samples with TRUE under the analysis_eqtl_hippo column from the sample info table. Shows only results with a nominal p-value < 0.001.
- Brain region and eQTL interaction across all the genome (interaction_full), so the samples with TRUE under the analysis_eqtl_interaction column from the sample info table. Shows only results with a nominal p-value < 0.001.
- Sub-analysis using only the PGC2 SNPs and neighboring SNPs identified with rAggr (9,736 SNPs) for DLPFC (dlpfc_raggr). Given the smaller number of SNPs considered, the FDR is different compared to dlpfc_full. Shows all associations (so no nominal p-value filter).
- Sub-analysis using only the PGC2 SNPs and neighboring SNPs identified with rAggr (9,736 SNPs) for HIPPO (hippo_raggr). Given the smaller number of SNPs considered, the FDR is different compared to hippo_full. Shows all associations (so no nominal p-value filter).
We considered associations with FDR < 0.01 to be significant.
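Given the size of the eQTL tables, the FDR < 0.01 cutoff is probably best applied while streaming the file row by row rather than loading it whole. The Python sketch below assumes tab-delimited files with a header row and treats non-numeric FDR entries such as NA as non-significant; both are assumptions, and the function names and demo rows are hypothetical.

```python
import csv
import io


def parse_number(text):
    """Return text as a float, or None for NA/empty/non-numeric entries."""
    try:
        return float(text)
    except ValueError:
        return None


def significant_eqtls(handle, fdr_cutoff=0.01, delimiter="\t"):
    """Yield rows with FDR below the cutoff, one at a time."""
    for row in csv.DictReader(handle, delimiter=delimiter):
        fdr = parse_number(row["FDR"])
        if fdr is not None and fdr < fdr_cutoff:
            yield row


# Made-up rows with the column layout described above
demo = io.StringIO(
    "snp\tfeature_id\tstatistic\tpvalue\tFDR\tbeta\tType\n"
    "snp1\tfeatA\t6.0\t1e-8\t0.0005\t0.30\tgene\n"
    "snp2\tfeatB\t3.5\t5e-4\t0.04\t0.12\tgene\n"
    "snp3\tfeatC\tNA\tNA\tNA\tNA\tgene\n"
)
hits = list(significant_eqtls(demo))
print([row["snp"] for row in hits])
```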
A subset of the *full.txt results is available at SupplementaryTable15_eQTL.tar.gz, which has the results with FDR < 1%. If you are interested in the complete unfiltered list of eQTL associations, please let us know.
We also have 3 more tables that include all associations (no nominal p-value filter) between any of the features that had a significant eQTL in the 5 main analyses and any of the SNPs involved in those analyses. Note that some values are NA in these tables (likely because they were not observed in the GTEx samples).
758634578 BrainSeqPhaseII_eQTL_dlpfc_replication_GTEx.txt
758634578 BrainSeqPhaseII_eQTL_hippo_replication_GTEx.txt
398556091 BrainSeqPhaseII_eQTL_interaction_replication_GTEx.txt
The nominal p-value can be reported for replication purposes.
The BrainSeqPhaseII_clean_expression_* tables contain the expression values for making boxplots or other types of graphs. The expression values are already normalized (RPKM: gene/exon, RP10M: jxn, TPM: tx) and log2 scaled (log2(x + 0.5)), with covariates for the corresponding analysis removed. Samples can be identified using the analysis_* columns from the sample table. The column names are the RNum from the sample metadata table and the rows correspond to the feature_id (in the same order as the feature annotation tables).
The tables listed below can be visualized as 3 boxplots, one per allele type, for example CC, AC and AA.
396584 BrainSeqPhaseII_clean_expression_eqtl_dlpfc_exon.txt
24653 BrainSeqPhaseII_clean_expression_eqtl_dlpfc_gene.txt
297182 BrainSeqPhaseII_clean_expression_eqtl_dlpfc_jxn.txt
92733 BrainSeqPhaseII_clean_expression_eqtl_dlpfc_tx.txt
396584 BrainSeqPhaseII_clean_expression_eqtl_hippo_exon.txt
24653 BrainSeqPhaseII_clean_expression_eqtl_hippo_gene.txt
297182 BrainSeqPhaseII_clean_expression_eqtl_hippo_jxn.txt
92733 BrainSeqPhaseII_clean_expression_eqtl_hippo_tx.txt
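To build the per-genotype boxplots, the expression columns (labeled by RNum) have to be matched to the genotype columns (labeled by BrNum) through the sample metadata. The Python sketch below shows that grouping for one feature and one SNP. The function name and the demo values are hypothetical, and it groups by the raw 0/1/2 entries only: translating them into allele labels such as CC, AC and AA depends on which allele the genotype entries count, which is not covered here.

```python
from collections import defaultdict


def expression_by_genotype(expression, genotype, rnum_to_brnum):
    """Group expression values of one feature by the genotype of one SNP.

    expression:    dict RNum -> expression value (one row of a clean table)
    genotype:      dict BrNum -> "0", "1", "2" or "NA" (one genotype row)
    rnum_to_brnum: dict RNum -> BrNum taken from the sample metadata
    Samples with an NA or missing genotype are dropped.
    """
    groups = defaultdict(list)
    for rnum, value in expression.items():
        brnum = rnum_to_brnum.get(rnum)
        entry = genotype.get(brnum, "NA")
        if entry in ("0", "1", "2"):
            groups[int(entry)].append(value)
    return dict(groups)


# Made-up values for four samples
expression = {"R1": 1.2, "R2": 2.1, "R3": 1.4, "R4": 3.0}
rnum_to_brnum = {"R1": "Br1", "R2": "Br2", "R3": "Br3", "R4": "Br4"}
genotype = {"Br1": "0", "Br2": "1", "Br3": "0", "Br4": "NA"}

groups = expression_by_genotype(expression, genotype, rnum_to_brnum)
print(groups)
```

Each list in the result holds the values for one box. For the region interaction tables, the same grouping could be keyed by (Region, genotype) instead.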
The tables listed below can be visualized as 6 (3 x 2) boxplots, one per allele type (for example CC, AC and AA) for each brain region. Brain region can be extracted from the Region column of the sample metadata.
396584 BrainSeqPhaseII_clean_expression_eqtl_interaction_exon.txt
24653 BrainSeqPhaseII_clean_expression_eqtl_interaction_gene.txt
297182 BrainSeqPhaseII_clean_expression_eqtl_interaction_jxn.txt
92733 BrainSeqPhaseII_clean_expression_eqtl_interaction_tx.txt
The tables listed below can be visualized as 2 boxplots, one per brain region. R colors:
- DLPFC: 'dark orange'
- HIPPO: 'skyblue3'
396584 BrainSeqPhaseII_clean_expression_regionspecific_adult_exon.txt
24653 BrainSeqPhaseII_clean_expression_regionspecific_adult_gene.txt
297182 BrainSeqPhaseII_clean_expression_regionspecific_adult_jxn.txt
92733 BrainSeqPhaseII_clean_expression_regionspecific_adult_tx.txt
396584 BrainSeqPhaseII_clean_expression_regionspecific_prenatal_exon.txt
24653 BrainSeqPhaseII_clean_expression_regionspecific_prenatal_gene.txt
297182 BrainSeqPhaseII_clean_expression_regionspecific_prenatal_jxn.txt
92733 BrainSeqPhaseII_clean_expression_regionspecific_prenatal_tx.txt
Note that the analysis was done using limma-voom with log2(CPM + 0.5) instead of log2(RPKM + 1) for genes/exons or log2(RP10M + 1) for jxns.
The tables listed below can be visualized as a scatterplot of expression versus age, split by the age linear spline cutoffs (age in years: 0, 1, 10, 20 and 50). Ages below 0 (prenatal) can be transformed from years to the PCW (post-conception weeks) scale; for example, the R expression round(range(age) * 52 + 40, 0) gives the PCW range. R colors:
- DLPFC: 'dark orange'
- HIPPO: 'skyblue3'
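The years-to-PCW conversion and the spline cutoffs can be sketched in Python as follows. The conversion mirrors the R expression round(age * 52 + 40, 0). The segment lookup is an assumption: the text gives the cutoffs but not which side a value exactly on a cutoff belongs to, so the choice made here (an age equal to a cutoff falls in the later segment) is illustrative only. Function names and demo ages are hypothetical.

```python
import bisect

# Age linear spline cutoffs, in years
SPLINE_CUTOFFS = [0, 1, 10, 20, 50]


def years_to_pcw(age_years):
    """Convert an age in years to post-conception weeks."""
    return round(age_years * 52 + 40)


def spline_segment(age_years):
    """Index of the segment an age falls in (0 means prenatal, age < 0).

    Boundary handling is an assumption: an age equal to a cutoff is
    placed in the later segment.
    """
    return bisect.bisect_right(SPLINE_CUTOFFS, age_years)


# Made-up ages in years
ages = [-0.5, -0.25, 0.5, 15, 60]
prenatal = [age for age in ages if age < 0]
pcw_range = (years_to_pcw(min(prenatal)), years_to_pcw(max(prenatal)))
segments = [spline_segment(age) for age in ages]
print(pcw_range, segments)
```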
396584 BrainSeqPhaseII_clean_expression_development_exon.txt
24653 BrainSeqPhaseII_clean_expression_development_gene.txt
297182 BrainSeqPhaseII_clean_expression_development_jxn.txt
92733 BrainSeqPhaseII_clean_expression_development_tx.txt
Note that the analysis was done using limma-voom with log2(CPM + 0.5) instead of log2(RPKM + 1) for genes/exons or log2(RP10M + 1) for jxns.
The tables listed below can be visualized as 2 boxplots, one for SCZD cases and one for controls. R colors:
- SCZD case: 'aquamarine4'
- control: 'orchid4'
396584 BrainSeqPhaseII_clean_expression_sczd_casecontrol_dlpfc_exon.txt
24653 BrainSeqPhaseII_clean_expression_sczd_casecontrol_dlpfc_gene.txt
297182 BrainSeqPhaseII_clean_expression_sczd_casecontrol_dlpfc_jxn.txt
92733 BrainSeqPhaseII_clean_expression_sczd_casecontrol_dlpfc_tx.txt
396584 BrainSeqPhaseII_clean_expression_sczd_casecontrol_hippo_exon.txt
24653 BrainSeqPhaseII_clean_expression_sczd_casecontrol_hippo_gene.txt
297182 BrainSeqPhaseII_clean_expression_sczd_casecontrol_hippo_jxn.txt
92733 BrainSeqPhaseII_clean_expression_sczd_casecontrol_hippo_tx.txt
Note that the analysis was done using limma-voom with log2(CPM + 0.5) instead of log2(RPKM + 1) for genes/exons or log2(RP10M + 1) for jxns.
The tables listed below can be visualized as 4 boxplots, one for SCZD cases and one for controls for each brain region. R colors:
- SCZD case: 'aquamarine4' (box)
- control: 'orchid4' (box)
- DLPFC: 'dark orange' (dots)
- HIPPO: 'skyblue3' (dots)
396584 BrainSeqPhaseII_clean_expression_sczd_casecontrol_interaction_exon.txt
24653 BrainSeqPhaseII_clean_expression_sczd_casecontrol_interaction_gene.txt
297182 BrainSeqPhaseII_clean_expression_sczd_casecontrol_interaction_jxn.txt
92733 BrainSeqPhaseII_clean_expression_sczd_casecontrol_interaction_tx.txt
Note that the analysis was done using limma-voom with log2(CPM + 0.5) instead of log2(RPKM + 1) for genes/exons or log2(RP10M + 1) for jxns.
The DLFPC_HIPPO_correlation_* values from the sample metadata table (remember that they are only listed for the DLPFC samples, although the values for the HIPPO ones would be identical for the 265 subjects used in this analysis) can be visualized as 2 boxplots, one for SCZD cases and one for controls. R colors:
- SCZD case: 'aquamarine4'
- control: 'orchid4'
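The values behind those two boxplots can be collected by reading only the DLPFC rows of the sample metadata and grouping one correlation column by Dx. The Python sketch below assumes a tab-delimited file with a header row and skips entries that are not numeric (for example NA); the function name and demo rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def correlation_by_dx(handle, column, delimiter="\t"):
    """Group one DLFPC_HIPPO_correlation_* column by diagnosis.

    Only Region == DLPFC rows are read, since the values are stored
    there; non-numeric entries are skipped.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(handle, delimiter=delimiter):
        if row["Region"] != "DLPFC":
            continue
        try:
            groups[row["Dx"]].append(float(row[column]))
        except ValueError:
            continue  # NA or empty entry
    return dict(groups)


# Made-up rows standing in for the sample metadata table
demo = io.StringIO(
    "RNum\tBrNum\tRegion\tDx\tDLFPC_HIPPO_correlation_geneRpkm\n"
    "R1\tBr1\tDLPFC\tControl\t0.95\n"
    "R2\tBr1\tHIPPO\tControl\tNA\n"
    "R3\tBr2\tDLPFC\tSCZD\t0.90\n"
    "R4\tBr3\tDLPFC\tControl\tNA\n"
)
by_dx = correlation_by_dx(demo, "DLFPC_HIPPO_correlation_geneRpkm")
print(by_dx)
```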
Sample visualization (although this one doesn't have the right colors)