-
Notifications
You must be signed in to change notification settings - Fork 6
Running BHR
BHR
can be used in different modes, depending on desired outputs. The three primary modes are:
- Univariate: estimate heritability and genetic architecture for a single phenotype
- Bivariate: estimate cross trait genetic correlation and genetic architecture
- Aggregate: estimate the mean or summed burden heritability across traits or frequency-function groups
Important note about variant filtering by frequency and functional consequence
Variants profiled through exome sequencing span functional consequence (e.g., predicted loss-of-function, missense, synonymous) and orders of magnitude of allele frequency. Failure to account for these variant features can lead to biased inference. As such, when running BHR
, we recommend partitioning variants into "frequency-function" bins, where an individual frequency-function bin is defined by a frequency (e.g., AF < 1e-5) and a function (predicted loss-of-function). We recommend this approach for multiple reasons:
- Improved interpretability
- Attenuation bias when analyzing a wide allele frequency range together, due to effect-size - frequency dependent architecture
- Attenuation bias from jointly analyzing variants with different effect sizes (e.g., predicted loss-of-function with synonymous).
In the manuscript, we define frequency/function bins by order of AF magnutide (e.g., 1e-5 < AF < 1e-4) and PolyPhen2 predicted consequence.
If you are interested in a "quick and dirty" first analysis without needing to run BHR for many bins in parallel, you may consider running BHR for only singleton pLoF mutations.
Sample command:
BHR(mode = "univariate",
trait1_sumstats = sumstats,
annotations = list(baseline, annotation_1))
Required flags:
-
mode
: For univariate analysis, select "univariate" -
trait1_sumstats
: The variant-level summary statistics file described above, filtered to the phenotype of interest -
annotations
: A list of gene annotations, including the baseline file (required), and any additional gene set annotations. Note that these must be wrapped in list(), even if only using the single baseline annotation.
The output of univariate
is an R object. Of interest to most users will be:
-
output_name$mixed_model$heritabilities
: Reports the burden h2 and burden h2 standard error for each annotation and total heritability -
output_name$mixed_model$enrichments
: Reports the burden h2 enrichment and burden h2 enrichment standard error for each annotation -
output_name$significant_genes
: Various descriptive statistics and parameter estimates for significant genes -
output_name$qc
: Various QC metrics, including the intercept, attenuation ratio, and lambda GC.
Sample command:
BHR(mode = "bivariate",
trait1_sumstats = sumstats_1,
trait2_sumstats = sumstats_2
annotations = list(baseline, annotation_1))
Required flags:
-
mode
: For bivariate analysis, select "bivariate" -
trait1_sumstats
: The variant-level summary statistics file described above, filtered to phenotype 1 -
trait2_sumstats
: The variant-level summary statistics file described above, filtered to phenotype 2 -
annotations
: A list of gene annotations, including the baseline file (required), and any additional gene set annotations
The output of bivariate
is an R object. Of interest to most users will be:
-
output_name$rg$rg_mixed
andoutput_name$rg$rg_mixed_se
: Reports the burden genetic correlation and burden genetic correlation standard error for the trait pair -
output_name$rg$rho_mixed
andoutput_name$rg$rho_mixed_se
: Reports the burden genetic covariance and burden h2 genetic covariance standard error for the trait pair
In many cases, one may want to perform a BHR analysis that estimates the total or mean burden heritability across multiple sets of summary statistics.
There are three typical use cases for aggregate
:
Case 1) You are interested in the total burden heritability for a single trait across multiple frequency-function strata. For example, you might want to sum the ultra-rare pLoF and ultra-rare damaging missense burden heritability for height.
Case 2) You are interested in the mean burden heritability for multiple traits within a single frequency-function strata (e.g., mean burden heritability for height and RBC count from ultra-rare pLoF).
Case 3) You are interested in the total burden heritability for multiple traits across multiple frequency-function strata, where you want to first calculate the mean burden heritability across multiple traits within each of multiple frequency-function strata (case 2), then you want to sum that mean across the frequency-function strata (case 1).
In these cases, BHR should be run in "aggregate" mode. This function requires one summary statistics file for each frequency-function strata, each containing combined summary statistics across traits. For example, if you have two traits and two frequency function bins (e.g., height and RBC across ultra-rare pLoF and damaging missense) and you are interested in calculating step 3 above, you need two summary statistic files: one for ultra-rare pLoF with variant-level statistics for each trait, and one for ultra-rare missense).
Note that running in aggregate
is equivalent to summing/taking the mean across conditions in univariate
mode, except aggregate
calculates the appropriate standard error.
Note that aggregate
does not currently run the same suite of QC checks run by univariate
. We recommend that users first run each set of summary statistics through univariate
first to ensure there are no QC failures.
Sample command for case 1 above:
BHR(mode = "aggregate",
ss_list = list(sumstats_ultrarare_pLoF,sumstats_ultrarare_missense),
trait_list = list("height"),
annotations = list(baseline, annotation_1))
Sample command for case 2 above:
BHR(mode = "aggregate",
ss_list = list(sumstats_ultrarare_pLoF),
trait_list = list("height","RBC"),
annotations = list(baseline, annotation_1))
Sample command for case 3 above:
BHR(mode = "aggregate",
ss_list = list(sumstats_ultrarare_pLoF,sumstats_ultrarare_missense),
trait_list = list("height","RBC"),
annotations = list(baseline, annotation_1))
Required flags:
-
mode
: For aggregate analysis, select "aggregate" -
ss_list
: The list of summary statistics files, described above -
trait_list
: A list of traits to be meta-analyzed, corresponding to values ofphenotype_key
in the summary statistics. -
annotations
: A list of gene annotations, including the baseline file (required), and any additional gene set annotations
The output of aggregate
is an R object. Of interest to most users will be:
-
output_name$aggregated_mixed_model_h2
: Meta-analyzed heritability across traits intrait_list
, aggregated across burden masks -
output_name$aggregated_mixed_model_h2se
: Standard error of meta-analyzed heritability across traits intrait_list
, aggregated across burden masks
There are also many optional flags that can be included in the BHR analysis, that may be of interest to some users:
-
num_blocks
: The number of equally sized blocks of genes used to compute jackknife SE estimates. (Default:100) -
genomewide_correction
: Whether to condition on the genome-wide burden in model. See "Minor-allele biased population stratification" in the methods section of the BHR paper. In practice, you may consider setting this flag to TRUE if you observe non-zero burden heritability for synonymous variants. (Default: FALSE) -
gwc_exclusion
: Additional flag allowing user to specify if any annotations should be excluded from calculation of the genome-wide correction. Format is a list of column names from the baseline model or annotation files. For example, if you suspect that the bulk of highly constrained genes are causal for your phenotype, you may consider excluding "baseline_oe1_total5". (Default: NULL). -
fixed_genes
: BHR models significant genes as fixed effects, and identifies those significant genes via a simple chi-squared test (see "Large-effect genes as fixed effects" in the methods section of the BHR paper). If you have identified significant genes via a more sophisticated approach (for example, using linear mixed models in SAIGE-GENE+), you can input a list of genes here, which BHR will then use for fixed-effects instead of performing the chi-squared test. (Default: NULL) -
output_jackknife_h2
: Output leave-block-out jackknife heritability estimates. (Default: FALSE) -
output_jackknife_rg
: Output leave-block-out jackknife covariance and correlation estimates. (Default: FALSE) -
overdispersion
: Include a term in the model that fits an overdispersion term. See Equation 10 in the BHR paper. In practice, overdispersion heritability inference in BHR is much less powerful than burden heritability inference, and modelling overdispersion can cause issues due to approximate colinearity with the intercept, so we do not model overdispersion by default, meaning that the intercept by default reflects both confounding and overdispersion (Default: FALSE) -
all_models
: If TRUE, outputs results using only random-effects model (e.g. excluding significant/fixed genes), in addition to mixed model. (Default: FALSE) -
slope_correction
: A user-specified correction to be made to the inferred regression coefficient. We made use of this flag to correct for LD, see "Accounting for LD" in the methods section of the BHR paper. -
num_null_conditions
: BHR uses null moment conditions to effectively fix the intercept in such a way that very little true burden heritability signal leaks into the intercept, see "Independence assumption and selection-related bias" in methods section of the BHR paper. We found that 5 sets of null moment conditions works well (see Supplementary Figure 8 in BHR paper). Practically, you may consider increasing this number if you suspect that there is leakage of true burden heritability signal into intercept (e.g. if you observe a much higher intercept in LoF versus synonymous) (Default: 5) -
custom_weights
: By default, BHR computes burden heritability with the simplest burden definition: the count of minor alleles (e.g. uniform weights). However, there are many possible choices of burden weights, and the best choice depends on the question of interest. See "Alternative burden definitions" in methods section of BHR paper. If this flag is TRUE, then BHR will expect to find a column in the summary statistics with name "custom_weights", that will be used to calculate burdens. (Default: FALSE) -
null_stats
: If TRUE, BHR will randomly sign flip effect sizes to effectively eliminate burden heritability signal. This can be useful as a negative-control analysis, as there should be no burden heritability when using null burden statistics. (Default: FALSE) -
intercept
: If FALSE, BHR will not fit an intercept. In the vast majority of cases, this is undesirable and will lead to inflated burden heritability estimates. (Default: TRUE) -
custom_variant_variances
: In most cases, users will specify AF, and this will used to calculate variant variances. However, in some cases, users may not have access to allele frequencies, and will need to compute variance directly from other available parameters. If this flag is TRUE, BHR will expect a column in the summary statistics, "variant_variances", and will use these estimates to calculate burden weights and statistics. (Default: FALSE)