How to avoid common mistakes
Through helping others run BHR, we've noticed a few common missteps. We hope this page will help you avoid them!
Misstep #1: Forgetting to run a synonymous variant negative control analysis
Synonymous variants are generally assumed to have no functional consequence, and thus should not contribute to burden heritability. As such, it is an excellent idea to run BHR using a synonymous variant mask and verify that synonymous burden heritability is not significantly different from zero. If it is, this might reflect one or more common issues, including: within-gene linkage disequilibrium arising from analyzing common variants (see Misstep #3); variant misannotation; insufficient sample size.
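As a rough illustration of the check, here is a minimal Python sketch that tests a synonymous burden heritability estimate against zero with a two-sided z-test. It assumes you have already copied the point estimate and standard error out of your BHR output, and that the estimate is approximately normally distributed; the function name and the numbers are hypothetical and not part of BHR.

```python
import math

def z_test_against_zero(h2, se):
    """Two-sided z-test of an estimate against zero (assumes approximate normality)."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    z = h2 / se
    # Two-sided p-value under the standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Hypothetical values for illustration only; substitute your own estimate and SE.
z, p = z_test_against_zero(h2=0.0004, se=0.0005)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A small p-value here would be a prompt to look for the issues listed above before interpreting any other mask.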
Misstep #2: Forgetting to stratify missense variants by functional effect
Burden heritability is attenuated relative to the total heritability when a burden mask includes variants of heterogeneous effects. For some classes of variation, this is unlikely to be meaningful: for example, most loss-of-function variants within a given gene have a similar phenotypic effect. For other classes, this likely matters: for example, some missense variants have no functional effect, while some are as functionally impactful as loss-of-function variants. Thus, it is important to functionally annotate missense variants and run BHR separately for each functional category to avoid heritability attenuation. As optimal functional annotation of missense variants is an open question, there is no consensus annotation or number of missense functional categories to run through BHR; we used functional annotations from PolyPhen2 and analyzed ‘benign’ vs ‘possibly damaging’/‘probably damaging’ (see manuscript).
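One way to prepare the split is sketched below in Python: missense variants are grouped into separate masks by their PolyPhen2 class, so that each mask can be run through BHR on its own. The label spellings, mask names, and the (variant ID, label) input shape are assumptions made for illustration; adapt them to however your annotation file encodes PolyPhen2 calls.

```python
from collections import defaultdict

# Assumed mapping from PolyPhen2 class to mask name (hypothetical names),
# mirroring the benign vs. possibly/probably damaging split described above.
POLYPHEN_TO_MASK = {
    "benign": "missense_benign",
    "possibly_damaging": "missense_damaging",
    "probably_damaging": "missense_damaging",
}

def split_missense_by_polyphen(variants):
    """Group (variant_id, polyphen_label) pairs into mask_name -> list of variant IDs."""
    masks = defaultdict(list)
    for variant_id, label in variants:
        # Variants with a missing or unrecognized label are kept apart, not dropped silently.
        mask = POLYPHEN_TO_MASK.get(label, "missense_unannotated")
        masks[mask].append(variant_id)
    return dict(masks)

# Hypothetical variant IDs for illustration only.
example = [
    ("var1", "benign"),
    ("var2", "probably_damaging"),
    ("var3", "possibly_damaging"),
    ("var4", None),
]
for mask, ids in split_missense_by_polyphen(example).items():
    print(mask, ids)
```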
Misstep #3: Analyzing common variants with BHR
Burden heritability regression leverages correlation in effects between nearby variants, a phenomenon indistinguishable from correlation induced by linkage disequilibrium. As LD increases with variant frequency, inflation in burden heritability estimates also becomes a concern with analysis of variants at higher allele frequencies. We suggest restricting analysis to variants at an allele frequency less than 0.1%. Additionally, we note that correcting for LD is theoretically straightforward, but computationally challenging (see Methods section of manuscript), and so is a potential development area for BHR in the future.
Misstep #4: Jointly analyzing variants of a wide range of allele frequencies
Running BHR on variants spanning a wide range of allele frequencies can attenuate burden heritability estimates, because effect sizes depend on allele frequency. As such, we recommend stratifying variants by allele frequency and running BHR separately on each bin; we used an arbitrary bin width of one order of magnitude of allele frequency (e.g., 1e-6 < AF < 1e-5; 1e-5 < AF < 1e-4). If you are interested in the total burden heritability from a range of variant frequencies and functional classes (encouraged!), see documentation about Aggregate mode.
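The binning can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, bins are half-open decades (1e-k <= AF < 1e-(k-1)) rather than the open intervals written above, and the default upper bound of 0.1% follows the recommendation in Misstep #3.

```python
import math
from collections import defaultdict

def af_decade(af, max_af=1e-3):
    """Return the exponent k such that 10**k <= af < 10**(k + 1), or None if excluded."""
    if af <= 0 or af >= max_af:
        return None  # monomorphic, invalid, or too common for this analysis
    return math.floor(math.log10(af))

def stratify_by_af(variants, max_af=1e-3):
    """Group (variant_id, af) pairs into decade bins keyed by exponent."""
    bins = defaultdict(list)
    for variant_id, af in variants:
        k = af_decade(af, max_af)
        if k is not None:
            bins[k].append(variant_id)
    return dict(bins)

# Hypothetical variants for illustration only.
example = [("var1", 3e-6), ("var2", 5e-5), ("var3", 7e-5), ("var4", 0.02)]
for k, ids in sorted(stratify_by_af(example).items()):
    print(f"1e{k} <= AF < 1e{k + 1}: {ids}")
```

Each resulting bin would then be analyzed in its own BHR run.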
Misstep #5: Analyzing underpowered binary phenotypes
BHR uses a mixed effects model in which significant genes are treated as fixed effects. With default settings, BHR identifies significant genes using a chi-squared test, and for binary traits with a small number of cases, this test produces an excess of false positives. These false-positive associations lead to inflated burden heritability estimates. You will be able to tell that this is a problem because it will cause BHR to produce non-zero heritability estimates for synonymous variants. This can be handled in a number of ways: 1) omit underpowered binary phenotypes from BHR analysis, or 2) assume that there are no significant genes for underpowered phenotypes (set fixed_genes = NULL), or 3) identify significant genes outside of BHR using a tool like SAIGE or REGENIE and manually specify those significant genes in BHR analysis (using the fixed_genes argument).
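For option 3, the gene list has to be assembled before it is handed to BHR. The Python sketch below shows one generic way to do that from gene-level p-values produced by an external tool. The function name, the input shape, and the Bonferroni threshold are all assumptions for illustration; the threshold is not BHR's own significance criterion, and the format expected by the fixed_genes argument should be taken from the BHR documentation.

```python
def select_significant_genes(gene_pvalues, alpha=0.05):
    """Return genes whose p-value passes a Bonferroni threshold (assumed criterion)."""
    if not gene_pvalues:
        return []
    threshold = alpha / len(gene_pvalues)  # Bonferroni correction across tested genes
    return sorted(gene for gene, p in gene_pvalues.items() if p < threshold)

# Hypothetical gene names and p-values for illustration only.
pvalues = {"GENE_A": 1e-9, "GENE_B": 0.2, "GENE_C": 0.03}
print(select_significant_genes(pvalues))
```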
Misstep #6: Jointly analyzing variants from different sequencing platforms
Combining association statistics across multiple cohorts is a common analytic goal. If the cohorts were sequenced on different platforms, genome coverage may vary across cohorts, which can inflate BHR heritability estimates if phenotype varies with sequencing depth. To overcome this potential source of bias, we recommend running BHR separately on samples from each sequencing platform, and then meta-analyzing the resulting heritability estimates (this can be done using the BHR aggregate mode, substituting platform in place of trait).
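To make the idea of meta-analyzing per-platform estimates concrete, here is a generic fixed-effect (inverse-variance weighted) meta-analysis in Python. This is a textbook sketch, not a description of what BHR's aggregate mode computes internally; the function name and the example numbers are hypothetical, and it assumes the per-platform estimates are independent.

```python
import math

def inverse_variance_meta(estimates):
    """Pool (estimate, standard_error) pairs with inverse-variance weights."""
    if not estimates:
        raise ValueError("need at least one estimate")
    weights = [1.0 / (se ** 2) for _, se in estimates]
    total_weight = sum(weights)
    pooled = sum(w * est for w, (est, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se

# Hypothetical per-platform burden heritability estimates and standard errors.
per_platform = [(0.010, 0.002), (0.020, 0.002)]
pooled, pooled_se = inverse_variance_meta(per_platform)
print(f"pooled h2 = {pooled:.4f} (SE {pooled_se:.4f})")
```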
Misstep #7: Using logistic regression association statistics in BHR
BHR takes as input variant effect sizes estimated by linear regression, not logistic regression. While logistic regression is often the tool of choice for binary traits, it is possible to calculate linear regression association statistics using case and control counts (see equation 34 in manuscript).
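The Python sketch below illustrates why counts are sufficient. It computes the ordinary least squares slope from regressing a 0/1 phenotype on a 0/1 carrier indicator, using only carrier counts and sample sizes. This is the textbook OLS identity (covariance over variance), offered as an assumption-laden illustration rather than a reproduction of equation 34; it ignores covariates and treats carriers as a binary indicator, so consult the manuscript for the exact form and scaling that BHR expects.

```python
def ols_beta_from_counts(case_carriers, control_carriers, n_cases, n_controls):
    """OLS slope of a 0/1 phenotype on a 0/1 carrier indicator, from counts alone."""
    n = n_cases + n_controls
    carrier_freq = (case_carriers + control_carriers) / n  # mean of x
    case_fraction = n_cases / n                            # mean of y
    if carrier_freq <= 0 or carrier_freq >= 1:
        raise ValueError("variant must be polymorphic in the sample")
    # cov(x, y) = E[xy] - E[x]E[y]; for 0/1 variables, E[xy] is the carrier-case fraction.
    covariance = case_carriers / n - carrier_freq * case_fraction
    variance_x = carrier_freq * (1.0 - carrier_freq)
    return covariance / variance_x

# Hypothetical counts for illustration only.
beta = ols_beta_from_counts(case_carriers=10, control_carriers=5,
                            n_cases=1000, n_controls=9000)
print(f"beta = {beta:.4f}")
```

For a binary predictor this slope equals the difference in case fraction between carriers and non-carriers, which is a handy sanity check on the arithmetic.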
Misstep #8: Worrying alone
If you are unsure if you are running BHR correctly, please reach out by opening a new item in the GitHub issues section. We are happy to help!