-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AH - implement changes for mitochondrial pipeline #6399
Conversation
f3ba57b
to
959a4a2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finished reviewing first half of code.
...va/org/broadinstitute/hellbender/tools/walkers/annotator/allelespecific/StrandBiasUtils.java
Outdated
Show resolved
Hide resolved
...va/org/broadinstitute/hellbender/tools/walkers/annotator/allelespecific/StrandBiasUtils.java
Outdated
Show resolved
Hide resolved
...va/org/broadinstitute/hellbender/tools/walkers/annotator/allelespecific/StrandBiasUtils.java
Outdated
Show resolved
Hide resolved
...va/org/broadinstitute/hellbender/tools/walkers/annotator/allelespecific/StrandBiasUtils.java
Outdated
Show resolved
Hide resolved
...va/org/broadinstitute/hellbender/tools/walkers/mutect/clustering/SomaticClusteringModel.java
Outdated
Show resolved
Hide resolved
...n/java/org/broadinstitute/hellbender/tools/walkers/mutect/filtering/ContaminationFilter.java
Outdated
Show resolved
Hide resolved
...n/java/org/broadinstitute/hellbender/tools/walkers/mutect/filtering/ContaminationFilter.java
Outdated
Show resolved
Hide resolved
@@ -20,11 +21,11 @@ public DuplicatedAltReadFilter(final int uniqueAltReadCount) { | |||
public ErrorType errorType() { return ErrorType.ARTIFACT; } | |||
|
|||
@Override | |||
public boolean isArtifact(final VariantContext vc, final Mutect2FilteringEngine filteringEngine) { | |||
return vc.getAttributeAsInt(UniqueAltReadCount.KEY, 1) <= uniqueAltReadCount; | |||
public List<Boolean> areAllelesArtifacts(final VariantContext vc, final Mutect2FilteringEngine filteringEngine, ReferenceContext referenceContext) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This always returns a singleton list, even if there is more than one alt allele. . .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch. So, UniqueAltReadCount is actually assuming a single alt allele. I guess I should change that class to write an allele specific annotation that can be used in an allele specific way in the filter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
trueProbability *= (1 - errorProb); | ||
// if vc has symbolic allele, remove it | ||
if (vc.hasSymbolicAlleles()) { | ||
// can we assume it's the last allele? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The VCF spec doesn't require it, so let's be cautious and not assume.
...in/java/org/broadinstitute/hellbender/tools/walkers/mutect/filtering/ErrorProbabilities.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Back to you, @ahaessly. It's fundamentally sound.
...va/org/broadinstitute/hellbender/tools/walkers/mutect/filtering/MinAlleleFractionFilter.java
Outdated
Show resolved
Hide resolved
...va/org/broadinstitute/hellbender/tools/walkers/mutect/filtering/MinAlleleFractionFilter.java
Outdated
Show resolved
Hide resolved
...va/org/broadinstitute/hellbender/tools/walkers/mutect/filtering/MinAlleleFractionFilter.java
Outdated
Show resolved
Hide resolved
...ava/org/broadinstitute/hellbender/tools/walkers/mutect/filtering/Mutect2FilteringEngine.java
Show resolved
Hide resolved
...ava/org/broadinstitute/hellbender/tools/walkers/mutect/filtering/Mutect2FilteringEngine.java
Outdated
Show resolved
Hide resolved
...n/java/org/broadinstitute/hellbender/tools/walkers/mutect/filtering/TumorEvidenceFilter.java
Outdated
Show resolved
Hide resolved
src/main/java/org/broadinstitute/hellbender/utils/variant/GATKVCFConstants.java
Outdated
Show resolved
Hide resolved
args -> { | ||
intervals.stream().map(SimpleInterval::new).forEach(args::addInterval); | ||
return args; | ||
}); | ||
|
||
// add tests for DUPLICATE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the status of this comment?
...est/resources/org/broadinstitute/hellbender/tools/mutect/mito/expected_LowHetNone_output.txt
Outdated
Show resolved
Hide resolved
...nstitute/hellbender/tools/mutect/mito/expected_LowHetVariantWalkerIntegrationTest_output.txt
Outdated
Show resolved
Hide resolved
ecdbbe7
to
1946785
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed the annotation aspects and left the Mutect2-specific parts to David. I pushed a commit with some changes I should have made a long time ago and a fix for the AFs in the multi-allelic splitting. Let me know what you think. (There are probably a lot more methods we could move from RefConfMerger to AlleleSubsettingUtils, but I just moved the ones I used here.)
@@ -0,0 +1,44 @@ | |||
package org.broadinstitute.hellbender.tools.walkers.annotator; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm on the fence about putting this in the allelespecific subpackage. On the one hand, it is allele-specific, but on the other hand, all those classes have interfaces and/or parent classes that this one doesn't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, now that I look at that class again, I think I'll add the AlleleSpecificAnnotation interface to it. but, yeah, I wasn't really sure which package it belonged in.
|
||
public static Map<String, Object> computeSBAnnotation(VariantContext vc, AlleleLikelihoods<GATKRead, Allele> likelihoods, String key) { | ||
// calculate the annotation from the likelihoods | ||
// likelihoods can come from HaplotypeCaller call to VariantAnnotatorEngine |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, sometimes likelihoods come from Mutect2, right?
return vcb.make(); | ||
} | ||
|
||
/** | ||
* Creates a comma separated string of all the filters that apply to the allele. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it really create a comma separated string? Also, it looks like the only duplicate filter you're checking for is PASS, so I would call the method "removeExtraPassFilters" or something less general than what's there now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks - didn't update the doc after a refactor.
the call to distinct removes any duplicate filters (of which the only possibility should be PASS). the first if statement removes the PASS if there are filters that did not pass.
} | ||
|
||
@Override | ||
protected List<String> requiredAnnotations() { return Collections.emptyList(); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does have required annotations, they're just not info annotations. How important is this check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well the code that uses this sees if the Variant Context has the annotations - so this wouldn't work here because it's on the genotype. If there are no required annotations the check passes and the error probabilities are calculated. I could rename the method to requiredInfoAnnotations or requiredVariantAnnotations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the upstream method has a check for INFO annotations, then I'm happy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok - renaming the method to requiredInfoAnnotations so it's more explicit
...ava/org/broadinstitute/hellbender/tools/walkers/mutect/filtering/StrictStrandBiasFilter.java
Show resolved
Hide resolved
...roadinstitute/hellbender/tools/walkers/mutect/filtering/MTLowHeteroplasmyFilterToolTest.java
Outdated
Show resolved
Hide resolved
@ahaessly Here's a representative line in the
Comments:
Otherwise, looks like you have addressed everything. |
@davidbenjamin AS_FilterStatus was already a field used in the INFO section of a VCF (@ldgauthier added it I believe). Also, it seems like there are several other INFO field attributes that use the _ if you take a look at GATKVCFConstants file. I'm fine changing it just wanted to hear from laura as well. |
I double-checked the VCF spec and it turns out that while FORMAT keys must be alphanumeric, INFO keys may have underscores. I still prefer |
@davidbenjamin I'm willing to concede number 3, and for number 2 I grudgingly agree. A dot in the FILTER field means filters haven't been applied, so by extension a dot in the ASFILTER field would mean that filters haven't been applied to that allele, which isn't true. But in the autosomal pipeline, a variant that fails a site filter before VQSR wouldn't be seen or filtered by VQSR, which I suppose does mean that filters have not been applied. I guess my objection boils down to the fact that by outputting . instead of PASS you're throwing away information. |
3f2937e
to
f07b809
Compare
I agree that we need to address that. What if an allele whose failing filters were all passed to the site got a concise tag like Besides the aesthetics of the word "PASS" being associated in any way with something failing, I'm worried about users shooting themselves in the foot with pipelines that only check the allele filters. You could imagine cases where failing to check the site filters wouldn't be so egregious as to be obvious. |
I could be onboard with |
It will get a header line. |
@ahaessly after reviewing the allele-specific filtering discussion in hts-specs (samtools/hts-specs#115), I think the delimiters for the ASFILTER values should be switched, i.e. data for each allele should be separated by commas ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The WDLs look good to me (that's all I looked at). I learned some new tricks! Only have a couple of minor comments/questions. Thanks!
@@ -1,6 +1,6 @@ | |||
version 1.0 | |||
|
|||
import "AlignmentPipeline.wdl" as AlignAndMarkDuplicates | |||
import "https://api.firecloud.org/ga4gh/v1/tools/mitochondria:AlignmentPipeline/versions/1/plain-WDL/descriptor" as AlignAndMarkDuplicates |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for the tests to work this needs to be the file name "AlignmentPipeline.wdl". If there's a better solution for moving things back and forth between Terra and the gatk repo though I'm all for it.
awk '{print $6}' output-data > major_hg.txt | ||
awk '{print $8}' output-data > minor_hg.txt | ||
awk '{print $14}' output-data > mean_het_major.txt | ||
awk '{print $15}' output-data > mean_het_minor.txt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it's worth adding any sanity checks here to make sure this doesn't break in the future since it's relying on column order to extract the values?
@@ -470,7 +498,7 @@ task M2 { | |||
--max-mnp-distance 0 | |||
>>> | |||
runtime { | |||
docker: "us.gcr.io/broad-gatk/gatk:4.1.1.0" | |||
docker: select_first([gatk_docker_override, "us.gcr.io/broad-gatk/gatk:4.1.1.0"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what the solution here is, but does keeping this as the default version mean that we'll have to keep updating it when there are new releases? That's the way it already was, so it's not necessarily a problem, I just want to remember that once this PR is merged and released we should go back and update this default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had one small question/comment but otherwise the WDL and WDL tests look good to me! Thanks for adding them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ahaessly Looks like you have resolved everything. Don't bother sending it back to me on account my one trivial comment, which you can take or leave at your discretion.
// set tumorAD to 0 for symbolic alleles so it won't contribute to overall AD | ||
List<Allele> symbolicAlleles = vc.getAlternateAlleles().stream().filter(allele -> allele.isSymbolic()).collect(Collectors.toList()); | ||
// convert allele index to alt allele index | ||
List<Integer> symIndexes = vc.getAlleleIndices(symbolicAlleles).stream().map(i -> i-1).collect(Collectors.toList()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you go directly compute symbolic indices via new IndexRange(0, vc.getNAlleles()).filter(n -> vc.getAllele(n).isSymbolic());
?
@@ -73,7 +73,7 @@ | |||
|
|||
|
|||
/** | |||
* | |||
* Returns a list for each alternate allele which is the probability that the allele should be filtered out. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"a list with one element for each allele"?
new refactor adding allele specific filters in mutect2 fix headers * Fixed AF and SB splitting; also some javadoc I should have done in the past and some refactoring I should have done in the past * make unique alt read count allele specific * fix genotypes not included in vcf * changed getRequiredAnnotations to getRequiredInfoAnnotations to be more explicit * fix split multiallelics to work for all info fields * change count type for RPA * update example inputs to use broad public bucket for refs Co-authored-by: Laura Gauthier <[email protected]>
This PR converts the Mutect2Filtering engine to be allele specific. This required changes to SomaticClusteringModel and ThresholdCalculator as well as ErrorProbabilities and of course the filters themselves. There are some filters which have not yet been converted, but I am prioritizing the ones in this PR for Sarah Calvo and the mitochondria pipeline. This provides the implementation for dsp-spec-ops tickets 166, 168, 169