-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
used constants; implemented non-AS transformation #7718
Conversation
src/main/java/org/broadinstitute/hellbender/tools/gvs/ingest/VetFieldEnum.java
Show resolved
Hide resolved
src/main/java/org/broadinstitute/hellbender/tools/gvs/ingest/VetFieldEnum.java
Outdated
Show resolved
Hide resolved
} else if (variant.getAlleles().size() == 4) { | ||
outNotAlleleSpecific = (int) mq / 3 + VCFConstants.PHASED + (int) mq / 3 + VCFConstants.PHASED + (int) mq / 3; | ||
outNotAlleleSpecific = (int) mq + VCFConstants.PHASED + (int) mq + VCFConstants.PHASED + (int) mq; | ||
} else { | ||
throw new UserException("Expected diploid sample to either have 3 alleles (ref, alt, non-ref) or 4 alleles (ref, alt 1, alt 2, non-ref)"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have done this as an initial check and then a loop over the number of alleles rather than hard coding 3 and 4 in the conditionals, but maybe that's just six one way half a dozen the other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah -- I don't disagree that this class could use some further refactoring
src/main/java/org/broadinstitute/hellbender/tools/gvs/ingest/VetFieldEnum.java
Show resolved
Hide resolved
@@ -121,6 +127,11 @@ public String getColumnValue(final VariantContext variant) { | |||
|
|||
AS_RAW_MQRankSum { // TODO -- maybe rely on 1/1 for call_GT, also get rid of the | at the beginning |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't looked at the entirety of the code -- can this be subbed out with a constant also?
.alleles(Arrays.asList(Allele.ALT_A, Allele.ALT_T)) | ||
.make(); | ||
|
||
builderA.attribute(AS_RAW_MAP_QUAL_RANK_SUM_KEY,"|-0.2,1|-2.5,1") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand where these values are coming from. There's only one sample, so there should only be one value, but that sample is heterozygous non-reference, so there should be no values since there's no ref.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are just made up to ensure that we don't return them (ie we're just defining the input data here, say from HC where we sometimes see this for GT 1/2). The test below asserts this is not the value returned
.attribute(DEPTH_KEY,"8") | ||
.attribute(MAP_QUAL_RANK_SUM_KEY,"-0.572") | ||
.attribute(RAW_QUAL_APPROX_KEY,"154") | ||
.attribute(RAW_MAPPING_QUALITY_WITH_DEPTH_KEY,"5024,8") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is MQ=25 what we expect? Definitely unusual, so I'm just checking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand where 25 is coming from? If it helps, these are just wiring tests, so scientifically unusual values are fine (ie we're just testing the math/transformation of those values is correct)
builderA.attribute(AS_RAW_QUAL_APPROX_KEY,"|31|29|0") | ||
.attribute(AS_VARIANT_DEPTH_KEY,"1|6|3|0") | ||
.attribute(DEPTH_KEY,"18") | ||
.attribute(MAP_QUAL_RANK_SUM_KEY,"0.696") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two alts shouldn't have either rank sum, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you look below we are asserting that there is no value returned. This is testing the case where it is in the VCF for some reason (as we've seen happens) we don't return it to VQSR
.genotypes(Arrays.asList(g)); | ||
|
||
|
||
VariantContext vc = builderA.make(); | ||
|
||
Assert.assertEquals(VetFieldEnum.ref.getColumnValue(vc), "C"); | ||
Assert.assertEquals(VetFieldEnum.alt.getColumnValue(vc), "CTTT,CTT"); | ||
Assert.assertEquals(VetFieldEnum.AS_RAW_MQ.getColumnValue(vc), "936|936|936"); | ||
Assert.assertEquals(VetFieldEnum.AS_RAW_MQRankSum.getColumnValue(vc), "0.696,1|0.696,1"); | ||
Assert.assertEquals(VetFieldEnum.AS_RAW_MQ.getColumnValue(vc), "2808|2808|2808"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are we going to calculate MQ here? VarDP is just informative reads, which is going to inflate our numbers, but here I'm getting ~17, which can't be right.
Summary of VQSR Changes
- ONLY populate AS_RAW_MQRankSum or AS_RAW_ReadPosRankSum for ref-alt genotypes (0/1, 0/2) not 1/1/,1/2,2,2
- AS_RAW_MQ for Non AS... Assign full MQ to alternate allele (don't distribute)
- Compute SUM(AD) for future use
- provide command line option to force the AS-Approximate path even when AS annotations are available (for benchmarking/comparisons)