Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed an edge case that could result in negative GQs when in USE_POSTERIOR_PROBABILITIES mode #7120

Merged
merged 1 commit into from
Jun 1, 2021

Conversation

jamesemery
Copy link
Collaborator

@jamesemery jamesemery commented Mar 2, 2021

Resolves one of the issues encountered when merging the DRAGEN-GATK gvcfs the #7108.

@droazen
Copy link
Contributor

droazen commented Mar 31, 2021

@ldgauthier Would you like to review this one as well, since you reviewed #7148? It's just some added Math.min() statements by the look of it...

@@ -340,22 +341,23 @@ public static void makeGenotypeCall(final int ploidy,
}
}

private static double getGQLog10FromPosteriors(final int bestGenotypeIndex, final double[] /**/log10Posteriors) {
@VisibleForTesting
static double getGQLog10FromPosteriors(final int bestGenotypeIndex, final double[] /**/log10Posteriors) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vruano James and I are both wondering why we're allowing a bestGenotypeIndex that isn't the array max.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is just a optimization..... looking where this method is used, is often the case that that max index has already been calculated earlier on the very same probs/lks array for another reason and by taken it as an argument we are saving ourselves the recalculation cost.

Have you seen anywhere where this rule is broken (i.e. the index was calculated from a different array or the array was altered after the index was calculated?)

I can see that it is a code smell and so If you prefer to recalculate i'm ok with that. Perhaps we should consider to encapsulate prob/lk array in an object that would keep track of best and second best but that perhaps is over-kill for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't seen any inconsistent cases, but can we at least specify in some javadoc that the index should agree with the minimum of the posterior array being passed in?

log10Posteriors[ bestGenotypeIndex == 0 ? 2 : bestGenotypeIndex - 1],
log10Posteriors[ bestGenotypeIndex == 2 ? 0 : bestGenotypeIndex + 1]));
default:
if (bestGenotypeIndex == 0) {
return MathUtils.log10SumLog10(log10Posteriors, 1, log10Posteriors.length);
return Math.min(0.0, MathUtils.log10SumLog10(log10Posteriors, 1, log10Posteriors.length));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another issue for @vruano : If we have to throw a min on this, then the normalization isn't happening in the right place. According to my calculations (and as seen in James's unit test) the common case where this comes into play is where there are more than two equally likely genotypes, so the log10 probabilities get normalized to [0, 0, 0, -something, more negative somethings], the best is a zero and then there are two not bests that are zero and the log10SumLog10(0,0) makes a log10(2) ~ 0.3 > 0 which leads to a negative GQ in Phred-space. Elsewhere we use the MathUtils.secondSmallestMinusSmallest, which doesn't have this problem. I don't think the log10SumLog10 is the right thing to do.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest I don't know why I did it this way... we could do what we do with PL or perhaps even better would be to do the proper calculation that is sum (Z) of all the posteriors (before doing the -max trasformation) and report 1-Prob(GT)/Z

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the definition of GQ in the VCF we are supposed to do that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we output integer GQ values (as in the spec), the "smallest minus second smallest" will be the same as 1-Prob(GT)/Z unless the PLs are [0,0,0] right?

@jamesemery
Copy link
Collaborator Author

@ldgauthier @vruano Given that this is still fixing a rather glaring bug in the current behavior and we should probably get a fix of some ilk in before the next release of GATK I'm going to push the decision to rework the GT calculation to an open issue ticket in favor of getting this branch merged.

@ldgauthier
Copy link
Contributor

I have merged many a PR by opening new issues, so I would be a hypocrite to object. Sounds good to me @jamesemery.

@droazen
Copy link
Contributor

droazen commented May 17, 2021

@jamesemery What's the status of this one as well? Close to a merge?

@jamesemery jamesemery force-pushed the je_fixDragenGVCFNegativeGQ branch from 83b1c8d to 464a40d Compare May 19, 2021 13:56
@jamesemery
Copy link
Collaborator Author

@ldgauthier @vruano can i get a passing review on this so it can be merged? the failing tests were transient errors.

@jamesemery jamesemery dismissed ldgauthier’s stale review May 20, 2021 15:29

Pushed the primary complaint into an issue for farther discussion.

Copy link
Contributor

@vruano vruano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No objections

@vruano vruano assigned jamesemery and droazen and unassigned vruano, jamesemery and droazen May 31, 2021
@jamesemery jamesemery merged commit 944c128 into master Jun 1, 2021
@jamesemery jamesemery deleted the je_fixDragenGVCFNegativeGQ branch June 1, 2021 14:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants