Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs for paired reads in Mutect2 somatic genotyping #6264

Merged
merged 2 commits into from
Nov 18, 2019
Merged

Conversation

davidbenjamin
Copy link
Contributor

@takutosato I should have done this earlier.

@droazen This is as requested by our friends at Illumina.

Copy link
Contributor

@takutosato takutosato left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great!

\subsection{Handling Paired Reads}
The fundamental unit of evidence in paired-end sequencing is not the read but the fragment of DNA from which two reads were sequenced. If we apply the somatic likelihoods model below directly to the read-vs-haplotype likelihoods matrix from Pair-HMM our model would in effect allow a read and its mate to come from two different biological haplotypes. To prevent this absurdity, we transform the read-vs-haplotype likelihoods matrix into a fragment-vs-haplotype likelihood matrix. The simplest approach, which \code{Mutect2} uses, is to multiply the likelihoods of all paired reads in a fragment (in log space, add) to obtain the fragment's likelihood. That is, $P({\rm fragment} | {\rm haplotype}) = P({\rm read~1} | {\rm haplotype}) P({\rm read~2} | {\rm haplotype})$. Multiplying likelihoods is justified as far as sequencing error is concerned, since sequencing errors on paired reads are statistically independent. There are, of course shared covariates such as sequence context that influence errors on both reads. These, however, are the domain of \code{FilterMutectCalls}'s downstream filtering. The somatic likelihoods model of \code{Mutect2} is \textit{only} concerned with distinguishing sequencing errors (in the narrow sense of an error that occurred on the sequencer itself) from possible somatic variants. \code{FilterMutectCalls} is responsible for distinguishing somatic variants from all other errors, such as those that occur during sample preparation and alignment.

Simply multiplying likelihoods in this way is not justified when paired reads overlap because while sequencing errors are independent, PCR errors are not. That is, a substitution occurring on both reads may be due to independent sequencing errors or to the amplification of a single PCR error. In order to force the possibility of PCR error into a model of independent reads, we enforce that the effective, multiplied, likelihood must not exceed a bound given by the probability of PCR error. To achieve this, at bases where read pairs overlap, \code{Mutect2} caps the total (that is, the sum, because qualities are measured in a logarithmic phred scale) base and indel qualities to user-defined PCR SNV and indel qualities. This can be adjusted for PCR-free protocols. For example, if the PCR quality is 40 and two reads overlap with base qualities of 25 and 30, each base quality is replaced with $40/2 = 20$. If the base qualities were both, say, 15, no adjustment would be needed because the probability of sequencing error dominates the probability of PCR error. This caping of overlapping base qualities occurs before Pair-HMM, while merging reads into fragments occurs after Pair-HMM.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This caping -> This capping

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hanging comma in "There are, of course..."

@davidbenjamin davidbenjamin merged commit dc948e6 into master Nov 18, 2019
@davidbenjamin davidbenjamin deleted the db_m2_docs branch November 18, 2019 23:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants