docs for paired reads in Mutect2 somatic genotyping #6264

davidbenjamin · 2019-11-15T15:27:55Z

@takutosato I should have done this earlier.

@droazen This is as requested by our friends at Illumina.

takutosato

This is great!

takutosato · 2019-11-15T15:54:44Z

docs/mutect/mutect.tex

+\subsection{Handling Paired Reads}
+The fundamental unit of evidence in paired-end sequencing is not the read but the fragment of DNA from which two reads were sequenced.  If we apply the somatic likelihoods model below directly to the read-vs-haplotype likelihoods matrix from Pair-HMM our model would in effect allow a read and its mate to come from two different biological haplotypes.  To prevent this absurdity, we transform the read-vs-haplotype likelihoods matrix into a fragment-vs-haplotype likelihood matrix.  The simplest approach, which \code{Mutect2} uses, is to multiply the likelihoods of all paired reads in a fragment (in log space, add) to obtain the fragment's likelihood.  That is, $P({\rm fragment} | {\rm haplotype}) = P({\rm read~1} | {\rm haplotype}) P({\rm read~2} | {\rm haplotype})$.  Multiplying likelihoods is justified as far as sequencing error is concerned, since sequencing errors on paired reads are statistically independent.  There are, of course shared covariates such as sequence context that influence errors on both reads.  These, however, are the domain of \code{FilterMutectCalls}'s downstream filtering.  The somatic likelihoods model of \code{Mutect2} is \textit{only} concerned with distinguishing sequencing errors (in the narrow sense of an error that occurred on the sequencer itself) from possible somatic variants.  \code{FilterMutectCalls} is responsible for distinguishing somatic variants from all other errors, such as those that occur during sample preparation and alignment.
+
+Simply multiplying likelihoods in this way is not justified when paired reads overlap because while sequencing errors are independent, PCR errors are not.  That is, a substitution occurring on both reads may be due to independent sequencing errors or to the amplification of a single PCR error.  In order to force the possibility of PCR error into a model of independent reads, we enforce that the effective, multiplied, likelihood must not exceed a bound given by the probability of PCR error.  To achieve this, at bases where read pairs overlap, \code{Mutect2} caps the total (that is, the sum, because qualities are measured in a logarithmic phred scale) base and indel qualities to user-defined PCR SNV and indel qualities.  This can be adjusted for PCR-free protocols.  For example, if the PCR quality is 40 and two reads overlap with base qualities of 25 and 30, each base quality is replaced with $40/2 = 20$.  If the base qualities were both, say, 15, no adjustment would be needed because the probability of sequencing error dominates the probability of PCR error.  This caping of overlapping base qualities occurs before Pair-HMM, while merging reads into fragments occurs after Pair-HMM.
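
For readers of this diff, here is a minimal sketch of the read-to-fragment step the first paragraph describes: rows of the read-vs-haplotype log-likelihood matrix that belong to the same fragment are added, which is multiplication in linear space. The function name, the dict-based matrix layout, and the `read_to_fragment` mapping are hypothetical and chosen only for illustration; this is not the GATK implementation.

```python
def fragment_log_likelihoods(read_log_likelihoods, read_to_fragment):
    """Collapse a read-vs-haplotype matrix into a fragment-vs-haplotype matrix.

    read_log_likelihoods: dict mapping read name -> list of log-likelihoods,
        one entry per haplotype (hypothetical layout).
    read_to_fragment: dict mapping read name -> fragment id (hypothetical).
    """
    fragments = {}
    for read, row in read_log_likelihoods.items():
        frag = read_to_fragment[read]
        if frag not in fragments:
            # First read seen for this fragment: copy its row.
            fragments[frag] = list(row)
        else:
            # Mate of an already-seen read: add log-likelihoods per haplotype,
            # i.e. P(fragment | h) = P(read 1 | h) * P(read 2 | h).
            fragments[frag] = [a + b for a, b in zip(fragments[frag], row)]
    return fragments


reads = {
    "pair/1": [-1.0, -3.0],   # log-likelihoods against haplotypes 0 and 1
    "pair/2": [-0.5, -2.0],
    "single": [-4.0, -0.25],  # a read whose mate is absent
}
fragment_of = {"pair/1": "pair", "pair/2": "pair", "single": "single"}
result = fragment_log_likelihoods(reads, fragment_of)
```

An unpaired read simply becomes a one-read fragment, so its row passes through unchanged.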


Typo: "This caping" -> "This capping"

Unclosed comma: "There are, of course shared covariates..." needs a comma after "of course".
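
The capping rule in the second paragraph of the diff can be sketched as follows. This follows the worked example in the text (PCR quality 40, base qualities 25 and 30 become 20 each; 15 and 15 are left alone) rather than the GATK source, so the function name, the integer halving, and the handling of cases the text does not cover are assumptions.

```python
def cap_overlapping_quals(qual1, qual2, pcr_qual):
    """Cap the summed phred quality of two overlapping bases at pcr_qual.

    If the two qualities add up to more than pcr_qual, both are replaced
    with pcr_qual // 2 (integer halving is an assumption for odd values).
    Otherwise the qualities are returned unchanged.
    """
    if qual1 + qual2 > pcr_qual:
        half = pcr_qual // 2
        return half, half
    return qual1, qual2


capped = cap_overlapping_quals(25, 30, 40)     # sum 55 exceeds the bound
untouched = cap_overlapping_quals(15, 15, 40)  # sum 30 is within the bound
```

Because phred qualities are logarithmic, bounding their sum bounds the product of the two error probabilities, which is how the PCR error probability acts as a floor on the combined evidence.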

docs for paired reads

391236c

davidbenjamin added Mutect Documentation labels Nov 15, 2019

davidbenjamin requested a review from takutosato November 15, 2019 15:27

davidbenjamin assigned takutosato Nov 15, 2019

takutosato approved these changes Nov 15, 2019

edits

b55d10b

davidbenjamin merged commit dc948e6 into master Nov 18, 2019

davidbenjamin deleted the db_m2_docs branch November 18, 2019 23:58
