-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixed a bug in hardClipCigar function that caused incorrect cigar calculation #6280
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assorted complaints. The fix looks good and the tests are compelling.
readCopied.setBaseQualities(newQuals); | ||
} | ||
|
||
private void applyWriteNs(final GATKRead readCopied) { | ||
final byte[] newBases = readCopied.getBases(); //this makes a copy so we can modify in place | ||
overwriteFromStartToStop(newBases, (byte)'N'); | ||
overwriteFromStartToStop(newBases, (byte) 'N'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I would err on the side of rigor and use Nucleotide.N.encodeAsByte()
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a space, and you want me to fix the old code!?? :-)
if (curReadCounter + curReadLength > readBasesBeforeReference) { | ||
curReadLength = readBasesBeforeReference - curReadCounter; | ||
curRefLength = readBasesBeforeReference - curReadCounter; | ||
truncated = true; | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make the definition of truncated more explicit:
final boolean truncated = curReadCounter + curReadLength > readBasesBeforeReference;
if (truncated) {
curReadLength = readBasesBeforeReference - curReadCounter;
curRefLength = readBasesBeforeReference - curReadCounter;
}
if (runAsserts) { | ||
assert newCigar.isValid(null, -1) == null; | ||
} | ||
assert !runAsserts || newCigar.isValid(null, -1) == null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Asserts in non-test code are not very GATK. Is this really not covered in unit tests? You can defer this since it's tangential to this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reverted to keep the history cleaner
@@ -548,7 +545,8 @@ private CigarShift cleanHardClippedCigar(final Cigar cigar) { | |||
} else if (!readHasStarted && cigarElement.getOperator() == CigarOperator.HARD_CLIP) { | |||
totalHardClip += cigarElement.getLength(); | |||
} else if (!readHasStarted && cigarElement.getOperator() == CigarOperator.DELETION) { | |||
totalHardClip += cigarElement.getLength(); | |||
//totalHardClip += cigarElement.getLength(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like you are putting in an effect-less line of code to document that the total is not incremented. Is that because there are a bunch of else if
s and you don't want to give the impression that it's always incremented? How about this:
if (!readHasStarted && (cigarElement.getOperator() == CigarOperator.HARD_CLIP || cigarElement.getOperator() == CigarOperator.CigarOperator.SKIPPED_REGION) {
totalHardClip += cigarElement.getLength();
}
readHasStarted |= cigarElement.getOperator() != CigarOperator.DELETION &&
cigarElement.getOperator() != CigarOperator.SKIPPED_REGION &&
cigarElement.getOperator() != CigarOperator.HARD_CLIP;
Also, in the (unmodified) code below it seems that the if (ReadHasStartedBlock)
always results in addedHardClips == true
. If you agree, could you move the addedHardCips = true
statements to the end of that block?
size += elements.get(i).getLength(); | ||
i++; | ||
} | ||
|
||
return size; | ||
} | ||
|
||
/** | ||
* Calculates shift of alignment int the newCigar relative to the oldCigar due to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
int
-> in
byte [] bases = Utils.dupBytes(base, length); | ||
byte [] quals = Utils.dupBytes(qual, length); | ||
final int length = cigar.getReadLength(); | ||
final Random random = new Random(TestUtil.RANDOM_SEED); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about Utils.getRandomGenerator()
or some such Utils
method?
@@ -87,19 +97,24 @@ public void testHardClipByReferenceCoordinates() { | |||
for (Cigar cigar : cigarList) { | |||
GATKRead read = ReadClipperTestUtils.makeReadFromCigar(cigar); | |||
int start = getSoftStart(read); | |||
int aln_start = read.getStart(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
camel case
private void assertReadClippingConsistent(final GATKRead original, | ||
final GATKRead clipped, | ||
final int clipping) { | ||
int clip_diff = original.getLength() - clipped.getLength(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
camel case
Assert.assertTrue(clipped.isUnmapped()); | ||
return; | ||
} | ||
int delta_left = clipped.getStart() - original.getStart(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
camels
Assert.assertEquals(delta_left, clipped_left); | ||
int original_end = original.getEnd() ; | ||
int clipped_end = clipped.getEnd() ; | ||
int delta_right = original_end - clipped_end; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
camels
…lculation 1. Changed ReadClipper unit tests: - The tests in many cases assumed that the unclipped alignment locations do not change when the read is clipped. This is not true: for example if start for cigar 1M1I3M is 100, the unclipped start for 2H3M is 99. All assertUnclipped calls were removed - Alignment now check that the read length remains to be consistent with the CIGAR, that the aligned bases span are consistent with the CIGAR and that the number of clipped bases from the read is consistent with the requested clipping 2. Hard clipping in ClippingOps was buggy, thus we introduced new tests for it. 3. Text was refactored for readability 4. Clipping in ClippingOps did not treat insertions and deletions in the clipped parts of the CIGAR correctly. This was fixed 5. Alignment re-calculation after clipping did not work correctly if the initial CIGAR contained insertions and deletions 6. Hard clipping applied to the hard clipped read did not behave correctly
780f67b
to
3569d07
Compare
- added tests - fixed bug that was found in the process
builds are now failing due to the changes in the artifical reads generator...I'll make them more flexible so that I can still get more random reads without breaking the other tests.... |
back to you @davidbenjamin The tests are also passing this time! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's ready to merge.
Thanks for the review @davidbenjamin 🎉 |
Problem was ocurring in the presence of insertions and deletions. fixes #6139
1M1I3M is 100, the unclipped start for 2H3M is 99. All assertUnclipped calls were
removed