Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed a bug in hardClipCigar function that caused incorrect cigar calculation #6280

Merged
merged 9 commits into from
Dec 2, 2019

Conversation

yfarjoun
Copy link
Contributor

Problem was ocurring in the presence of insertions and deletions. fixes #6139

  1. Changed ReadClipper unit tests:
  • The tests in many cases assumed that the unclipped alignment locations do not change when the read is clipped. This is not true: for example if start for cigar
    1M1I3M is 100, the unclipped start for 2H3M is 99. All assertUnclipped calls were
    removed
  • Alignment now check that the read length remains to be consistent with the CIGAR, that the aligned bases span are consistent with the CIGAR and that the number of clipped bases from the read is consistent with the requested clipping
  1. Hard clipping in ClippingOps was buggy, thus we introduced new tests for it.
  2. Text was refactored for readability
  3. Clipping in ClippingOps did not treat insertions and deletions in the clipped parts of the CIGAR correctly. This was fixed
  4. Alignment re-calculation after clipping did not work correctly if the initial CIGAR contained insertions and deletions
  5. Hard clipping applied to the hard clipped read did not behave correctly

@gatk-bot
Copy link

gatk-bot commented Nov 24, 2019

Travis reported job failures from build 28153
Failures in the following jobs:

Test Type JDK Job ID Logs
unit openjdk11 28153.13 logs
unit openjdk8 28153.3 logs

@gatk-bot
Copy link

gatk-bot commented Nov 24, 2019

Travis reported job failures from build 28155
Failures in the following jobs:

Test Type JDK Job ID Logs
unit openjdk11 28155.13 logs
unit openjdk8 28155.3 logs

Copy link
Contributor

@davidbenjamin davidbenjamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assorted complaints. The fix looks good and the tests are compelling.

readCopied.setBaseQualities(newQuals);
}

private void applyWriteNs(final GATKRead readCopied) {
final byte[] newBases = readCopied.getBases(); //this makes a copy so we can modify in place
overwriteFromStartToStop(newBases, (byte)'N');
overwriteFromStartToStop(newBases, (byte) 'N');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I would err on the side of rigor and use Nucleotide.N.encodeAsByte() here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a space, and you want me to fix the old code!?? :-)

if (curReadCounter + curReadLength > readBasesBeforeReference) {
curReadLength = readBasesBeforeReference - curReadCounter;
curRefLength = readBasesBeforeReference - curReadCounter;
truncated = true;
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make the definition of truncated more explicit:

final boolean truncated = curReadCounter + curReadLength > readBasesBeforeReference;
if (truncated) {
  curReadLength = readBasesBeforeReference - curReadCounter;
  curRefLength = readBasesBeforeReference - curReadCounter;
}

if (runAsserts) {
assert newCigar.isValid(null, -1) == null;
}
assert !runAsserts || newCigar.isValid(null, -1) == null;
Copy link
Contributor

@davidbenjamin davidbenjamin Nov 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Asserts in non-test code are not very GATK. Is this really not covered in unit tests? You can defer this since it's tangential to this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverted to keep the history cleaner

@@ -548,7 +545,8 @@ private CigarShift cleanHardClippedCigar(final Cigar cigar) {
} else if (!readHasStarted && cigarElement.getOperator() == CigarOperator.HARD_CLIP) {
totalHardClip += cigarElement.getLength();
} else if (!readHasStarted && cigarElement.getOperator() == CigarOperator.DELETION) {
totalHardClip += cigarElement.getLength();
//totalHardClip += cigarElement.getLength();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like you are putting in an effect-less line of code to document that the total is not incremented. Is that because there are a bunch of else ifs and you don't want to give the impression that it's always incremented? How about this:

if (!readHasStarted && (cigarElement.getOperator() == CigarOperator.HARD_CLIP || cigarElement.getOperator() == CigarOperator.CigarOperator.SKIPPED_REGION) {
  totalHardClip += cigarElement.getLength();
}
readHasStarted |= cigarElement.getOperator() != CigarOperator.DELETION &&
                        cigarElement.getOperator() != CigarOperator.SKIPPED_REGION &&
                        cigarElement.getOperator() != CigarOperator.HARD_CLIP;

Also, in the (unmodified) code below it seems that the if (ReadHasStartedBlock) always results in addedHardClips == true. If you agree, could you move the addedHardCips = true statements to the end of that block?

size += elements.get(i).getLength();
i++;
}

return size;
}

/**
* Calculates shift of alignment int the newCigar relative to the oldCigar due to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int -> in

byte [] bases = Utils.dupBytes(base, length);
byte [] quals = Utils.dupBytes(qual, length);
final int length = cigar.getReadLength();
final Random random = new Random(TestUtil.RANDOM_SEED);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about Utils.getRandomGenerator() or some such Utils method?

@@ -87,19 +97,24 @@ public void testHardClipByReferenceCoordinates() {
for (Cigar cigar : cigarList) {
GATKRead read = ReadClipperTestUtils.makeReadFromCigar(cigar);
int start = getSoftStart(read);
int aln_start = read.getStart();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

camel case

private void assertReadClippingConsistent(final GATKRead original,
final GATKRead clipped,
final int clipping) {
int clip_diff = original.getLength() - clipped.getLength();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

camel case

Assert.assertTrue(clipped.isUnmapped());
return;
}
int delta_left = clipped.getStart() - original.getStart();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

camels

Assert.assertEquals(delta_left, clipped_left);
int original_end = original.getEnd() ;
int clipped_end = clipped.getEnd() ;
int delta_right = original_end - clipped_end;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

camels

@droazen droazen removed their request for review November 25, 2019 19:12
Yossi Farjoun added 3 commits November 26, 2019 16:39
…lculation

1. Changed ReadClipper unit tests:
  - The tests in many cases assumed that the unclipped alignment locations do not    change when the read is clipped. This is not true: for example if start for cigar
   1M1I3M is 100, the unclipped start for 2H3M is 99. All assertUnclipped calls were
   removed
   - Alignment now check that the read length remains to be consistent with the CIGAR, that the aligned bases span are consistent with the CIGAR and that the number of clipped bases from the read is consistent with the requested clipping
2. Hard clipping in ClippingOps was buggy, thus we introduced new tests for it.
3. Text was refactored for readability
4. Clipping in ClippingOps did not treat insertions and deletions in the clipped parts of the CIGAR correctly. This was fixed
5. Alignment re-calculation after clipping did not work correctly if the initial CIGAR contained insertions and deletions
6. Hard clipping applied to the hard clipped read did not behave correctly
@yfarjoun yfarjoun force-pushed the yf_fix_bug_in_hardclip branch from 780f67b to 3569d07 Compare November 26, 2019 14:40
@gatk-bot
Copy link

gatk-bot commented Nov 26, 2019

Travis reported job failures from build 28178
Failures in the following jobs:

Test Type JDK Job ID Logs
unit openjdk11 28178.13 logs
unit openjdk8 28178.3 logs

@yfarjoun
Copy link
Contributor Author

builds are now failing due to the changes in the artifical reads generator...I'll make them more flexible so that I can still get more random reads without breaking the other tests....

@gatk-bot
Copy link

gatk-bot commented Nov 27, 2019

Travis reported job failures from build 28207
Failures in the following jobs:

Test Type JDK Job ID Logs
unit openjdk11 28207.13 logs
unit openjdk8 28207.3 logs

@gatk-bot
Copy link

gatk-bot commented Nov 27, 2019

Travis reported job failures from build 28205
Failures in the following jobs:

Test Type JDK Job ID Logs
unit openjdk11 28205.13 logs
unit openjdk8 28205.3 logs

@gatk-bot
Copy link

gatk-bot commented Nov 27, 2019

Travis reported job failures from build 28209
Failures in the following jobs:

Test Type JDK Job ID Logs
unit openjdk11 28209.13 logs
unit openjdk8 28209.3 logs

@yfarjoun
Copy link
Contributor Author

yfarjoun commented Dec 1, 2019

back to you @davidbenjamin The tests are also passing this time!

Copy link
Contributor

@davidbenjamin davidbenjamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's ready to merge.

@yfarjoun yfarjoun merged commit 0a4cca8 into master Dec 2, 2019
@yfarjoun yfarjoun deleted the yf_fix_bug_in_hardclip branch December 2, 2019 07:57
@yfarjoun
Copy link
Contributor Author

yfarjoun commented Dec 2, 2019

Thanks for the review @davidbenjamin 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ReadClipper::hardClipByReadCoordinates incorrectly clips cigars in the presence of indels
3 participants