-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ignore specified contigs for pre-aligned BAMs in PathSeq #6537
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! I want one of the error messages clarified, but it's good to merge after that.
final Set<String> contigsToIgnoreSet = new HashSet<>(filterArgs.alignmentContigsToIgnore); | ||
for (final String contig : contigsToIgnoreSet) { | ||
if (dictionary.getSequence(contig) == null) { | ||
throw new UserException.MissingContigInSequenceDictionary(contig, dictionary); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like for the error to specify somehow is that the problem is the contigs specified for filtering. Otherwise users will be validating all their input files and scratching their heads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -190,6 +193,16 @@ | |||
optional = true) | |||
public int minIdentity = 30; | |||
|
|||
/** | |||
* If using --is-host-aligned, ignores alignments to these contigs (can be specified multiple times). This | |||
* can be useful if the alignment is to a reference containing a microbe, such as chrEBV in hg38. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it really called chrEBV? Honestly I haven't looked at all 3500 names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. I recently found out it's also in hg19 but it uses the NCBI accession
readA.setAttribute("NM", 0); | ||
readA.setPosition(IGNORED_CONTIG, 1); | ||
final boolean testA = filter.test(readA); | ||
Assert.assertEquals(true, testA); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer Assert.assertTrue
and Assert.assertFalse
(below)
Adds an argument to PathSeq filtering that lets users specify any contigs that should be ignored. This is useful for BAMs aligned to hg38, which contains the Epstein-Barr virus (chrEBV).