Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ignore specified contigs for pre-aligned BAMs in PathSeq #6537

Merged
merged 3 commits into from
May 4, 2020

Conversation

mwalker174
Copy link
Contributor

Adds an argument to PathSeq filtering that lets users specify any contigs that should be ignored. This is useful for BAMs aligned to hg38, which contains the Epstein-Barr virus (chrEBV).

Copy link
Contributor

@ldgauthier ldgauthier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I want one of the error messages clarified, but it's good to merge after that.

final Set<String> contigsToIgnoreSet = new HashSet<>(filterArgs.alignmentContigsToIgnore);
for (final String contig : contigsToIgnoreSet) {
if (dictionary.getSequence(contig) == null) {
throw new UserException.MissingContigInSequenceDictionary(contig, dictionary);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like for the error to specify somehow is that the problem is the contigs specified for filtering. Otherwise users will be validating all their input files and scratching their heads.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -190,6 +193,16 @@
optional = true)
public int minIdentity = 30;

/**
* If using --is-host-aligned, ignores alignments to these contigs (can be specified multiple times). This
* can be useful if the alignment is to a reference containing a microbe, such as chrEBV in hg38.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really called chrEBV? Honestly I haven't looked at all 3500 names.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. I recently found out it's also in hg19 but it uses the NCBI accession

readA.setAttribute("NM", 0);
readA.setPosition(IGNORED_CONTIG, 1);
final boolean testA = filter.test(readA);
Assert.assertEquals(true, testA);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer Assert.assertTrue and Assert.assertFalse (below)

@mwalker174 mwalker174 merged commit 3a7ee7e into master May 4, 2020
@mwalker174 mwalker174 deleted the mw_host_filter branch May 4, 2020 18:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants