Downsample a bam by duplicate sets instead of reads #6512

takutosato · 2020-03-19T21:14:15Z

@lbergelson we've chatted about this about a month ago. I recall you wanted me to write a "ReadWalkerBase," which would be the parent class of ReadWalker and DuplicateSetWalker. I haven't done that just yet—in this code DuplicateSetWalker is a subclass of ReadWalker. I wanted to show you this code first before embarking on further refactoring.

gatk-bot · 2020-03-23T14:53:46Z

Travis reported job failures from build 29677
Failures in the following jobs:

Test Type	JDK	Job ID	Logs
unit	openjdk11	29677.14	logs

droazen · 2020-04-01T17:44:28Z

@lbergelson Are you happy with this branch as-is or do you want further refactoring as @takutosato hinted at above?

takutosato · 2020-04-10T19:46:04Z

@droazen or @jamesemery, could one of you take a look at this PR?

lbergelson · 2020-04-13T15:58:53Z

I'm reviewing this, sorry for the slow response.

cmnbroad · 2020-04-13T16:05:49Z

...ain/java/org/broadinstitute/hellbender/tools/walkers/consensus/DownsampleByDuplicateSet.java

+ **/
+public class DownsampleByDuplicateSet extends DuplicateSetWalker {
+    @Argument(fullName = StandardArgumentDefinitions.OUTPUT_LONG_NAME, shortName = StandardArgumentDefinitions.OUTPUT_SHORT_NAME, doc = "")
+    public File outputBam;


Suggested change

public File outputBam;

public GATKPathSpecifier outputBam;

We're trying to eliminate the use of File wherever possible, since its limiting, and rely the more general GATKPathSpecifier.

@cmnbroad should I avoid createTempFile() in test classes as well? It creates a File object.

@takutosato No - using createTempFile and File in tests is fine for now. Thanks.

Awesome, thanks!

cmnbroad · 2020-04-13T16:18:16Z

...ain/java/org/broadinstitute/hellbender/tools/walkers/consensus/DownsampleByDuplicateSet.java

+    public void onTraversalStart() {
+        super.onTraversalStart();
+        rng = RandomGeneratorFactory.createRandomGenerator(new Random(RANDOM_SEED));
+        outputWriter = createSAMWriter(IOUtils.getPath(outputBam.getAbsolutePath()), false);


If you use my GATKPathSpecifier suggestion above, you can just call .toPath() on the GATKPathSpecifier instance here.

Pre-sorted = false means this will resort the output I believe. Is that what you want?

.toPath -> done.
Hmmm. That's not what I want. But I don't think it sorts the output. Will investigate.

It probably doesn't sort it if the sort order is listed as "unsorted"

lbergelson

@takutosato Checkpointing my review so you can start taking a look at what I'm thinking. I still haven't looked at the tool itself DownsampleByDuplicateSet or the DuplicateSet code. I'll try to get to those tomorrow.

lbergelson · 2020-04-10T22:38:00Z

src/main/java/org/broadinstitute/hellbender/engine/DuplicateSetWalker.java

+ * http://fulcrumgenomics.github.io/fgbio/tools/latest/GroupReadsByUmi.html
+ */
+public abstract class DuplicateSetWalker extends ReadWalker {
+    DuplicateSet currentDuplicateSet; // TODO: does it make sense to call clear() and recycle the same object?


I wouldn't. I don't think the performance gain is worth the potential confusion if someone does something like try to accumulate duplicate sets during the traversal.

This should be private, protected, or marked visiblefortesting.

Sounds good. Done

lbergelson · 2020-04-10T22:38:36Z

src/main/java/org/broadinstitute/hellbender/engine/DuplicateSetWalker.java

+
+    @Override
+    public final void onStartup(){
+        // TODO: I had to make onStartUp of ReadWalker not final to do this. The right thing to do here is to


I'm not super happy about that, it makes it less clear if downstream tools are meant to override it.

lbergelson · 2020-04-10T22:40:11Z

src/main/java/org/broadinstitute/hellbender/engine/DuplicateSetWalker.java

+        // write a ReadWalkerBase parent class for both ReadWalker and DuplicateSet Walker, as Louis suggested.
+        super.onStartup();
+        currentDuplicateSet = new DuplicateSet();
+        currentDuplicateSet.setMoleduleId(INITIAL_MOLECULAR_ID);


Maybe I'm being dense, but couldn't you initialize this in place when you define the variable and not need to do it here? That would avoid the need for overriding onStartup. If you can't instantiate a DuplicateSet and set the ID at the same time, I would just change duplicate set. It would probably be better though if instead of exposing a magic number, DuplicateSet had a function like isEmpty() that would answer the question of if a read has been added to it yet.

Good points. Done.

lbergelson · 2020-04-10T22:40:57Z

src/main/java/org/broadinstitute/hellbender/engine/DuplicateSetWalker.java

+     * read6: ... MI:Z:1/B ...
+     * read7: ... MI:Z:1/B ...
+     *
+     * Thus it's sufficient to go through the reads in order and collect them in a list until


Is there any way to tell if the file is sorted in the correct way or not? It would be good to check that it is somehow.

You could do something silly like using a bloom filter to track already seen molecules ID's and warn if you see a surplus of collisions.

I would check if the bam have any sorting subkeys specified, it's a thing Nil's and Tim wanted to add to the spec so they might be outputting the. I don't think htsjdk knows about them yet but if it exists in the bams it would be a good reason to make htsjdk aware of it.

Bam header has SO:unsorted.

Added a check to ensure that the molecular ID is monotonically increasing and that the tag exists. They should catch it if the input bam is not sorted as required—is that good enough?

Yeah, that sounds great.

lbergelson · 2020-04-10T22:47:03Z

src/main/java/org/broadinstitute/hellbender/engine/DuplicateSetWalker.java

+ */
+public abstract class DuplicateSetWalker extends ReadWalker {
+    DuplicateSet currentDuplicateSet; // TODO: does it make sense to call clear() and recycle the same object?
+    public static int INITIAL_MOLECULAR_ID = -1;


removed variable

lbergelson · 2020-04-13T21:21:29Z

src/main/java/org/broadinstitute/hellbender/tools/walkers/mutect/UMI.java

+    }
+
+    // Strict match e.g. equalsExactly("AGT-GCT", "GCT-AGT") returns false
+    public boolean equalsExactly(final UMI that) {


This should be javadoc.

lbergelson · 2020-04-13T21:21:41Z

src/main/java/org/broadinstitute/hellbender/tools/walkers/mutect/UMI.java

+    // Check for whether the two umis came from the same molecule
+    // e.g. equalsModuloOrder("AGT-GCT", "GCT-AGT") returns true
+    //      equalsModuloOrder("AGT-GCT", "AGT-GCT") also returns true
+    public boolean equalsModuloOrder(final UMI that) {


This should also be javadoc

Is this post error correction?

lbergelson · 2020-04-13T21:22:10Z

src/main/java/org/broadinstitute/hellbender/tools/walkers/mutect/UMI.java

+        return that.umiSmall.equals(this.umiSmall) && that.umiLarge.equals(this.umiLarge);
+    }
+
+    public String getStandardizedUMI(){


lbergelson · 2020-04-13T21:22:22Z

src/main/java/org/broadinstitute/hellbender/tools/walkers/mutect/UMI.java

+
+    public String getStandardizedUMI(){
+        if (umi1.compareTo(umi2) > 0){
+            // umi1 is lexicographically greater e.g. umi1 = TAT, umi2 = AAC


This is redundant since you sorted them already, just return umiSmall + umiLong, or better yet just save the combination.

lbergelson · 2020-04-13T21:23:03Z

src/main/java/org/broadinstitute/hellbender/tools/walkers/mutect/UMI.java

+import org.apache.commons.lang3.tuple.Pair;
+import org.broadinstitute.hellbender.engine.filters.UMIReadFilter;
+import org.broadinstitute.hellbender.utils.read.GATKRead;
+


This class has lots of unused/untested stuff that probably should be used / tested. The umi logic should be implemented here and called from the filter.

lbergelson

@takutosato Some more comments. I have some thoughts about changing DuplicateSet to disallow non-matching values instead of reporting false when adding them. It's possible I don't understand a necessary use case though.

You need unit tests for the rest of the classes here that aren't tested. UMI, UMIReadFilter, and DuplicateSet itself.

lbergelson · 2020-04-14T20:01:35Z

...ain/java/org/broadinstitute/hellbender/tools/walkers/consensus/DownsampleByDuplicateSet.java

+ *
+ * The input bam must have been sorted by UMI using FGBio GroupReadsByUmi (http://fulcrumgenomics.github.io/fgbio/tools/latest/GroupReadsByUmi.html).
+ *
+ * Use this tool to create, for instance, an insilico mixture of duplex-sequenced samples to simulate tumor subclone.


typo? tumor subclones

lbergelson · 2020-04-14T20:03:20Z

...ain/java/org/broadinstitute/hellbender/tools/walkers/consensus/DownsampleByDuplicateSet.java

+    @Argument(fullName = StandardArgumentDefinitions.OUTPUT_LONG_NAME, shortName = StandardArgumentDefinitions.OUTPUT_SHORT_NAME, doc = "")
+    public File outputBam;
+
+    @Argument(fullName = "DS", doc = "This fraction of duplicate sets in the input bam will be retained")


I would probably give this a more descriptive name. Maybe "fraction-to-keep"?

You should add bounds to this, i.e. minValue = 0.0 maxValue = 1.0

lbergelson · 2020-04-14T20:04:54Z

...ain/java/org/broadinstitute/hellbender/tools/walkers/consensus/DownsampleByDuplicateSet.java

+
+    private static final int RANDOM_SEED = 142;
+    private RandomGenerator rng;
+    private static int numFragments;


I don't think these fields should be static.

lbergelson · 2020-04-14T20:05:34Z

...ain/java/org/broadinstitute/hellbender/tools/walkers/consensus/DownsampleByDuplicateSet.java

+    public void onTraversalStart() {
+        super.onTraversalStart();
+        rng = RandomGeneratorFactory.createRandomGenerator(new Random(RANDOM_SEED));
+        outputWriter = createSAMWriter(IOUtils.getPath(outputBam.getAbsolutePath()), false);


Pre-sorted = false means this will resort the output I believe. Is that what you want?

lbergelson · 2020-04-14T20:06:33Z

...ain/java/org/broadinstitute/hellbender/tools/walkers/consensus/DownsampleByDuplicateSet.java

+
+    @Override
+    public void onTraversalStart() {
+        super.onTraversalStart();


This is unnecessary because the walker bases are obligated to NOT implement this. It's fine to do it anyway though.

lbergelson · 2020-04-14T20:50:10Z

...java/org/broadinstitute/hellbender/tools/walkers/consensus/DownsampleByDuplicateSetTest.java

+                .add("O", out.getAbsolutePath());
+        runCommandLine(args, DownsampleByDuplicateSet.class.getSimpleName());
+
+        final ReadsDataSource originalBam = new ReadsDataSource(Paths.get(NA12878_GROUPED));


declare datasources in a try-with-resources

lbergelson · 2020-04-14T20:51:47Z

...java/org/broadinstitute/hellbender/tools/walkers/consensus/DownsampleByDuplicateSetTest.java

+        final Iterator<GATKRead> iterator = readsDataSource.iterator();
+        while (iterator.hasNext()){
+            final GATKRead read = iterator.next();
+            final String molecularID = read.getAttributeAsString("MI");


use the constant you defined

lbergelson · 2020-04-14T20:53:28Z

...java/org/broadinstitute/hellbender/tools/walkers/consensus/DownsampleByDuplicateSetTest.java

+
+    /** When down-sampling rate is 1.0, the input file is returned unchanged **/
+    @Test
+    public void testNoDownsampling(){


Can you add a test that shows that DS = 0.0 produces an empty file?

lbergelson · 2020-04-14T20:57:07Z

...ain/java/org/broadinstitute/hellbender/tools/walkers/consensus/DownsampleByDuplicateSet.java

+ * This tool address the above problem by dropping a set fraction of _duplicate sets_, rather than reads, at random.
+ * Implicit in this approach is that a read and its mate are dropped or retained together.
+ * While trivial when the input bam is sorted by UMI and query name, this is far from trivial when one attempts
+ * to downsample reads naively with a tool like {@link PrintReads}.


Should the doc include an example usage?

Example usage added.

lbergelson · 2020-04-14T20:57:22Z

...java/org/broadinstitute/hellbender/tools/walkers/consensus/DownsampleByDuplicateSetTest.java

+                    .add("O", out.getAbsolutePath());
+            runCommandLine(args, DownsampleByDuplicateSet.class.getSimpleName());
+
+            final ReadsDataSource originalBam = new ReadsDataSource(Paths.get(NA12878_GROUPED));


try-with-resources

lbergelson · 2020-04-14T21:11:25Z

@takutosato Woops, I just realized you might have just wanted a rough look at this and not an in depth review... sorry if this was premature.

takutosato · 2020-04-15T15:02:39Z

@lbergelson no I appreciate the review. Having read the code, do you think it's worth writing a "ReadWalkerBase," which would be a parent class of ReadWalker and DuplicateSetWalker?

lbergelson · 2020-04-15T19:20:53Z

I don't think it's super necessary as long as the weirdness about the double apply method is documented. It wouldn't be bad, but it's a lot of extra work for not too much benefit. I think we should push it to the future. If we have more classes like this it might be worth it.

What might be worthwhile is changing the DuplicateSetWalker to be more abstract, so that it is a walker that groups reads by something instead of always molecular id. An obvious similar walker would be one that groups by readname in readname sorted bams, or by cell in single cell RNA bams. Since we don't have an obvious need for either of those though we could also wait until we have a second example of a walker that needs to do this.

tedsharpe · 2020-04-15T19:32:27Z

I'm working on a tool that processes aligned contigs from local assemblies that could use the group-by-queryname functionality. It would be nice not to have to buffer everything that comes through the apply method to process at the end of the traversal. Not hugely important, but there's your 2nd example.

lbergelson · 2020-04-15T20:22:42Z

@tedsharpe @takutosato Lets do it then, but lets do it in a second PR since this one has a ton of comments already?

lbergelson · 2020-04-15T20:23:37Z

@tedsharpe Your tool is a little more sophisticated in it's grouping though isn't it? It's going out and searching for long range mates I thought? This is just batching things as they come in assuming everything is in order.

takutosato · 2020-04-15T20:58:23Z

@lbergelson making DuplicateSetWalker more abstract in a separate PR sounds good!

tedsharpe · 2020-04-15T21:20:44Z

@lbergelson Not in this part. Just grouping a queryname sorted input by read name. (Need to gather the primary line together with all supplementary alignments.) I think we'd find other uses, too.
@takutosato Separate PR is fine with me.

takutosato · 2020-04-21T22:55:53Z

@lbergelson I addressed your comments, cleaned up the code, and wrote tests. I removed UMI.java and UMIReadFilter.java. They will likely come back in a future PR.

gatk-bot · 2020-04-21T23:30:20Z

Travis reported job failures from build 30111
Failures in the following jobs:

Test Type	JDK	Job ID	Logs
unit	openjdk11	30111.14	logs
unit	openjdk8	30111.3	logs

droazen · 2020-04-22T17:41:20Z

@takutosato Did you want to try to get this into this week's release?

takutosato · 2020-04-22T19:02:22Z

@droazen that was my goal but next release is fine too

fleharty · 2020-04-22T19:15:44Z

@takutosato Is there anything holding this up so that it can't be ready for the upcoming release? If it is possible, I would like to see this in the next release.

takutosato · 2020-04-22T19:42:40Z

@fleharty nope just pending review.

fleharty · 2020-04-22T19:58:10Z

@lbergelson Is there time to re-review this PR before the release, or have we run out of time?

droazen · 2020-04-22T20:40:42Z

If @lbergelson is able to do a final review on this branch tomorrow, I'd be happy to wait until it's in before releasing.

takutosato · 2020-04-22T20:55:48Z

@droazen incidentally, are you OK with me adding finalizeTraversal() do ReadWalker? From inside apply() we don't know whether the read we're looking at is the last one, so we need to process whatever hasn't been processed before ending traversal.

droazen · 2020-04-22T21:03:51Z

@takutosato Why can't you override the existing method onTraversalSuccess()? It gets called right after the last apply() call.

takutosato · 2020-04-22T21:18:43Z

Because the reference variable is package private. If I want to call apply() from onTraversalSuccess, it needs to be protected—will push a commit to shortly to show you what I mean.

droazen · 2020-04-22T21:26:52Z

@takutosato Just call the protected method directlyAccessEngineReferenceDataSource() from onTraversalSuccess() :)

takutosato · 2020-04-23T14:32:29Z

@droazen it looked like it was going to work but then—

@Override
    final protected ReferenceDataSource directlyAccessEngineReferenceDataSource() {
        throw new GATKException("Should never directly access the engine ReferenceDataSource in walker tool classes " +
                "outside of the engine package. Walker tools should get their data via apply() instead.");
    }

Also, this is really not the responsibility of the tool class—it should be handled as part of the walker.

droazen · 2020-04-23T14:45:08Z

@takutosato Ah yes, I forgot about the final override of that method in WalkerBase. In that case, my suggestion would be to add the additional method in DuplicateSetWalker rather than ReadWalker, since the need for it is caused by the accumulation of reads specific to DuplicateSetWalker.

tedsharpe · 2020-04-23T14:46:53Z

Stash the reference context you get when passing through the original apply method, and use it in onTraversalSuccess when calling the new apply method to push out the final set.

droazen · 2020-04-23T14:59:52Z

@tedsharpe The problem with that is that apply() gets a ReferenceContext, not the ReferenceDataSource needed here. I think that an additional finalization method in DuplicateSetWalker is probably the cleanest solution for now.

takutosato · 2020-04-23T15:17:29Z

@tedsharpe @droazen thanks guys!

Just pushed a commit with suggested changes. I defined onTraversalSuccess in duplicateSetWalker to process the remaining set of reads. Then the tool subclass calls super.onTraversalSuccess() first thing inside its onTraversalSuccess.

Is that a good enough workaround for now?

droazen · 2020-04-23T16:52:29Z

@takutosato A separate method called automatically in DuplicateSetWalker would be more foolproof than requiring the tool author to remember to call super in onTraversalSuccess(), I think.

lbergelson

@takutosato Looks great! Much improved since the rough draft! Thank you.

I have a few comments about minor stuff but I think it's good to merge now and those can be addressed later.

lbergelson · 2020-04-23T16:39:22Z

src/main/java/org/broadinstitute/hellbender/engine/DuplicateSetWalker.java

+    @Argument(fullName = MIN_REQUIRED_READS_NAME, doc = "The mininum total number of reads required in the set", optional = true)
+    private int minimumRequiredReadsPerUMI = DEFAULT_MINIMUM_READS_PER_SET;
+
+    // The user may choose to keep read sets containing both strands (i.d. duplex evidence) only by setting this argument to a positive number


Wording. This would be clearer as "The user may choose to only keep read sets containing both strands by setting this argument to a positive number."

I misread this at first as the "read sets containing both strands will be kept ONLY if this is set to a positive number"

Ah I went back and forth about this and I clearly chose wrong. done.

lbergelson · 2020-04-23T16:40:00Z

src/main/java/org/broadinstitute/hellbender/engine/DuplicateSetWalker.java

+    private int minimumRequiredReadsPerUMI = DEFAULT_MINIMUM_READS_PER_SET;
+
+    // The user may choose to keep read sets containing both strands (i.d. duplex evidence) only by setting this argument to a positive number
+    @Argument(fullName = MIN_REQUIRED_READS_PER_STRAND_NAME, doc = "The mininum total number of reads in each strand", optional = true)


You can add a minValue = 0 here to keep it positive.

lbergelson · 2020-04-23T16:40:28Z

src/main/java/org/broadinstitute/hellbender/engine/DuplicateSetWalker.java

+    private static final int DEFAULT_MINIMUM_READS_PER_STRAND = 0;
+
+    @Argument(fullName = MIN_REQUIRED_READS_NAME, doc = "The mininum total number of reads required in the set", optional = true)
+    private int minimumRequiredReadsPerUMI = DEFAULT_MINIMUM_READS_PER_SET;


I would add a minValue to the argument to make sure it's positive.

lbergelson · 2020-04-23T16:42:17Z

src/main/java/org/broadinstitute/hellbender/engine/DuplicateSetWalker.java

+
+        if (duplicateSetMoleculeNumber < readMoleculeNumber) {
+            // The incoming read's molecule ID does not match that of the current duplicate, meaning we've reached the end of the current set.
+            // Call the apply() method to process the current set and start a new set.


this comment should go on the line below, it's not about the rejection stuff

lbergelson · 2020-04-23T16:43:31Z

src/main/java/org/broadinstitute/hellbender/engine/DuplicateSetWalker.java

+            return;
+        }
+
+        // The incoming read has the same UMI as the current set; simply add to the current set


I like to put the exclusive conditions in an else block to make it clear that none of the options above fall down into this, but that's a style point that some people disagree with.

Yea I didn't like the way the if else blocks looked so I did it this way, hope that's ok.

lbergelson · 2020-04-23T16:47:36Z

...ain/java/org/broadinstitute/hellbender/tools/walkers/consensus/DownsampleByDuplicateSet.java

+
+    @Override
+    public void onTraversalStart() {
+        rng = new Random(RANDOM_SEED);


I would just initialize this where you declare rng and make the variable final.

lbergelson · 2020-04-23T16:49:04Z

src/main/java/org/broadinstitute/hellbender/tools/walkers/consensus/MoleculeID.java

+        return moleculeNumber + ReadsWithSameUMI.FGBIO_MI_TAG_DELIMITER + strand;
+    }
+
+    public static int getMoleculeNumberOfRead(final GATKRead read){


These should probably have javadoc too even though they're pretty self explanatory.

lbergelson · 2020-04-23T16:51:53Z

src/main/java/org/broadinstitute/hellbender/tools/walkers/consensus/MoleculeID.java

+     * @returns Counts of reads from each strand, the first element is always larger than the second
+     **/
+    public static Pair<Integer, Integer> countStrands(final List<GATKRead> reads){
+        final int strandACount = (int) reads.stream().filter(r -> getStrandOfRead(r).equals("A")).count();


There's definitely a more efficient way of doing this that only loops through once but this is fine.

lbergelson · 2020-04-23T16:55:59Z

src/main/java/org/broadinstitute/hellbender/tools/walkers/consensus/ReadsWithSameUMI.java

+    public void addRead(final GATKRead read){
+        Utils.validate(reads.isEmpty() || moleculeID.getMoleculeNumber() == MoleculeID.getMoleculeNumberOfRead(read),
+                String.format("Molecule number of the set and that of the new read don't match: set number = %d, read number = %d", moleculeID.getMoleculeNumber(), MoleculeID.getMoleculeNumberOfRead(read)));
+        Utils.validate(interval.getContig().equals(read.getContig()),


There's a interval.contigsMatch(read) method that makes this slightly less verbose (and could be implemented slightly more efficiently than getContig.equals(getContig) in some cases.

lbergelson · 2020-04-23T16:57:14Z

src/main/java/org/broadinstitute/hellbender/tools/walkers/consensus/ReadsWithSameUMI.java

+    private void init(GATKRead read){
+        Utils.validate(reads.isEmpty(), String.format("Initializing a non-empty set"));
+        moleculeID = new MoleculeID(read);
+        interval = new SimpleInterval(read.getContig(), read.getStart(), read.getEnd());


I think you can just write this as new SimpleInterval(read)

takutosato · 2020-04-23T17:01:28Z

I think inside onTraversalSuccess might be the only place in which we can call such a method automatically—right?

droazen · 2020-04-23T18:09:31Z

@takutosato Why not in the traverse() method of DuplicateSetWalker?

droazen · 2020-04-23T18:11:20Z

(ie., override traverse() in DuplicateSetWalker, call super.traverse() (the ReadWalker version) first, then do your extra cleanup work)

takutosato · 2020-04-23T19:04:41Z

@droazen that is GENIUS

droazen

👍

… an input bam sorted by UMI. (#6512)

droazen requested a review from lbergelson April 1, 2020 17:43

droazen assigned lbergelson Apr 1, 2020

cmnbroad reviewed Apr 13, 2020

View reviewed changes

lbergelson requested changes Apr 13, 2020

View reviewed changes

lbergelson requested changes Apr 14, 2020

View reviewed changes

lbergelson unassigned lbergelson Apr 14, 2020

lbergelson approved these changes Apr 23, 2020

View reviewed changes

A new tool to randomly sample a fraction of an input bam sorted by UMI.

44557b9

takutosato force-pushed the ts_downsample branch from 5ce1412 to 44557b9 Compare April 23, 2020 20:55

droazen approved these changes Apr 23, 2020

View reviewed changes

droazen merged commit 4ec2a4b into master Apr 23, 2020

droazen deleted the ts_downsample branch April 23, 2020 22:12

jonn-smith pushed a commit that referenced this pull request Jul 14, 2020

DownsampleByDuplicateSet: a new tool to randomly sample a fraction of…

c6ec893

… an input bam sorted by UMI. (#6512)

Downsample a bam by duplicate sets instead of reads #6512

Downsample a bam by duplicate sets instead of reads #6512

Conversation

takutosato commented Mar 19, 2020

gatk-bot commented Mar 23, 2020

droazen commented Apr 1, 2020

takutosato commented Apr 10, 2020

lbergelson commented Apr 13, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cmnbroad Apr 13, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lbergelson left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lbergelson left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lbergelson commented Apr 14, 2020

takutosato commented Apr 15, 2020

lbergelson commented Apr 15, 2020

tedsharpe commented Apr 15, 2020

lbergelson commented Apr 15, 2020

lbergelson commented Apr 15, 2020

takutosato commented Apr 15, 2020

tedsharpe commented Apr 15, 2020

takutosato commented Apr 21, 2020

gatk-bot commented Apr 21, 2020 • edited Loading

droazen commented Apr 22, 2020

takutosato commented Apr 22, 2020

fleharty commented Apr 22, 2020

takutosato commented Apr 22, 2020

fleharty commented Apr 22, 2020

droazen commented Apr 22, 2020

takutosato commented Apr 22, 2020

droazen commented Apr 22, 2020 • edited Loading

takutosato commented Apr 22, 2020

cmnbroad Apr 13, 2020 •

edited

Loading

gatk-bot commented Apr 21, 2020 •

edited

Loading

droazen commented Apr 22, 2020 •

edited

Loading

droazen commented Apr 22, 2020 •

edited

Loading