
CountBasesSpark doesn't work with -L opt #6319

Closed
bhanugandham opened this issue Dec 16, 2019 · 7 comments · Fixed by #6767
Comments

@bhanugandham
Contributor

User report:

I tested this in 4.1.4.0 and 4.1.4.1.

gatk CountBasesSpark \
     -I input_reads.bam \
     -O base_count.txt

When I run this command it works and produces the correct output, base_count.txt.
But I want to count the bases located within the regions in an interval file, so:

gatk CountBasesSpark \
     -I input_reads.bam \
     -O base_count.txt \
     -L interval.file

This command does not run successfully; I see errors like this:

......
19/11/28 17:44:01 INFO NewHadoopRDD: Input split: file:/disks/disk1/data_sample/19NGS142/19NGS142.bam:1476395008+33554432
19/11/28 17:44:01 INFO NewHadoopRDD: Input split: file:/disks/disk1/data_sample/19NGS142/19NGS142.bam:1509949440+33554432
19/11/28 17:44:01 INFO NewHadoopRDD: Input split: file:/disks/disk1/data_sample/19NGS142/19NGS142.bam:704643072+33554432
19/11/28 17:44:02 ERROR Executor: Exception in task 6.0 in stage 1.0 (TID 7)
java.util.NoSuchElementException: next on empty iterator
        at scala.collection.Iterator$$anon$2.next(Iterator.scala:39)
        at scala.collection.Iterator$$anon$2.next(Iterator.scala:37)
        at scala.collection.Iterator$$anon$13.next(Iterator.scala:469)
......

The interval.file is fine because I use it for the whole GATK pipeline.
CountReadsSpark fails with the same error.

Please check this.

Thanks.
Chris

This issue was generated from your [forums] post.
[forums]: https://gatkforums.broadinstitute.org/gatk/discussion/24645/countbasesspark-doesnt-work-with-l-opt/p1
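
For reference, the -L argument used in the report accepts the standard GATK interval formats, such as a BED file, a Picard-style .interval_list, or explicit contig:start-stop strings. A minimal BED-style sketch is shown below; the file name, contig names, and coordinates are illustrative only, not taken from the report:

# intervals.bed -- tab-separated, 0-based half-open coordinates (illustrative values)
chr1	1000000	1100000
chr2	5000000	5200000

gatk CountBasesSpark \
     -I input_reads.bam \
     -O base_count.txt \
     -L intervals.bed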

@bhanugandham
Contributor Author

@droazen Created an issue ticket based on our discussion during GATK office hours.

@bhanugandham bhanugandham added this to the GATK-Priority-Backlog milestone Dec 16, 2019
@droazen droazen removed their assignment Dec 16, 2019
@droazen droazen removed this from the GATK-Priority-Backlog milestone Jun 22, 2020
@spromanos

Hi GATK team,

I tried running your PathSeq pipeline (broadinstitute/gatk:4.1.8.0) on my cohort and almost half of the samples failed the scoring step with this error message:

20/07/17 09:38:35 INFO NewHadoopRDD: Input split: file:/cromwell_root/fc-6e61d4b2-bdc8-4abd-bb94-18d8fa11d9b6/7c1b0faa-e956-4289-9e2d-4fb8b9eff6ff/PathSeqPipeline/0ca5578f-70d3-498e-b7cc-23590f0ab31f/call-PathSeqAlign/MMRF_2072_2_BM.microbe_aligned.paired.bam:33554432+33554432
20/07/17 09:38:46 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 5)
java.util.NoSuchElementException: next on empty iterator
        at scala.collection.Iterator$$anon$2.next(Iterator.scala:39)
        at scala.collection.Iterator$$anon$2.next(Iterator.scala:37)
        at scala.collection.Iterator$$anon$13.next(Iterator.scala:469)
        at scala.collection.convert.Wrappers$IteratorWrapper.next(Wrappers.scala:31)
        at org.broadinstitute.hellbender.relocated.com.google.common.collect.Iterators$PeekingImpl.next(Iterators.java:1155)
        at org.broadinstitute.hellbender.utils.spark.SparkUtils.lambda$putReadsWithTheSameNameInTheSamePartition$7bd206b0$1(SparkUtils.java:190)
        at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:153)
        at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:153)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:823)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:823)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:123)
        at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Looking at the aligned bams that go into the scoring task, they don't appear to be empty or different to the rest of the cohort. Any thoughts?

@droazen
Contributor

droazen commented Jul 17, 2020

@spromanos Could you please open a new ticket for this issue instead of replying to this existing ticket?

@spromanos

Done! Thanks

@droazen
Contributor

droazen commented Jul 27, 2020

There is a potential fix for the "next on empty iterator" error in PR #6652 -- this should be part of the next GATK release, and may enable us to close this ticket.

@droazen
Contributor

droazen commented Aug 12, 2020

This is believed to be fixed in the now-merged PR #6652.

@droazen
Contributor

droazen commented Aug 12, 2020

@lbergelson Can you confirm that this issue can now be closed as resolved?

lbergelson added a commit that referenced this issue Aug 25, 2020
* It turns out RDD.reduce crashes when it encounters empty data; use fold instead.
* Fix #6319
droazen pushed a commit that referenced this issue Aug 26, 2020
It turns out RDD.reduce crashes when it encounters empty data; use fold instead.

Fixes #6319
mwalker174 pushed a commit that referenced this issue Nov 3, 2020
It turns out RDD.reduce crashes when it encounters empty data; use fold instead.

Fixes #6319
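
The commit message above describes the actual fix: Spark's RDD.reduce has no identity element, so it fails when a tool ends up with an empty RDD (for example when -L restricts the input to intervals that contain no reads), while fold takes an explicit zero value and simply returns it for empty input. Below is a minimal sketch of that difference using the Java Spark API, assuming a local Spark context; the ReduceVsFoldDemo class and the toy Long counts are illustrative only, not GATK code:

import java.util.Collections;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

// Illustrative only -- not GATK code. Shows why a sum implemented with
// reduce() breaks on empty input while fold() does not.
public class ReduceVsFoldDemo {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("ReduceVsFoldDemo").setMaster("local[1]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // Restricting a tool to intervals can leave it with no reads at all;
            // model that here as an empty RDD of per-read base counts.
            JavaRDD<Long> baseCounts = sc.parallelize(Collections.<Long>emptyList());

            // reduce() has no identity element, so it throws on an empty RDD.
            try {
                baseCounts.reduce(Long::sum);
            } catch (Exception e) {
                System.out.println("reduce on empty input failed: " + e);
            }

            // fold() takes an explicit zero value, so an empty RDD yields 0 instead of crashing.
            long total = baseCounts.fold(0L, Long::sum);
            System.out.println("fold on empty input: " + total);  // prints 0
        }
    }
}

This is why the fix swaps reduce for fold in the counting tools: the explicit zero value turns an interval selection that matches no reads into a valid zero-count result rather than an error.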