This user was able to access the GenomicsDB workspace but is having performance issues with SelectVariants. They tried the same command locally and it took less than a minute. Is there anything the user could change in how they run SelectVariants to improve performance?
GATK Info
GATK 4.1.9.0
This request was created from a contribution made by Lucas Taniguti on February 01, 2021 22:41 UTC.
Thank you, it has started to work with gendb.gs://
But now I think it does not run. I have only one sample stored into the database and I'm selecting only chr20:1-1000000 and it is running for more than 30 minutes. Is it expected?
I'm using a VM from GCE, in the same region as the GCS bucket.
Using GATK jar /home/taniguti/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx10g -Xms5g -jar /home/taniguti/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar SelectVariants -R Homo_sapiens_assembly38.fasta -V gendb.gs://mybucket/genomicsdb -L chr20:1-1000000 -O teste.vcf.gz
23:01:23.595 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/taniguti/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
23:01:23.914 INFO SelectVariants - ------------------------------------------------------------
23:01:23.915 INFO SelectVariants - The Genome Analysis Toolkit (GATK) v4.1.9.0
23:01:23.915 INFO SelectVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
23:01:23.918 INFO SelectVariants - Executing as taniguti@phasing-shapeit4-taniguti on Linux v5.4.0-1036-gcp amd64
23:01:23.918 INFO SelectVariants - Java runtime: OpenJDK 64-Bit Server VM v11.0.9.1+1-Ubuntu-0ubuntu1.20.04
23:01:23.919 INFO SelectVariants - Start Date/Time: February 1, 2021 at 11:01:23 PM UTC
23:01:23.919 INFO SelectVariants - ------------------------------------------------------------
23:01:23.919 INFO SelectVariants - ------------------------------------------------------------
23:01:23.928 INFO SelectVariants - HTSJDK Version: 2.23.0
23:01:23.929 INFO SelectVariants - Picard Version: 2.23.3
23:01:23.929 INFO SelectVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
23:01:23.929 INFO SelectVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
23:01:23.929 INFO SelectVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
23:01:23.929 INFO SelectVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
23:01:23.930 INFO SelectVariants - Deflater: IntelDeflater
23:01:23.930 INFO SelectVariants - Inflater: IntelInflater
23:01:23.930 INFO SelectVariants - GCS max retries/reopens: 20
23:01:23.930 INFO SelectVariants - Requester pays: disabled
23:01:23.930 INFO SelectVariants - Initializing engine
23:01:25.939 INFO GenomicsDBLibLoader - GenomicsDB native library version : 1.3.2-e18fa63
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
23:01:39.847 info NativeGenomicsDB - pid=4376 tid=4377 No valid combination operation found for INFO field AS_InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
23:01:39.847 info NativeGenomicsDB - pid=4376 tid=4377 No valid combination operation found for INFO field AS_QD - the field will NOT be part of INFO fields in the generated VCF records
23:01:39.848 info NativeGenomicsDB - pid=4376 tid=4377 No valid combination operation found for INFO field DS - the field will NOT be part of INFO fields in the generated VCF records
23:01:39.848 info NativeGenomicsDB - pid=4376 tid=4377 No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
23:01:39.848 info NativeGenomicsDB - pid=4376 tid=4377 No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
23:01:39.848 info NativeGenomicsDB - pid=4376 tid=4377 No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records
23:01:51.886 INFO IntervalArgumentCollection - Processing 1000000 bp from intervals
23:01:51.918 INFO SelectVariants - Done initializing engine
23:01:52.050 INFO ProgressMeter - Starting traversal
23:01:52.051 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
(created from Zendesk ticket #105490: https://broadinstitute.zendesk.com/agent/tickets/105490)
gz#105490
The user has posted an update with the jstack logs, which can be downloaded here (https://gatk.broadinstitute.org/hc/en-us/community/posts/360076845511/comments/360014258071).
They also reported that when they run GenomicsDBImport for the two samples in the same command, GenotypeGVCFs completes in 14 minutes. But if they import one sample at a time (using --genomicsdb-update-workspace-path), the GenotypeGVCFs process appears to hang. @nalinigans @mlathara Any thoughts?
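To make the two import scenarios concrete for anyone trying to reproduce this, here is a rough dry-run sketch that only prints the commands rather than running them. The sample file names are hypothetical placeholders, the workspace path and interval are borrowed from the SelectVariants command above, and the `gatk` wrapper invocation is assumed; the user's actual GenomicsDBImport arguments were not posted.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the GATK commands for the two import strategies
# instead of executing them. Sample file names are hypothetical; the
# workspace path and interval are assumed from the SelectVariants command.

WORKSPACE="gs://mybucket/genomicsdb"
INTERVAL="chr20:1-1000000"

# Scenario A: both samples imported in a single GenomicsDBImport call
# (the case where GenotypeGVCFs reportedly finishes in 14 minutes).
batch_import() {
  echo gatk GenomicsDBImport \
    -V sample1.g.vcf.gz -V sample2.g.vcf.gz \
    --genomicsdb-workspace-path "$WORKSPACE" \
    -L "$INTERVAL"
}

# Scenario B: create the workspace with one sample, then add the second
# with --genomicsdb-update-workspace-path (the case that appears hung).
# No -L on the update step: as far as I understand the GATK docs, the
# intervals are reused from the existing workspace.
incremental_import() {
  echo gatk GenomicsDBImport -V sample1.g.vcf.gz \
    --genomicsdb-workspace-path "$WORKSPACE" -L "$INTERVAL"
  echo gatk GenomicsDBImport -V sample2.g.vcf.gz \
    --genomicsdb-update-workspace-path "$WORKSPACE"
}

batch_import
incremental_import
```

Dropping the `echo` in front of each command would run the imports for real; comparing the two resulting workspaces with the same downstream command should show whether the slowdown follows the incremental import.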
Link: https://gatk.broadinstitute.org/hc/en-us/community/posts/360076845511-How-do-I-SelectVariants-from-GenomicsDB-stored-in-GCS-#community_comment_360014183291