Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Samtools view fails on CRAM and compressed reference if .gzi file is missing #1744

Closed
TheodoreSpeaks opened this issue Feb 1, 2024 · 1 comment · Fixed by #1745
Closed
Assignees

Comments

@TheodoreSpeaks
Copy link

Are you using the latest version of samtools and HTSlib? If not, please specify.

(run samtools --version)

> samtools --version
samtools 1.19
Using htslib 1.19
Copyright (C) 2023 Genome Research Ltd.

Please describe your environment.

  • OS (run uname -sr on Linux/Mac OS or wmic os get Caption, Version on Windows)
    Darwin 23.2.0
  • machine architecture (run uname -m on Linux/Mac OS or wmic os get OSArchitecture on Windows)
    arm64
  • compiler (run gcc --version or clang --version)
> gcc --version

Apple clang version 15.0.0 (clang-1500.1.0.2.5)
Target: arm64-apple-darwin23.2.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

Please specify the steps taken to generate the issue, the command you are running and the relevant output.

  1. In a new directory, add reference toy_ref.fasta and bam file toy_bam.bam
  2. Compress the reference file with bgzip. This creates toy_ref.fasta.gz
> bgzip toy_ref.fasta
> ls
toy_bam.bam      toy_ref.fasta.gz
  1. Generate a cram file from the bam file using samtools view. This generates .cram, .fai, and .gzi files (since reference is compressed)
samtools view -C -h -T toy_ref.fasta.gz -o toy_cram.cram toy_bam.bam
  1. Remove the .gzi file while keeping the .fai file.
> rm *.gzi
> ls
toy_bam.bam          toy_cram.cram        toy_ref.fasta.gz     toy_ref.fasta.gz.fai
  1. Run samtools view on the cram, should fail with file not found:
> samtools view --no-PG --reference toy_ref.fasta.gz toy_cram.cram --verbosity 10
[I::hts_idx_check_local] Using alignment file 'toy_ref.fasta.gz'
[E::bgzf_index_load] Error opening toy_ref.fasta.gz.gzi : No such file or directory
[E::bgzf_open_ref] Unable to load .gzi index 'toy_ref.fasta.gz.gzi'
[E::refs_load_fai] Failed to open reference file 'toy_ref.fasta.gz'
[E::hts_open_format] Failed to open file "toy_cram.cram" : No such file or directory
samtools view: failed to open "toy_cram.cram" for reading: No such file or directory

This issue only occurs when only .fai is present without .gzi. Removing the .fai file as well allows samtools view to execute successfully:

rm *.fai

### Runs successfully
samtools view --no-PG --reference toy_ref.fasta.gz toy_cram.cram -o /dev/null

This is similar to an issue found in the past with samtools faidx not running correctly if the .fai file exists but the .gzi file does not: samtools/samtools#804

I think the issue is caused by samtools only checking for .fai existence to determine whether to regenerate index files, rather than validating that both .fai and .gzi exists for bgzipped compressed references.

Code reference:

@jkbonfield jkbonfield transferred this issue from samtools/samtools Feb 9, 2024
@jkbonfield
Copy link
Contributor

Migrated to htslib as the appropriate fix is there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants