Create or load htslib .fai and .gzi index files when using BGZF files #126
I think this is a good idea, and the work to support this is:
I checked the .fai files created by pyfaidx and samtools, and they are the same. Also, samtools requires a .gzi file to work with BGZF-compressed FASTA. Hope this information helps.
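One way to repeat this comparison is to parse both indexes and compare them field by field. This is a minimal sketch that assumes the standard five-column FASTA `.fai` layout (NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH). `FaiRecord`, `parse_fai` and `fai_differences` are hypothetical names and are not part of pyfaidx:

```python
from typing import NamedTuple


class FaiRecord(NamedTuple):
    """One line of a FASTA .fai index (five-column samtools layout)."""
    name: str
    length: int     # sequence length in bases
    offset: int     # byte offset of the first base in the *uncompressed* FASTA
    linebases: int  # bases per sequence line
    linewidth: int  # bytes per sequence line, including the newline


def parse_fai(text: str) -> list[FaiRecord]:
    """Parse the contents of a FASTA .fai file into records."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        name, length, offset, linebases, linewidth = line.split("\t")[:5]
        records.append(FaiRecord(name, int(length), int(offset),
                                 int(linebases), int(linewidth)))
    return records


def fai_differences(a: str, b: str) -> list[str]:
    """Describe every way two .fai files differ (empty list if identical)."""
    ra, rb = parse_fai(a), parse_fai(b)
    diffs = []
    if len(ra) != len(rb):
        diffs.append(f"record count differs: {len(ra)} vs {len(rb)}")
    for x, y in zip(ra, rb):
        if x != y:
            diffs.append(f"{x} != {y}")
    return diffs
```

Reading `file.fa.gz.fai` from each tool into a string and passing both to `fai_differences` should return an empty list if the indexes really match.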
Closing this issue on the assumption that #1701 fixes it. Thanks :)
Are you able to test whether the issue is fixed? I'll look into it as well, but I believe our BGZF indices may still be incompatible with samtools.
Specifically, the recursion issue in Biopython is fixed, but I'd like to implement .gzi creation and more efficient sequence retrieval in pyfaidx for BGZF files. Currently pyfaidx must read from the beginning of a record up to the user-specified end coordinate and then return the requested subsequence from memory. This isn't as efficient as samtools. The sticking point is understanding how samtools uses the .gzi to turn the start coordinate into a virtual offset.
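As I understand the htslib approach, the `.fai` OFFSET, LINEBASES and LINEWIDTH columns turn a 0-based coordinate into a byte offset in the *uncompressed* FASTA. The `.gzi` then identifies the BGZF block containing that byte. A BGZF virtual offset packs the block's compressed start into the high 48 bits and the position inside the decompressed block into the low 16 bits. The following is a minimal sketch of that arithmetic. The function names are hypothetical, not pyfaidx or htslib APIs, and `gzi` is assumed to be the already-decoded list of (compressed, uncompressed) pairs:

```python
import bisect


def uncompressed_offset(offset: int, linebases: int, linewidth: int, pos: int) -> int:
    """Byte offset of 0-based base `pos` of a record in the uncompressed FASTA.

    `offset`, `linebases` and `linewidth` are the OFFSET, LINEBASES and
    LINEWIDTH columns of that record's .fai line.
    """
    return offset + (pos // linebases) * linewidth + pos % linebases


def virtual_offset(gzi: list[tuple[int, int]], uoffset: int) -> int:
    """Map an uncompressed byte offset to a BGZF virtual offset.

    `gzi` holds the (compressed_offset, uncompressed_offset) block starts from
    a .gzi file, sorted ascending. The first block at (0, 0) is implicit in the
    on-disk format, so it is prepended here.
    """
    if uoffset < 0:
        raise ValueError("offset must be non-negative")
    blocks = [(0, 0)] + list(gzi)
    ustarts = [u for _, u in blocks]
    # Index of the last block whose uncompressed start is <= uoffset.
    i = bisect.bisect_right(ustarts, uoffset) - 1
    coffset, ustart = blocks[i]
    within = uoffset - ustart
    # A BGZF block holds at most 64 KiB of uncompressed data, so the in-block
    # offset has to fit in the low 16 bits.
    if within >= 1 << 16:
        raise ValueError("offset lies past the last indexed block")
    return (coffset << 16) | within


def split_virtual_offset(voffset: int) -> tuple[int, int]:
    """Inverse packing: (compressed block start, offset within that block)."""
    return voffset >> 16, voffset & 0xFFFF
```

With this lookup, fetching `start..end` only requires decompressing from the block that contains `start` onward, rather than from the beginning of the record. htslib's `bgzf_useek` performs an equivalent block lookup internally when a `.gzi` index is loaded.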
I see. Please let us know when this is in place. Thanks @mdshw5
Re-opening to work on this issue before the end of the year.
Any progress on this?
@IPetrik I did some work on this earlier this year, but never got anything working. I believe I pushed what I had here: db7f140. I'll check my local machine to see if there's anything else. I'd really like to get this feature working properly, so if you have ideas, please share.
@IPetrik Forget my previous comment. I have some completely different work on my local machine. I'll update the
There is a lot here that doesn't work, but mainly I was trying to figure out the format of the GZI file and provide methods to unpack and pack the binary on-disk format. There are also methods for loading the GZI into an object for use by Faidx.
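The GZI format section of the htslib `bgzip` man page describes the layout as follows. The file starts with a little-endian uint64 entry count, followed by that many pairs of uint64 values (compressed offset, uncompressed offset). The first block, at offset 0, is left out. Given that layout, packing and unpacking reduce to a few `struct` calls. This is a minimal sketch: `pack_gzi` and `unpack_gzi` are illustrative names, not the methods in the work mentioned above:

```python
import struct


def unpack_gzi(data: bytes) -> list[tuple[int, int]]:
    """Decode .gzi bytes: a little-endian uint64 count, then that many
    (compressed_offset, uncompressed_offset) uint64 pairs."""
    (n,) = struct.unpack_from("<Q", data, 0)
    expected = 8 + 16 * n
    if len(data) != expected:
        raise ValueError(f"expected {expected} bytes for {n} entries, got {len(data)}")
    flat = struct.unpack_from(f"<{2 * n}Q", data, 8)
    # Pair up the flat sequence: (c0, u0), (c1, u1), ...
    return list(zip(flat[0::2], flat[1::2]))


def pack_gzi(entries: list[tuple[int, int]]) -> bytes:
    """Encode (compressed_offset, uncompressed_offset) pairs in the .gzi layout.

    The implicit first block at (0, 0) should not be included in `entries`.
    """
    out = [struct.pack("<Q", len(entries))]
    out.extend(struct.pack("<QQ", c, u) for c, u in entries)
    return b"".join(out)
```

Keeping the (0, 0) entry out of the packed data matches the man page, which says the start of the first block is not included in the index.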
I've opened a PR with the work for this issue in #164. If I have some time this summer, I'll come back and keep working on it. There doesn't seem to be much left to do except finish testing the GZI packing/unpacking and implement methods to create and read the on-disk format.
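For the "create" half, the `.gzi` entries can probably be rebuilt without decompressing anything. Per the BGZF description in the SAM/BAM specification, each block's gzip header carries a `BC` extra subfield whose BSIZE value is the total block size minus 1, and the block's last four bytes (ISIZE) give its uncompressed length. The following is a rough sketch built on those assumptions. `scan_bgzf_blocks` is a hypothetical name. Skipping the implicit first block and any empty block (such as the EOF marker) is my reading of what `bgzip -i` writes, so the output is worth diffing against a real `.gzi`:

```python
import struct
from typing import BinaryIO


def scan_bgzf_blocks(fh: BinaryIO) -> list[tuple[int, int]]:
    """Return (compressed_offset, uncompressed_offset) for each non-empty BGZF
    block after the first, i.e. the entries a .gzi file would list.

    Only block headers and the trailing ISIZE fields are read; no data is
    decompressed.
    """
    entries = []
    coffset = uoffset = 0
    while True:
        fh.seek(coffset)
        header = fh.read(12)  # ID1 ID2 CM FLG MTIME XFL OS XLEN
        if not header:
            break  # clean end of file
        if len(header) < 12 or header[:4] != b"\x1f\x8b\x08\x04":
            raise ValueError(f"not a BGZF block at byte {coffset}")
        (xlen,) = struct.unpack_from("<H", header, 10)
        extra = fh.read(xlen)
        bsize = None
        # Walk the gzip extra subfields looking for BGZF's 'BC' subfield.
        pos = 0
        while pos + 4 <= len(extra):
            si1, si2, slen = struct.unpack_from("<BBH", extra, pos)
            if (si1, si2) == (66, 67) and slen == 2:
                (bsize,) = struct.unpack_from("<H", extra, pos + 4)
            pos += 4 + slen
        if bsize is None:
            raise ValueError(f"missing BC subfield at byte {coffset}")
        block_len = bsize + 1  # BSIZE stores the total block size minus 1
        fh.seek(coffset + block_len - 4)
        (isize,) = struct.unpack("<I", fh.read(4))  # uncompressed block size
        # Block 0 is implicit in .gzi; empty blocks (EOF marker) are skipped.
        if isize and coffset != 0:
            entries.append((coffset, uoffset))
        coffset += block_len
        uoffset += isize
    return entries
```

The function takes any seekable binary file object, for example `open("file.fa.gz", "rb")`.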
Hi,

When I run `samtools faidx file.fa.gz` and then try to use the same `file.fa.gz` file with `pyfaidx`, I get an error saying that `file.fa.gz` is not a valid BGZF file. But when I delete `file.fa.gz.fai` and then use `pyfaidx`, the error disappears. I believe this is because the `.fai` that `pyfaidx` creates is different from the `.fai` that `samtools` creates. If this behavior is real, is it possible to unify the `.fai` of `pyfaidx` and `samtools`? Thoughts?

Kind regards,
Kwat