Read actual defline from FASTA file index "gap" #54

Closed
mdshw5 opened this issue Mar 3, 2015 · 0 comments

mdshw5 commented Mar 3, 2015

One current design limitation of pyfaidx is that it mirrors the samtools indexing behavior of truncating headers at the first whitespace. There is a good reason for this: any whitespace in the identifier would break the index file. A side effect is that the "description" part of a header is frequently lost when reading from the file through the index. An option to recover the full header line seems useful and pretty cheap to implement.

To determine the byte offset and length of the header from the index file, we can compute the byte end of the preceding sequence by adding its unprintable (line terminator) characters to its offset and length; this should be the byte start of the real header line. We can then read from the header byte start to the sequence offset and save the result as something like Sequence.long_name.
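
The arithmetic above could be sketched roughly as follows. This is not pyfaidx's actual implementation. It assumes the standard five-column .fai layout (name, sequence length, byte offset, bases per line, bytes per line), records supplied in file order, and no blank lines between records. `FaiRecord`, `sequence_end`, and `read_long_name` are hypothetical names chosen for illustration.

```python
from typing import List, NamedTuple


class FaiRecord(NamedTuple):
    """One line of a .fai index (hypothetical container, not a pyfaidx class)."""
    name: str        # identifier, truncated at the first whitespace
    rlen: int        # number of bases in the sequence
    offset: int      # byte offset of the first base
    linebases: int   # bases per full sequence line
    linewidth: int   # bytes per full sequence line, including the line terminator


def sequence_end(rec: FaiRecord) -> int:
    """Byte position just past the sequence, counting its line terminators."""
    if rec.linebases == 0:
        # Empty sequence: nothing follows the header line.
        return rec.offset
    n_lines = -(-rec.rlen // rec.linebases)  # ceiling division
    terminator_bytes = n_lines * (rec.linewidth - rec.linebases)
    return rec.offset + rec.rlen + terminator_bytes


def read_long_name(fasta_path: str, records: List[FaiRecord], i: int) -> str:
    """Return the full header line of record i, without '>' or the line terminator."""
    # The header starts where the preceding sequence ends (or at byte 0).
    start = 0 if i == 0 else sequence_end(records[i - 1])
    end = records[i].offset  # the header stops where the sequence begins
    with open(fasta_path, "rb") as fh:
        fh.seek(start)
        raw = fh.read(end - start)
    text = raw.decode("ascii").strip()
    return text[1:] if text.startswith(">") else text
```

Because only the previous record's index fields and one short read are needed, the lookup stays cheap even for large files. Files with blank lines between records would need extra handling that this sketch leaves out.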

mdshw5 self-assigned this Mar 3, 2015
mdshw5 added this to the v0.3.7 milestone Mar 3, 2015
mdshw5 closed this as completed in 2b0cef8 Mar 3, 2015