Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zero-based indexing is inconsistently applied in Fasta #74

Closed
ihaque opened this issue Oct 8, 2015 · 2 comments
Closed

zero-based indexing is inconsistently applied in Fasta #74

ihaque opened this issue Oct 8, 2015 · 2 comments
Assignees
Labels
Milestone

Comments

@ihaque
Copy link

ihaque commented Oct 8, 2015

Setting one_based_attributes=False on a Fasta object does not propagate indexing operations properly down to the underlying Faidx or Sequence objects.

Given the following test data file:

>test_chrom
ACGTACGTACGTACGTACGTACGT

and using pyfaidx @ 6e54c9b with the following preamble:

>>> import pyfaidx
>>> f0 = pyfaidx.Fasta('tests/data/test.fa', one_based_attributes=False)
>>> f1 = pyfaidx.Fasta('tests/data/test.fa', one_based_attributes=True)

I get the following issues:

  1. Sequences have inconsistent names depending on whether zero-based and one-based coordinates are used:
>>> f0['test_chrom'][:]     
>test_chrom
ACGTACGTACGTACGTACGTACGT
>>> f1['test_chrom'][:]
>test_chrom:1-24
ACGTACGTACGTACGTACGTACGT
  1. Using the same coordinates between "zero-based" and "one-based" objects produces the exact same results, rather than the single-base shift expected:
>>> f1['test_chrom'][:][1:5]
>test_chrom:2-5
CGTA
>>> f0['test_chrom'][:][1:5]
>test_chrom
CGTA
@mdshw5
Copy link
Owner

mdshw5 commented Oct 8, 2015

Thanks so much for filing a reproducible issue! I have to explain that you have discovered one bug and encountered one confusing feature.

First, the Sequence.longname method should definitely be updated to behave as you expect.

The second issue you raise is actually expected behavior. The one_based_attributes argument only applies to the "friendly" coordinate representation but should have no effect on the slice notation. I should update the documentation to make this explicit. I believe that having a switch for the slicing notation would be quite confusing and allow subtle bugs to go unnoticed.

@mdshw5 mdshw5 self-assigned this Oct 8, 2015
@mdshw5 mdshw5 added the bug label Oct 8, 2015
@mdshw5
Copy link
Owner

mdshw5 commented Oct 8, 2015

  • Fix Sequence.longname behavior with one_based_attributes
  • Add test case

@mdshw5 mdshw5 modified the milestone: v0.4.3 Oct 22, 2015
@mdshw5 mdshw5 closed this as completed in cc51262 Oct 22, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants