Floating point values are allowed as file offsets #41

Closed
travc opened this issue Jan 20, 2015 · 8 comments


travc commented Jan 20, 2015

Using Python 3.4.0 and whatever version of pyfaidx pip currently installs (you should add __version__ to the module)...

Simple access (and slicing) both end up trying to use floats where ints are required.

Traceback (most recent call last):
  File "./Ngaps.py", line 207, in <module>
    sys.exit(Main(argv=None))
  File "./Ngaps.py", line 100, in Main
    x = ref[0][100]
  File "/usr/local/lib/python3.4/dist-packages/pyfaidx/__init__.py", line 460, in __getitem__
    return self._fa.get_seq(self.name, n + 1, n + 1)
  File "/usr/local/lib/python3.4/dist-packages/pyfaidx/__init__.py", line 561, in get_seq
    return self.faidx.fetch(name, start, end)
  File "/usr/local/lib/python3.4/dist-packages/pyfaidx/__init__.py", line 336, in fetch
    self.fill_buffer(name, start, end + self.read_ahead)
  File "/usr/local/lib/python3.4/dist-packages/pyfaidx/__init__.py", line 326, in fill_buffer
    seq = self.from_file(name, start, end)
  File "/usr/local/lib/python3.4/dist-packages/pyfaidx/__init__.py", line 380, in from_file
    seq = self.file.read(seq_blen).decode()
TypeError: 'float' object cannot be interpreted as an integer

A quick fix (perhaps a hack) is to cast start and end to ints in from_file:

        start0 = int(start) - 1  # make coordinates [0,1)

and

        seq_len = int(end) - start0

Sorry, no time at the moment to thoroughly test, much less fork your most current code and provide you with a proper diff.

PS: The title of this issue isn't quite correct... It is a call to read where the error gets tripped, not an index.

PPS: Useful module! Thanks. Not super complicated, but thanks for doing it right and saving the rest of us from having to reinvent the wheel (or use SeqIO where it really isn't appropriate IMO).
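
The last frame of the traceback above fails inside `read`, which only accepts an integer byte count. A minimal sketch of that behavior, independent of pyfaidx: the in-memory buffer and the helper name `read_bytes` are hypothetical stand-ins for the open FASTA file and the library's internal read call.

```python
import io


def read_bytes(handle, count):
    """Read `count` bytes from a binary handle; `count` must be an integer."""
    return handle.read(count)


buf = io.BytesIO(b"ACGTNNNNacgt")  # stand-in for an open FASTA file

# An integer byte count works as expected.
first = read_bytes(buf, 4)

# A float byte count is rejected by read(), even when it is a whole number.
try:
    read_bytes(buf, 4.0)
    float_rejected = False
except TypeError:
    float_rejected = True
```

The exact wording of the TypeError message differs between Python versions, so the sketch only checks the exception type.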

Owner

mdshw5 commented Jan 20, 2015

No problem. I'll take a look today and figure this out. Are you trying to slice using floats or is this behavior produced when using integer indices?


Owner

mdshw5 commented Jan 20, 2015

Never mind. Saw your updates after I had my coffee. I'll reproduce the issue and fix it. Thanks for the issue report!

Owner

mdshw5 commented Jan 20, 2015

It would be helpful if you can provide the link to the Fasta file that you are using.

mdshw5 changed the title from "trying to index by floats (python3)" to "Floating point values are allowed as file offsets" on Jan 20, 2015
mdshw5 added the bug label on Jan 20, 2015
mdshw5 self-assigned this on Jan 20, 2015
Owner

mdshw5 commented Jan 20, 2015

Are you perhaps setting the Fasta read_ahead attribute? You might check that you're not passing a floating point value there, as I think you may be.

Author

travc commented Jan 20, 2015

I am using `read_ahead`, and may well be setting it to a float.
However, I think I originally got the bug with very simple code:

ref = Fasta('reference.fa')
x = ref[0][100]

The float vs int thing is somewhat annoying. I tend to use exponential-notation literals like 1e6 for parameters like window sizes, ranges, and such, which are floats in Python.
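
To illustrate the point about exponential-notation literals, here is a minimal sketch of how `1e6` behaves and two ways to get an integer instead. The helper name `as_window_size` is hypothetical and not part of pyfaidx.

```python
window = 1e6                 # exponential-notation literal: always a float
window_int = int(1e6)        # explicit conversion to int
window_pow = 10 ** 6         # integer arithmetic, never produces a float


def as_window_size(value):
    """Convert a whole-number value such as 1e6 to int; reject fractions."""
    if isinstance(value, float) and not value.is_integer():
        raise ValueError("window size must be a whole number, got %r" % value)
    return int(value)


size = as_window_size(1e6)   # converted to the int 1000000
```

Converting once, where the parameter is defined, keeps floats from reaching calls that require integers, such as a buffer size or a byte count.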

Owner

mdshw5 commented Jan 20, 2015

Yes, I believe that the issue is the read_ahead buffer value, since I can't reproduce this issue without setting read_ahead to a float (and your traceback definitely has this set to some value). If this doesn't solve your case, I'd be glad to look at it if you can send me the fasta file you're using!

Author

travc commented Jan 20, 2015

Ok... I can see how that could cause problems.
It might be a good idea to check (or just cast) the values which must be ints as they are passed in.

BTW: The fasta file I'm using in this case is the AgamP4 reference:
https://www.vectorbase.org/download/anopheles-gambiae-pestchromosomesagamp4fagz

PS: The task I'm doing is just quickly counting Ns and soft-masked (repeats) in windows across the chromosomes. Using pyfaidx, it takes just a few lines of code and is impressively fast.
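
A rough sketch of the windowed counting task described above, written against a plain Python string so that it runs without a FASTA file. The function name `count_masked` and the choice to count N/n separately from other lowercase bases are assumptions, not the code from Ngaps.py.

```python
def count_masked(seq, window):
    """Yield (start, n_count, soft_masked_count) for each window of `seq`.

    `start` is a 0-based offset. N/n bases are counted separately and are
    not included in the soft-masked (lowercase) count.
    """
    window = int(window)  # tolerate whole-number floats such as 1e6
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        n_count = chunk.count("N") + chunk.count("n")
        soft = sum(1 for base in chunk if base.islower() and base != "n")
        yield start, n_count, soft


results = list(count_masked("ACGTNNacgtnnACGT", 8))
# results == [(0, 2, 2), (8, 2, 2)]
```

With pyfaidx, the string for each window would presumably come from slicing a record of the `Fasta` object, in the same way as the `ref[0][100]` access earlier in this thread.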

Owner

mdshw5 commented Jan 20, 2015

Yes, it's a great idea and I've implemented it in d7ed52e. I think raising an error is the least surprising thing to do, and appreciate your help finding this bug.
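
For readers who want the same behavior in their own code, here is a minimal sketch of the "raise an error" approach. The helper `require_int` is hypothetical; this is not the code from d7ed52e, whose contents are not shown in this thread.

```python
import numbers


def require_int(name, value):
    """Return `value` unchanged if it is an integer, else raise TypeError."""
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, numbers.Integral):
        raise TypeError(
            "%s must be an integer, got %s" % (name, type(value).__name__)
        )
    return value


read_ahead = require_int("read_ahead", 10 ** 6)  # accepted: an int

try:
    require_int("read_ahead", 1e6)               # rejected: a float
    rejected = False
except TypeError:
    rejected = True
```

Raising at the point where the value is passed in puts the bad argument name in the error message, instead of surfacing several calls later inside `read`.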
