
Obtain numpy array for sequence #139

Closed
alimanfoo opened this issue May 11, 2018 · 8 comments

Comments

@alimanfoo

Apologies if I've missed this in the documentation. I've been using pyfasta for a long time and often make use of the ability to load a sequence into a numpy array, e.g.:

In [2]: import pyfasta

In [3]: fasta = pyfasta.Fasta('/kwiat/vector/ag1000g/release/phase2.AR1/genome/agamP4/Anopheles-gambiae-PEST_CHROMOSOMES_AgamP4.fa')

In [4]: list(fasta)
Out[4]: ['UNKN', '2L', 'X', '2R', '3R', 'Y_unplaced', 'Mt', '3L']

In [5]: import numpy as np

In [6]: seq = np.asarray(fasta['2R'])

In [7]: seq
Out[7]: 
array([b'C', b'T', b'c', ..., b'A', b'C', b'A'],
      dtype='|S1')

Is there an equivalent capability in pyfaidx?

@alimanfoo
Author

cc @hardingnj

@mdshw5
Owner

mdshw5 commented May 11, 2018

I think I can implement this functionality using the __array_interface__ property similar to how pyfasta FastaRecord objects work:

https://github.com/brentp/pyfasta/blob/c2f0611c5311f1b1466f2d56560447898b4a8b03/pyfasta/records.py#L163-L170

@brentp can you let me know if there's anything else needed for this feature?
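
For readers unfamiliar with the mechanism, here is a minimal sketch of how an object can expose `__array_interface__` so that `np.asarray` accepts it. The `SequenceRecord` class and its `name` and `seq` attributes are hypothetical stand-ins invented for illustration; this is not the actual pyfaidx or pyfasta code, only an approximation of the idea, assuming the sequence is ASCII text held in memory.

```python
import numpy as np


class SequenceRecord:
    """Hypothetical stand-in for a FASTA record (not the pyfaidx or pyfasta class)."""

    def __init__(self, name, seq):
        self.name = name
        self.seq = seq  # plain str of nucleotide characters

    def __len__(self):
        return len(self.seq)

    @property
    def __array_interface__(self):
        # NumPy array interface, version 3: describe the data as a 1-D array
        # of single-byte strings backed by an ASCII-encoded bytes buffer.
        return {
            "shape": (len(self),),
            "typestr": "|S1",
            "version": 3,
            "data": self.seq.encode("ascii"),
        }


record = SequenceRecord("2R", "CTcACA")
arr = np.asarray(record)  # NumPy reads the dict returned by the property
print(arr.dtype, arr.shape)
```

Because the `data` entry here is an immutable `bytes` object, the resulting array should be treated as read-only; copy it with `arr.copy()` before modifying it.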

@alimanfoo
Author

If possible that would be great, thank you.

mdshw5 added a commit that referenced this issue May 11, 2018
@brentp
Contributor

brentp commented May 11, 2018

I don't think anything else is needed. Thanks for implementing!

@mdshw5
Owner

mdshw5 commented May 12, 2018

No problem. I added support in the current master branch, but I still have to figure out Python 3 buffer interface compatibility. It currently works in Python 2.7, so if that's what you're using, you can test it out like this:

pip install -e git+https://github.com/mdshw5/pyfaidx.git#egg=pyfaidx

@mdshw5
Owner

mdshw5 commented May 12, 2018

I've figured out Python 3 compatibility and just pushed a new release. CI should finish in a few minutes, and you can then install version 0.5.4, which includes this new feature. Please let me know if it doesn't work as expected and I'll be glad to help further.

@mdshw5 mdshw5 closed this as completed May 12, 2018
@alimanfoo
Author

alimanfoo commented May 12, 2018 via email

@alimanfoo
Author

Just to say, works like a charm on my mosquito genomes, thanks again.
