Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fasta more dict-like behaviour #156

Merged
merged 7 commits into from
Dec 10, 2019
Merged

Fasta more dict-like behaviour #156

merged 7 commits into from
Dec 10, 2019

Conversation

Maarten-vd-Sande
Copy link
Contributor

It is a very minor thing, but now pyfaidx.Fasta behaves slightly more dictionary-like with .items and .values.

@Maarten-vd-Sande
Copy link
Contributor Author

Should I worry about the CI/appveyor build failures?

@mdshw5
Copy link
Owner

mdshw5 commented Dec 9, 2019

@Maarten-vd-Sande Thanks for the contribution. You shouldn't worry about the Appveyor errors - that integration hasn't been maintained, and I'll likely stop testing on Windows in the future (unless someone wants to figure out the Windows-specific quirks that are preventing the tests from running). One question I have is about the use case for this change. I have no doubt that making the Fasta class act more like a dict is a good idea, but I do not expect that most users will want Fasta.values() to return the IndexRecord instances. If I'm misunderstanding please let me know.

It seems like a more logical return value would be a list of FastaRecord instances, to match the behavior of Fasta.__contains__, Fasta.__iter__ and Fasta.__getitem__.

What I'm proposing then for Fasta.items() would be something like:

>>> from pyfaidx import Fasta
>>> x = Fasta('/Users/matt/Downloads/gencode_vM12.fa')
>>> tuple(zip(x.keys(), x))[:10]
(('dfam_5S', FastaRecord("dfam_5S")), ('dfam_7SK', FastaRecord("dfam_7SK")), ('dfam_7SLRNA', FastaRecord("dfam_7SLRNA")), ('dfam_ACRO1', FastaRecord("dfam_ACRO1")), ('dfam_ALR', FastaRecord("dfam_ALR")), ('dfam_ALRa', FastaRecord("dfam_ALRa")), ('dfam_ALRb', FastaRecord("dfam_ALRb")), ('dfam_AluJb', FastaRecord("dfam_AluJb")), ('dfam_AluJo', FastaRecord("dfam_AluJo")), ('dfam_AluJr', FastaRecord("dfam_AluJr")))

@Maarten-vd-Sande
Copy link
Contributor Author

Thanks for your reply. You are absolutely right! I will change it.

Would you object if I change this to methods in the Fasta class instead of references to the odict instances? I think that's easier to implement, more pythonic, and it took me actually a minute to find where and how the keys() was implemented.

Something like:

class Fasta:
    ...
    def keys(self):
         return self.faidx.index.keys
    def values(self):
         return [fastarecord for fastarecord in self]
    def items(self):
        return tuple(zip(self.keys(), self.values()))

@mdshw5
Copy link
Owner

mdshw5 commented Dec 10, 2019

@Maarten-vd-Sande Sounds good. I was thinking along the same lines, and fully support this change.

@Maarten-vd-Sande
Copy link
Contributor Author

@mdshw5 I haven't really looked into the code thoroughly, but wouldn't it make more sense to just refer to Fasta.records, instead of this:

    def keys(self)
        return self.faidx.index.keys()

    def values(self):
        return [fastarecord for fastarecord in self]

    def items(self):
        return zip(self.keys(), self.values())

to this:

    def keys(self):
        return self.records.keys()

    def values(self):
        return self.records.values()

    def items(self):
        return self.records.items()

This should effectively do the same right? In my opinion the second option is much more clear.

@mdshw5
Copy link
Owner

mdshw5 commented Dec 10, 2019

This makes sense, but I will refactor the section where Fasta.records is initialized so we can repurpose Fasta.keys:

pyfaidx/pyfaidx/__init__.py

Lines 1000 to 1006 in f1668d2

self.keys = self.faidx.index.keys
if not self.mutable:
self.records = dict(
[(rname, FastaRecord(rname, self)) for rname in self.keys()])
elif self.mutable:
self.records = dict([(rname, MutableFastaRecord(rname, self))
for rname in self.keys()])

Copy link
Owner

@mdshw5 mdshw5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we can merge these changes.

if not self.mutable:
self.records = dict(
[(rname, FastaRecord(rname, self)) for rname in self.keys()])
self.records = {rname: FastaRecord(rname, self)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! You can ignore my previous comment.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like dictionary key insertion order is the issue, and I will update here to do something similar to what the Fasta.keys method currently relies on:

self.index = OrderedDict()

@@ -1062,6 +1060,15 @@ def get_spliced_seq(self, name, intervals, rc=False):
# len(Sequence.seq) != end - start
return Sequence(name=name, seq=seq, start=None, end=None)

def keys(self):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this brings clarity to the methods.

@Maarten-vd-Sande
Copy link
Contributor Author

Okay, nice! 🎉

Thanks for your time

@Maarten-vd-Sande
Copy link
Contributor Author

@mdshw5 dont merge yet! The tests are failing for older python versions.

@Maarten-vd-Sande
Copy link
Contributor Author

Perhaps related to dictionary ordering?

https://www.python.org/dev/peps/pep-0520/

@mdshw5
Copy link
Owner

mdshw5 commented Dec 10, 2019

:) Yes, I was just looking into this.

@Maarten-vd-Sande
Copy link
Contributor Author

I don't have time for it today, but I don't mind taking a look at it somewhere this or next week.

@mdshw5 mdshw5 merged commit 94b8980 into mdshw5:master Dec 10, 2019
@mdshw5
Copy link
Owner

mdshw5 commented Dec 10, 2019

I've tagged v0.5.7 for release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants