Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

keep fasta header lines unmodified, if ranges are provided #121

Closed
tolot27 opened this issue Jul 25, 2017 · 7 comments
Closed

keep fasta header lines unmodified, if ranges are provided #121

tolot27 opened this issue Jul 25, 2017 · 7 comments
Assignees

Comments

@tolot27
Copy link
Contributor

tolot27 commented Jul 25, 2017

Currently, the range coordinates are appended to the sequence ID, if specified (i. e. contig1:20-200). I want to keep the FASTA header lines unmodified. I tried to fix it by myself but there are some mixtures of long_name and full_name in the source code.

@tolot27
Copy link
Contributor Author

tolot27 commented Jul 25, 2017

What me confuses is that the command line parameter --full-names is passed as argument long_names. I suggest renaming full-names/long_names to full-header, at least in the source code. That would make it easier to add new parameters addressing this issue.

@tolot27 tolot27 changed the title keep fasta header lines ummodified, if ranges are provided keep fasta header lines unmodified, if ranges are provided Jul 25, 2017
mdshw5 added a commit that referenced this issue Jul 25, 2017
@mdshw5
Copy link
Owner

mdshw5 commented Jul 25, 2017

Yeah, the .long_name property for a Sequence object just constructs the seq:start-end (complement) string, while the FastaRecord.long_name property retrieves the entire defline (header) from the fasta file. These are really two distinct actions, and it's unfortunate I've named them identically.

I previously deprecated Sequence.longname in favor of Sequence.long_name, so I'll move Sequence.long_name -> Sequence.fancy_name since that's what I initially wanted to call it anyway.

mdshw5 added a commit that referenced this issue Jul 25, 2017
@mdshw5 mdshw5 self-assigned this Jul 25, 2017
@mdshw5
Copy link
Owner

mdshw5 commented Jul 25, 2017

Is this the behavior you were expecting?

With coordinates:

$ faidx --auto-strand tests/data/genes.fasta 'gi|563317589|dbj|AB821309.1|:100-1'
>gi|563317589|dbj|AB821309.1|:100-1 (complement)
CTAATGTGGTATCCTCAACTAAACTGAAGGAGGGCCGGGCCAGGGACAAGGTTGCCATGGTGACCACGAC
CAGGCAGATGAAACGACCCCAGCTGACCAT

Without coordinates:

$ faidx --no-coords --auto-strand tests/data/genes.fasta 'gi|563317589|dbj|AB821309.1|:100-1'
>gi|563317589|dbj|AB821309.1|
CTAATGTGGTATCCTCAACTAAACTGAAGGAGGGCCGGGCCAGGGACAAGGTTGCCATGGTGACCACGAC
CAGGCAGATGAAACGACCCCAGCTGACCAT

@tolot27
Copy link
Contributor Author

tolot27 commented Jul 25, 2017

I'm unsure ich that's wächst Ihr expect. I just want an option which does not modify the ID or better, not changing the ID should be the default behavior as it was with some previous versions.
It looks like your second example with the --no-coords parameter removes already present coordinates.
That raises the question what happens if a sequence identifier has a "coordinate suffix" and I extract a sequence range. Will than an additional coordinate set appended?

@tolot27 tolot27 closed this as completed Jul 25, 2017
@tolot27 tolot27 reopened this Jul 25, 2017
@tolot27
Copy link
Contributor Author

tolot27 commented Jul 25, 2017

Okay, your commit does not remove already existing coordinates.
Regarding to my second concern of appending additional coordinates: this supports my suggestion that no-coords should be the default. Otherwise it would become a bit confusing if someone extracts nested ranges.

@mdshw5
Copy link
Owner

mdshw5 commented Jul 25, 2017

I think I’m going to leave the defaults as is. To me it’s quite a bit more confusing to extract a subsequence and have it labeled exactly the same as the original subject sequence. You’re never going to please everyone! Thanks for your contribution!

@mdshw5 mdshw5 closed this as completed Jul 25, 2017
@tolot27
Copy link
Contributor Author

tolot27 commented Jul 26, 2017

Thanks for the really fast implementation and also for merging my PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants