#2763 - System should ignore spaces before line indexes on GenBank/GenPept sequences import #2769

Merged: 1 commit, merged Feb 17, 2025
1 change: 1 addition & 0 deletions api/tests/integration/ref/formats/genbank_to_seq.py.out
@@ -1,3 +1,4 @@
*** GenBank/GenPept to Seq***
1844-gen_bank.seq.seq:SUCCEED
1844-gen_pept.seq.seq:SUCCEED
+2763-gen_spaces.seq.seq:SUCCEED
3 changes: 3 additions & 0 deletions api/tests/integration/tests/formats/genbank_to_seq.py
@@ -25,6 +25,7 @@ def find_diff(a, b):
files = [
{"file": "1844-gen_bank", "seq_type": "PEPTIDE"},
{"file": "1844-gen_pept", "seq_type": "PEPTIDE"},
+{"file": "2763-gen_spaces", "seq_type": "DNA"},
]

lib = indigo.loadMonomerLibraryFromFile(
@@ -36,6 +37,8 @@ def find_diff(a, b):
mol = indigo.loadSequenceFromFile(
os.path.join(root, filename), infile["seq_type"], lib
)
+# with open(os.path.join(ref_path, filename), "w") as file:
+# file.write(mol.sequence(lib))
with open(os.path.join(ref_path, filename), "r") as file:
seq_ref = file.read()
seq = mol.sequence(lib)
20 changes: 20 additions & 0 deletions api/tests/integration/tests/formats/molecules/2763-gen_spaces.seq
@@ -0,0 +1,20 @@
1 gatcctccat atacaacggt atctccacct caggtttaga tctcaacaac ggaaccattg
61 ccgacatgag acagttaggt atcgtcgaga gttacaagct aaaacgagca gtagtcagct
121 ctgcatctga agccgctgaa gttctactaa gggtggataa catcatccgt gcaagaccaa
181 gaaccgccaa tagacaacat atgtaacata tttaggatat acctcgaaaa taataaaccg
241 ccacactgtc attattataa ttagaaacag aacgcaaaaa ttatccacta tataattcaa
301 agacgcgaaa aaaaaagaac aacgcgtcat agaacttttg gcaattcgcg tcacaaataa
361 attttggcaa cttatgtttc ctcttcgagc agtactcgag ccctgtctca agaatgtaat
421 aatacccatc gtaggtatgg ttaaagatag catctccaca acctcaaagc tccttgccga
481 gagtcgccct cctttgtcga gtaattttca cttttcatat gagaacttat tttcttattc
541 tttactctca catcctgtag tgattgacac tgcaacagcc accatcacta gaagaacaga
601 acaattactt aatagaaaaa ttatatcttc ctcgaaacga tttcctgctt ccaacatcta
661 cgtatatcaa gaagcattca cttaccatga cacagcttca gatttcatta ttgctgacag
721 ctactatatc actactccat ctagtagtgg ccacgcccta tgaggcatat cctatcggaa
781 aacaataccc cccagtggca agagtcaatg aatcgtttac atttcaaatt tccaatgata
841 cctataaatc gtctgtagac aagacagctc aaataacata caattgcttc gacttaccga
901 gctggctttc gtttgactct agttctagaa cgttctcagg tgaaccttct tctgacttac
961 tatctgatgc gaacaccacg ttgtatttca atgtaatact cgagggtacg gactctgccg
1021 acagcacgtc tttgaacaat acataccaat ttgttgttac aaaccgtcca tccatctcgc
1081 tatcgtcaga tttcaatcta ttggcgttgt taaaaaacta tggttatact aacggcaaaa
1141 acgctctgaa actagatcct aatgaagtct tcaacgtgac ttttgaccgt tcaatgttca
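The fixture above uses the GenBank ORIGIN layout: each line opens with the 1-based position of its first base, conventionally right-aligned in a fixed-width column, so real files pad that index with leading spaces; per the PR title, that padding is what this test exercises. A minimal Python sketch of the layout, assuming whitespace-separated 10-base blocks. `check_origin_indexes` is a hypothetical helper for illustration, not part of Indigo:

```python
from typing import Iterable


def check_origin_indexes(lines: Iterable[str]) -> str:
    """Join GenBank ORIGIN-style lines into one base string.

    Hypothetical helper: strips the (possibly indented) line index and
    checks that it equals the 1-based position of the line's first base.
    """
    bases: list[str] = []
    for line in lines:
        stripped = line.strip()  # drop right-alignment padding and the newline
        if not stripped:
            continue  # tolerate blank lines
        index_str, _, rest = stripped.partition(" ")
        index = int(index_str)
        expected = len(bases) + 1  # position of this line's first base
        if index != expected:
            raise ValueError(f"line index {index}, expected {expected}")
        bases.extend(rest.replace(" ", ""))  # blocks are space-separated
    return "".join(bases)


sample = [
    "        1 gatcctccat atacaacggt atctccacct caggtttaga tctcaacaac ggaaccattg",
    "       61 ccgacatgag acagttaggt",
]
print(len(check_origin_indexes(sample)))  # 80
```

The index check is what makes the indexes in the fixture (1, 61, 121, ...) line up: every full line carries six 10-base blocks, so each index is 60 more than the previous one.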
15 changes: 15 additions & 0 deletions api/tests/integration/tests/formats/ref/2763-gen_spaces.seq
@@ -0,0 +1,15 @@
GATCCTCCATATACAACGGTATCTCCACCTCAGGTTTAGATCTCAACAACGGAACCATTGCCGACATGAGACAGTTAGGT
ATCGTCGAGAGTTACAAGCTAAAACGAGCAGTAGTCAGCTCTGCATCTGAAGCCGCTGAAGTTCTACTAAGGGTGGATAA
CATCATCCGTGCAAGACCAAGAACCGCCAATAGACAACATATGTAACATATTTAGGATATACCTCGAAAATAATAAACCG
CCACACTGTCATTATTATAATTAGAAACAGAACGCAAAAATTATCCACTATATAATTCAAAGACGCGAAAAAAAAAGAAC
AACGCGTCATAGAACTTTTGGCAATTCGCGTCACAAATAAATTTTGGCAACTTATGTTTCCTCTTCGAGCAGTACTCGAG
CCCTGTCTCAAGAATGTAATAATACCCATCGTAGGTATGGTTAAAGATAGCATCTCCACAACCTCAAAGCTCCTTGCCGA
GAGTCGCCCTCCTTTGTCGAGTAATTTTCACTTTTCATATGAGAACTTATTTTCTTATTCTTTACTCTCACATCCTGTAG
TGATTGACACTGCAACAGCCACCATCACTAGAAGAACAGAACAATTACTTAATAGAAAAATTATATCTTCCTCGAAACGA
TTTCCTGCTTCCAACATCTACGTATATCAAGAAGCATTCACTTACCATGACACAGCTTCAGATTTCATTATTGCTGACAG
CTACTATATCACTACTCCATCTAGTAGTGGCCACGCCCTATGAGGCATATCCTATCGGAAAACAATACCCCCCAGTGGCA
AGAGTCAATGAATCGTTTACATTTCAAATTTCCAATGATACCTATAAATCGTCTGTAGACAAGACAGCTCAAATAACATA
CAATTGCTTCGACTTACCGAGCTGGCTTTCGTTTGACTCTAGTTCTAGAACGTTCTCAGGTGAACCTTCTTCTGACTTAC
TATCTGATGCGAACACCACGTTGTATTTCAATGTAATACTCGAGGGTACGGACTCTGCCGACAGCACGTCTTTGAACAAT
ACATACCAATTTGTTGTTACAAACCGTCCATCCATCTCGCTATCGTCAGATTTCAATCTATTGGCGTTGTTAAAAAACTA
TGGTTATACTAACGGCAAAAACGCTCTGAAACTAGATCCTAATGAAGTCTTCAACGTGACTTTTGACCGTTCAATGTTCA
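Judging from this reference file, the expected `mol.sequence(lib)` output for the fixture is the bare bases in upper case, with the line indexes and block spaces removed, wrapped at 80 characters per line (1,200 bases, 15 lines). A small Python sketch of that layout; the 80-column width is read off the reference rather than taken from Indigo's source, and `format_reference` is a hypothetical name:

```python
def format_reference(bases: str, width: int = 80) -> str:
    """Upper-case `bases` and wrap them at `width` characters per line."""
    seq = bases.upper()
    return "\n".join(seq[i:i + width] for i in range(0, len(seq), width))


lines = format_reference("acgt" * 50).splitlines()
print([len(line) for line in lines])  # [80, 80, 40]
```

Fed the first two fixture lines with indexes and spaces removed, this reproduces the first reference line exactly.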
4 changes: 4 additions & 0 deletions core/indigo-core/molecule/src/sequence_loader.cpp
@@ -169,6 +169,8 @@ void SequenceLoader::loadSequence(BaseMolecule& mol, SeqType seq_type)

if (start_char)
{
+if (ch == ' ' || ch == '\t')
+continue; // skip leading whitespaces
if (ch >= NUM_BEGIN && ch < NUM_END)
{
isGenBankPept = true;
@@ -2015,6 +2017,8 @@ void SequenceLoader::loadSequence(KetDocument& document, SeqType seq_type)

if (start_char)
{
+if (ch == ' ' || ch == '\t')
+continue; // skip leading whitespaces
if (isdigit(ch))
isGenBankPept = true;
start_char = false;
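Both hunks add the same guard: while the loader is still on the first character (`start_char`), spaces and tabs are skipped, so the digit test that sets `isGenBankPept` sees the line index itself rather than its padding. The Python sketch below mirrors that flow in simplified form. `read_sequence` and its `skip_leading_ws` switch are hypothetical, and the sketch assumes (beyond what the hunks show) that once `isGenBankPept` is set, later index digits are dropped; the real loader's behaviour for an unrecognised index may differ from the pass-through shown here:

```python
def read_sequence(text: str, skip_leading_ws: bool = True) -> tuple[str, bool]:
    """Simplified, hypothetical mirror of the start-of-input check.

    Returns (residues, is_genbank); is_genbank is True when the first
    non-blank character is a digit, i.e. a GenBank/GenPept line index.
    """
    is_genbank = False
    start_char = True
    residues: list[str] = []
    for ch in text:
        if start_char:
            if skip_leading_ws and ch in " \t":
                continue  # the fix: ignore padding before the first line index
            if ch.isdigit():
                is_genbank = True  # a leading number marks the GenBank layout
            start_char = False
        if ch.isspace():
            continue  # spaces, tabs and newlines never carry residues
        if is_genbank and ch.isdigit():
            continue  # line indexes are positions, not residues
        residues.append(ch.upper())
    return "".join(residues), is_genbank


print(read_sequence("        1 gatc\n"))                         # ('GATC', True)
print(read_sequence("        1 gatc\n", skip_leading_ws=False))  # ('1GATC', False)
```

With `skip_leading_ws=False` the padding ends the start-of-input state before the digit is seen, so the layout is not recognised and, in this sketch, the index leaks into the residues.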