test(executor): sentencizer trimming spaces
Test that spaces at the beginning and end of each chunk are trimmed
and that chunks containing only spaces are ignored.
This also works with tabs because the code uses the regex class
'\s', which matches all whitespace characters.
guiferviz committed May 17, 2020
1 parent 45c0715 commit 6caca89
Showing 1 changed file with 11 additions and 0 deletions.
11 changes: 11 additions & 0 deletions tests/executors/crafters/nlp/split.py
@@ -32,6 +32,17 @@ def test_sentencier_en_float_numbers(self):
        crafted_chunk_list = sentencizer.craft(raw_bytes, 0)
        self.assertEqual(len(crafted_chunk_list), 2)

    def test_sentencier_en_trim_spaces(self):
        """
        Trimming all spaces at the beginning and end of the chunks.
        Keeping extra spaces inside chunks.
        Ignoring chunks with only spaces.
        """
        sentencizer = Sentencizer()
        raw_bytes = b' This , text is... . Amazing !!'
        chunks = [i["text"] for i in sentencizer.craft(raw_bytes, 0)]
        self.assertListEqual(chunks, ["This , text is", "Amazing"])

    def test_sentencier_cn(self):
        sentencizer = Sentencizer()
        raw_bytes = '今天是个大晴天!安迪回来以后,我们准备去动物园。'.encode('utf8')
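
The commit message notes that trimming relies on the regex class '\s', so tabs and newlines are handled the same way as spaces. The sketch below illustrates that idea in isolation. It is not the Sentencizer implementation: `split_and_trim` and its `punct` parameter are hypothetical names introduced only for this example, and the splitting rule (break on runs of '.', '!' or '?') is an assumption.

```python
import re


def split_and_trim(text, punct=".!?"):
    """Hypothetical helper (not part of Sentencizer): split on runs of
    punctuation, trim whitespace with '\\s', and drop empty chunks."""
    # Assumption: sentences end at any run of the characters in `punct`.
    pieces = re.split("[" + re.escape(punct) + "]+", text)
    # '\s' matches spaces, tabs, newlines and other whitespace, so one
    # pattern trims all of them from both ends of each piece.
    trimmed = [re.sub(r"^\s+|\s+$", "", piece) for piece in pieces]
    # Pieces that contained only whitespace are now empty; skip them.
    return [piece for piece in trimmed if piece]


chunks = split_and_trim(" This , text is... . Amazing !!")
tab_chunks = split_and_trim("\tHello\t.\t \n.World\n")
```

Only the ends of each chunk are trimmed here, so whitespace inside a chunk (such as the space before the comma in the test string) is left untouched.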