This repository has been archived by the owner on May 2, 2024. It is now read-only.
Hi @stefan-it, I just downloaded the IWSLT 15 English-Vietnamese dataset and saw some blank lines in both files. I tried to remove all blank lines with Notepad++, and afterwards the number of sentences in train.en and train.vi was no longer equal: 133168 sentences for train.en and 133205 for train.vi.
I checked the training files, and wc -l train.en yields a line count of 133,317 (the same for train.vi). I think something is wrong with the Notepad++ display (maybe some issue with line breaks).
But could you give some examples of the empty lines? I'll check them then :)
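To reproduce the check outside an editor, here is a minimal Python sketch. It counts lines the same way wc -l does, lists blank lines, and drops blank lines pairwise so both sides stay aligned. The helper names (count_lines, read_lines, blank_line_numbers, drop_blank_pairs) are hypothetical and not part of any dataset tooling. It assumes the files are UTF-8 and that only "\n" ends a line, which is what wc -l counts; an editor may treat other characters as line breaks. Removing blank lines from each file independently can shift the alignment if a blank line appears on only one side, which would explain the two different counts.

```python
def count_lines(path):
    # Count "\n" bytes, which is what `wc -l` reports.
    with open(path, "rb") as f:
        return f.read().count(b"\n")


def read_lines(path):
    # newline="\n" disables universal-newline translation, so a stray "\r"
    # is kept inside its line instead of starting a new one.
    with open(path, encoding="utf-8", newline="\n") as f:
        lines = f.read().split("\n")
    # A file that ends with "\n" yields one trailing empty string; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    return lines


def blank_line_numbers(lines):
    # 1-based numbers of lines that are empty or whitespace-only.
    return [i for i, line in enumerate(lines, start=1) if not line.strip()]


def drop_blank_pairs(src_lines, tgt_lines):
    # Remove a sentence pair when either side is blank, so the two
    # files keep the same number of lines and stay aligned.
    if len(src_lines) != len(tgt_lines):
        raise ValueError("files are not line-aligned")
    kept = [(s, t) for s, t in zip(src_lines, tgt_lines)
            if s.strip() and t.strip()]
    return [s for s, _ in kept], [t for _, t in kept]
```

For example, count_lines("train.en") and count_lines("train.vi") should agree with the wc -l output, and blank_line_numbers(read_lines("train.en")) gives the line numbers to look at.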