From 3199aa1c3d7990a3b6a524b8c60b760d8dee0120 Mon Sep 17 00:00:00 2001
From: Ilya Vorontsov
Date: Mon, 24 Jul 2017 09:45:00 +0300
Subject: [PATCH] filter_tokens calls compactify automatically (see issue #326
 and commit 4863040), so I fixed that point in the tutorial.

---
 docs/notebooks/Corpora_and_Vector_Spaces.ipynb | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/docs/notebooks/Corpora_and_Vector_Spaces.ipynb b/docs/notebooks/Corpora_and_Vector_Spaces.ipynb
index 6c09bcf052..72bb57b47e 100644
--- a/docs/notebooks/Corpora_and_Vector_Spaces.ipynb
+++ b/docs/notebooks/Corpora_and_Vector_Spaces.ipynb
@@ -354,7 +354,7 @@
    "source": [
     "Although the output is the same as for the plain Python list, the corpus is now much more memory friendly, because at most one vector resides in RAM at a time. Your corpus can now be as large as you want.\n",
     "\n",
-    "We are going to create the dictionary from the mycorpus.txt file without loading the entire file into memory. Then, we will generate the list of token ids to remove from this dictionary by querying the dictionary for the token ids of the stop words, and by querying the document frequencies dictionary (dictionary.dfs) for token ids that only appear once. Finally, we will filter these token ids out of our dictionary and call dictionary.compactify() to remove the gaps in the token id series."
+    "We are going to create the dictionary from the mycorpus.txt file without loading the entire file into memory. Then, we will generate the list of token ids to remove from this dictionary by querying the dictionary for the token ids of the stop words, and by querying the document frequencies dictionary (`dictionary.dfs`) for token ids that only appear once. Finally, we will filter these token ids out of our dictionary. Keep in mind that `dictionary.filter_tokens` (and some other functions, such as `dictionary.add_documents`) calls `dictionary.compactify()` internally to remove the gaps in the token id series, so the ids of the remaining tokens may change."
    ]
   },
   {
@@ -385,9 +385,6 @@
     "\n",
     "# remove stop words and words that appear only once\n",
     "dictionary.filter_tokens(stop_ids + once_ids)\n",
-    "\n",
-    "# remove gaps in id sequence after words that were removed\n",
-    "dictionary.compactify()\n",
     "print(dictionary)"
    ]
   },
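
For reference, here is a minimal sketch of how the filtering cell reads after this change. It assumes gensim's standard Dictionary API; mycorpus.txt and the exact stoplist used below stand in for the ones the notebook defines earlier.

    from gensim import corpora

    # build the dictionary by streaming the corpus, one document at a time,
    # so the whole file never has to fit in memory
    dictionary = corpora.Dictionary(
        line.lower().split() for line in open('mycorpus.txt'))

    # placeholder stoplist; the tutorial defines its own earlier in the notebook
    stoplist = set('for a of the and to in'.split())

    # ids of stop words that actually occur in the dictionary
    stop_ids = [dictionary.token2id[stopword] for stopword in stoplist
                if stopword in dictionary.token2id]

    # ids of tokens whose document frequency is 1
    once_ids = [tokenid for tokenid, docfreq in dictionary.dfs.items()
                if docfreq == 1]

    # filter_tokens compactifies internally, so no explicit compactify() call
    # is needed; note that the remaining token ids may change
    dictionary.filter_tokens(stop_ids + once_ids)
    print(dictionary)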