tagger.predict() is very slow #7
Comments
Hello juggernauts, thanks for posting this! Could you give me some details on how you are passing the string to the parser? Are you using sentence splitting and passing a list of sentences? Or are you putting it all into one Sentence object (which would be an extremely long sentence at 455 words)? Could you perhaps post your entire code so I can reproduce?
The latest commit now includes automatic batching for parsing lists of sentences. Can you split your text into a list of sentences and try again?
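The splitting step itself is cheap and can be done with any sentence splitter. A minimal sketch of the split-then-batch approach (the regex splitter below is an illustrative stand-in, not flair's own splitter):

```python
import re

def split_sentences(text):
    # Naive splitter on sentence-final punctuation; any real splitter
    # (e.g. segtok) would do the same job more robustly.
    return [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]

text = "Flair is an NLP library. It ships pretrained taggers! Is it fast?"
sentences = split_sentences(text)
# Each string would then be wrapped in a flair Sentence object and the
# whole list passed to tagger.predict(...) so the automatic batching applies.
```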
You were right, I was passing the complete text without splitting it into sentences. With your latest commit, I was able to bring the time down to 18 seconds, so I guess it did work.
Ok, that's great! Thanks for raising the issue and posting the code! We also expect upcoming releases to further improve tagging speed - we'll keep you posted!
Will this method of splitting sentences work with classification as well?
For full-text classification you should not use sentence splitting, i.e. each tweet (or text paragraph you wish to classify) should get its own Sentence object.
Thanks Alan! It is taking me about 7 seconds to predict a single tweet! Right now I am looping through an array of tweets and predicting on each one. Any suggestions on speeding this up? I appreciate all your help :)
Hi @jewl123 yes that seems very slow. You are using a non-GPU setup, correct? You can try mini-batching to speed things up - for instance, you could pass a list of 4, 8, 16 or 32 tweets at the same time, like this:

```python
from flair.data import Sentence
from flair.models import TextClassifier

classifier = TextClassifier.load('en-sentiment')

# make a mini-batch of sentences
sentences = [
    Sentence('I love this movie'),
    Sentence('I hate this movie'),
    Sentence('This movie is great'),
]

# pass the mini-batch to the classifier
classifier.predict(sentences)

for sentence in sentences:
    print(sentence)
```
Thanks for the quick and concise responses, Alan! Yes, it's on a CPU, but even when I run it in the cloud it does not seem much faster. I will implement this now and let you know.
Ahhhh, I just realized that I had the "TextClassifier.load_from_file" call inside my for loop! I will also try mini-batches soon to see if they improve prediction time and report back. Hope this helps someone later on.
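For later readers, the two fixes combined look roughly like this. The chunking helper is plain Python; the commented-out flair calls and the model path 'best-model.pt' are assumptions based on the API discussed above, not a verified setup:

```python
def minibatches(items, batch_size=32):
    # Yield successive fixed-size slices; the final batch may be smaller.
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

tweets = [f"tweet {i}" for i in range(70)]
batches = list(minibatches(tweets, batch_size=32))

# Load the model ONCE, outside any loop (path is hypothetical):
# classifier = TextClassifier.load_from_file('best-model.pt')
# for batch in batches:
#     sentences = [Sentence(t) for t in batch]
#     classifier.predict(sentences)
```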
Compared to other deep-learning-based NER models, tagger.predict() appears to be slow. It took around 70 seconds to parse a string with 455 tokens.

Running a line profiler shows that nearly all the time is spent creating the embeddings:

```python
self.embeddings.embed(sentences)
```

Any ideas why this would be so slow?
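A lighter-weight way to confirm where the time goes, without a full line profiler, is a small timing wrapper. This is a sketch; the commented calls assume a tagger and a sentence list set up as discussed above:

```python
import time

def timed(fn, *args, **kwargs):
    # Time a single call and return (result, elapsed seconds).
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Hypothetical usage against a loaded tagger:
# _, embed_seconds = timed(tagger.embeddings.embed, sentences)
# _, total_seconds = timed(tagger.predict, sentences)
# Comparing the two shows how much of predict() is embedding time.

# Self-contained demonstration of the helper:
result, elapsed = timed(sum, range(1000))
```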