tagger.predict() is very slow #7
Comments
Hello juggernauts, thanks for posting this! Could you give me some details on how you are passing the string to the parser? Are you using sentence splitting and passing a list of sentences? Or are you putting it all into one Sentence object (which would be an extremely long sentence at 455 words)? Could you perhaps post your entire code so I can reproduce?
The latest commit now includes automatic batching for parsing lists of sentences. Can you split your text into a list of sentences and try again?
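The splitting step itself is cheap and can be done with any sentence splitter. A minimal sketch of the split-then-batch approach (the regex splitter below is an illustrative stand-in, not flair's own splitter):

```python
import re

def split_sentences(text):
    # Naive splitter on sentence-final punctuation; any real splitter
    # (e.g. segtok) would do the same job more robustly.
    return [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]

text = "Flair is an NLP library. It ships pretrained taggers! Is it fast?"
sentences = split_sentences(text)
# Each string would then be wrapped in a flair Sentence object and the
# whole list passed to tagger.predict(...) so the automatic batching applies.
```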
You were right, I was passing the complete text without splitting it into sentences. With your latest commit, I was able to bring the time down to 18 seconds, so I guess it did work.
Ok, that's great! Thanks for raising the issue and posting the code! We also expect upcoming releases to further improve tagging speed - we'll keep you posted!
Will this method of splitting sentences work with classification as well?
For full-text classification you should not use sentence splitting, i.e. each tweet (or text paragraph you wish to classify) should get its own Sentence object.
Thanks Alan! It is taking me about 7 seconds to predict a single tweet! Right now I am looping through an array of tweets and predicting on each one. Any suggestions on speeding this up? I appreciate all your help :)
Hi @jewl123 yes that seems very slow. You are using a non-GPU setup, correct? You can try mini-batching to speed things up - for instance, you could pass a list of 4, 8, 16 or 32 tweets at the same time, like this:

```python
from flair.data import Sentence
from flair.models import TextClassifier

classifier = TextClassifier.load('en-sentiment')

# make a mini-batch of sentences
sentences = [
    Sentence('I love this movie'),
    Sentence('I hate this movie'),
    Sentence('This movie is great'),
]

# pass the mini-batch to the classifier
classifier.predict(sentences)

for sentence in sentences:
    print(sentence)
```
Thanks for the quick and concise responses, Alan! Yes, it's on a CPU, but even when I run it in the cloud it does not seem much faster. I will implement this now and let you know.
Ahhhh, I just realized that I had the "TextClassifier.load_from_file" call inside my for loop! I will also try mini-batches soon to see if they improve prediction time and report back. Hope this helps someone later on.
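For later readers, the two fixes combined look roughly like this. The chunking helper is plain Python; the commented-out flair calls and the model path 'best-model.pt' are assumptions based on the API discussed above, not a verified setup:

```python
def minibatches(items, batch_size=32):
    # Yield successive fixed-size slices; the final batch may be smaller.
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

tweets = [f"tweet {i}" for i in range(70)]
batches = list(minibatches(tweets, batch_size=32))

# Load the model ONCE, outside any loop (path is hypothetical):
# classifier = TextClassifier.load_from_file('best-model.pt')
# for batch in batches:
#     sentences = [Sentence(t) for t in batch]
#     classifier.predict(sentences)
```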
Compared to other deep-learning-based NER models, tagger.predict() appears to be slow. It took around 70 seconds to parse a string with 455 tokens.

Running a line profiler shows that nearly all the time is spent creating the embeddings:

```python
self.embeddings.embed(sentences)
```

Any ideas why this would be so slow?
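A lighter-weight way to confirm where the time goes, without a full line profiler, is a small timing wrapper. This is a sketch; the commented calls assume a tagger and a sentence list set up as discussed above:

```python
import time

def timed(fn, *args, **kwargs):
    # Time a single call and return (result, elapsed seconds).
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Hypothetical usage against a loaded tagger:
# _, embed_seconds = timed(tagger.embeddings.embed, sentences)
# _, total_seconds = timed(tagger.predict, sentences)
# Comparing the two shows how much of predict() is embedding time.

# Self-contained demonstration of the helper:
result, elapsed = timed(sum, range(1000))
```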