
Fast inference on CPU #29

Closed
juggernauts opened this issue Jul 31, 2018 · 13 comments
Labels
enhancement Improving of an existing feature

Comments

@juggernauts

Hi Alan,
Feel free to deprioritize this, but inference is currently slow on CPUs. In a separate ticket you implemented batching to speed up inference on long texts, but it is still not fast enough for production settings.

@alanakbik
Collaborator

Hi Ankit, yes, we are working on speed improvements, but most will not make it into the upcoming 0.2 release (due in a few days), which prioritizes GPU inference.

One thing we can already include is smaller models that trade a small amount of accuracy for greater CPU inference speed. For instance, while the default NER model currently takes 11 seconds for 500 words on my CPU-only laptop, the small model takes only 3 seconds. We measure an F1 score of 92.61 for the small model, which is still state of the art, but a bit below the full model at 93.18.

Would such models be helpful to you? What kind of CPU inference speeds do you require?

@alanakbik alanakbik added the enhancement Improving of an existing feature label Aug 1, 2018
@juggernauts
Author

Hi Alan,

Thank you for your reply. 3 seconds for 500 words is close to what we need. Most of the text we will be processing is under 100 words, so sub-500 ms latency would be ideal.
A small model would be helpful for real-time use cases like ours, so I would be interested in trying it.
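As a sanity check on the numbers quoted above (11 s and 3 s per 500 words), a simple linear scaling gives the expected latency at 100 words. This assumes throughput scales linearly with word count, which ignores model-load time and batching effects:

```python
def seconds_per_words(total_seconds: float, total_words: int, words: int) -> float:
    """Scale a measured timing linearly to a different word count."""
    return total_seconds / total_words * words

# Figures from this thread: default model 11 s / 500 words, small model 3 s / 500 words.
default_100 = seconds_per_words(11.0, 500, 100)
fast_100 = seconds_per_words(3.0, 500, 100)

print(f"default model, 100 words: ~{default_100:.2f} s")
print(f"fast model,    100 words: ~{fast_100:.2f} s")
```

Under that assumption the small model lands around 0.6 s per 100 words, just above the 500 ms target mentioned here.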

@alanakbik
Collaborator

That's great - we'll add the first batch of CPU models to the upcoming release!

@alanakbik
Collaborator

Release 0.2 adds pre-trained models that are more CPU-friendly. Append '-fast' to a model name to load them (listed here; only available for English models at present). Run git pull or pip install flair --upgrade to get the newest version!

@marcothinnes

Hi @alanakbik ,

is there a way to use multiple CPU cores to speed up inference? Thank you very much.

@alanakbik
Collaborator

@mstaschik good question - there is currently no built-in way, but we'd be very interested in any ideas for making it faster on CPU!
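Since there is no built-in option, one pattern worth trying is splitting the sentence list into chunks and tagging each chunk in a worker thread. This is an illustrative sketch only: `tag_chunk` stands in for a real call such as `tagger.predict(chunk)` and is not part of the flair API. Threads can help here because PyTorch kernels release the GIL during heavy tensor work:

```python
from concurrent.futures import ThreadPoolExecutor

def chunks(items, n):
    """Split a list into n roughly equal contiguous chunks."""
    k, r = divmod(len(items), n)
    out, start = [], 0
    for i in range(n):
        end = start + k + (1 if i < r else 0)
        out.append(items[start:end])
        start = end
    return out

def tag_chunk(chunk):
    # Placeholder for e.g. tagger.predict(chunk); here we just uppercase.
    return [s.upper() for s in chunk]

def tag_parallel(sentences, workers=4):
    """Tag chunks concurrently and re-join results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(tag_chunk, chunks(sentences, workers))
    return [s for chunk in results for s in chunk]

print(tag_parallel(["one", "two", "three", "four", "five"], workers=2))
```

Whether this pays off in practice depends on how much of the workload runs inside GIL-releasing torch ops; separate processes are another option at the cost of loading the model once per worker.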

@XiaoqingNLP

Is there any way to speed this up? I am on the latest version, running NER with the commands documented in the README.

@pommedeterresautee
Contributor

It depends on the representation you use. Representations based on BiLSTMs are slow on CPU. Maybe others like BytePairEmbeddings would work better? (They will certainly be faster.)
https://github.com/zalandoresearch/flair/blob/master/resources/docs/embeddings/BYTE_PAIR_EMBEDDINGS.md

@XiaoqingNLP

I'm following the instructions below to extract entities; how can I speed this up? @pommedeterresautee

from flair.data import Sentence
from flair.models import SequenceTagger

# make a sentence
sentence = Sentence('I love Berlin .')

# load the NER tagger
tagger = SequenceTagger.load('ner')

# run NER over sentence
tagger.predict(sentence)

# print the tagged sentence
print(sentence.to_tagged_string())

@alanakbik
Collaborator

@PlayDeep you can use the -fast variants of the models, i.e.

# load the NER tagger
tagger = SequenceTagger.load('ner-fast')

Also, you should not predict on sentences one by one, but always pass lists of sentences and set the mini_batch_size to a value that works on your machine.

sentences = tagger.predict(list_of_sentences, mini_batch_size=16)
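A minimal sketch of the mini-batching idea in pure Python. `run_model` below is just a stand-in for `tagger.predict` (it is not the flair API), and batch size 16 is simply the example value from the comment above:

```python
def minibatches(sentences, batch_size=16):
    """Yield consecutive slices of `sentences` of at most `batch_size` items."""
    for start in range(0, len(sentences), batch_size):
        yield sentences[start : start + batch_size]

def run_model(batch):
    # Stand-in for tagger.predict(batch); here we return token counts.
    return [len(s.split()) for s in batch]

sentences = ["I love Berlin .", "Flair is fast ."] * 20
results = []
for batch in minibatches(sentences, batch_size=16):
    results.extend(run_model(batch))

print(len(results))  # one result per input sentence
```

Batching amortizes per-call overhead and lets the underlying tensor ops process many sentences at once, which is where most of the CPU speedup comes from.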

@XiaoqingNLP

Does ner-fast affect the F1 score?

@pommedeterresautee
Contributor

It probably does; the representations it uses are lighter.

@alanakbik
Collaborator

Yes, slightly, the evaluation numbers are listed here:

https://github.com/zalandoresearch/flair/blob/master/resources/docs/TUTORIAL_2_TAGGING.md#list-of-pre-trained-sequence-tagger-models

Interestingly, ner-ontonotes-fast scores a bit better than ner-ontonotes, so here the fast version is also more accurate :)
