
Fast inference on CPU #29

Closed
juggernauts opened this issue Jul 31, 2018 · 13 comments
Labels
enhancement Improving of an existing feature

Comments

@juggernauts

Hi Alan,
Feel free to deprioritize this, but inference is currently slow on CPUs. In a separate ticket you implemented batching to speed up inference on long texts, but it is still not fast enough for production settings.

@alanakbik
Collaborator

Hi Ankit, yes, we are working on speed improvements, but most will not make it into the upcoming 0.2 release (due in a few days), which prioritizes GPU inference.

One thing we can already include is smaller models that trade a small amount of accuracy for greater CPU inference speed. For instance, while the default NER model currently takes 11 seconds for 500 words on my CPU-only laptop, the small model takes only 3 seconds. We measure an F1 score of 92.61 for the small model, which is still state of the art, but a bit below the full model at 93.18.

Would such models be helpful to you? What kind of CPU inference speeds do you require?

@alanakbik alanakbik added the enhancement Improving of an existing feature label Aug 1, 2018
@juggernauts
Author

Hi Alan,

Thank you for your reply. 3 seconds for 500 words is close to what we need. Most of the text we will be processing is under 100 words, so sub-500 ms latency would be ideal.
A small model would be helpful for real-time use cases like ours, so I would be interested in trying it.
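As a sanity check on the numbers quoted above (11 s and 3 s per 500 words), a simple linear scaling gives the expected latency at 100 words. This assumes throughput scales linearly with word count, which ignores model-load time and batching effects:

```python
def seconds_per_words(total_seconds: float, total_words: int, words: int) -> float:
    """Scale a measured timing linearly to a different word count."""
    return total_seconds / total_words * words

# Figures from this thread: default model 11 s / 500 words, small model 3 s / 500 words.
default_100 = seconds_per_words(11.0, 500, 100)
fast_100 = seconds_per_words(3.0, 500, 100)

print(f"default model, 100 words: ~{default_100:.2f} s")
print(f"fast model,    100 words: ~{fast_100:.2f} s")
```

Under that assumption the small model lands around 0.6 s per 100 words, just above the 500 ms target mentioned here.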

@alanakbik
Collaborator

That's great - we'll add the first batch of CPU models to the upcoming release!

@alanakbik
Collaborator

Release 0.2 adds pre-trained models that are more CPU-friendly. Append '-fast' to a model name to load them (listed here; only available for English models at present). Run git pull or pip install flair --upgrade to get the newest version!

@marcothinnes

Hi @alanakbik ,

is there a way to use multiple CPU cores to speed up inference? Thank you very much.

@alanakbik
Collaborator

@mstaschik good question - there is currently no built-in way, but we'd be very interested in any ideas for making it faster on CPU!
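Since there is no built-in option, one pattern worth trying is splitting the sentence list into chunks and tagging each chunk in a worker thread. This is an illustrative sketch only: `tag_chunk` stands in for a real call such as `tagger.predict(chunk)` and is not part of the flair API. Threads can help here because PyTorch kernels release the GIL during heavy tensor work:

```python
from concurrent.futures import ThreadPoolExecutor

def chunks(items, n):
    """Split a list into n roughly equal contiguous chunks."""
    k, r = divmod(len(items), n)
    out, start = [], 0
    for i in range(n):
        end = start + k + (1 if i < r else 0)
        out.append(items[start:end])
        start = end
    return out

def tag_chunk(chunk):
    # Placeholder for e.g. tagger.predict(chunk); here we just uppercase.
    return [s.upper() for s in chunk]

def tag_parallel(sentences, workers=4):
    """Tag chunks concurrently and re-join results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(tag_chunk, chunks(sentences, workers))
    return [s for chunk in results for s in chunk]

print(tag_parallel(["one", "two", "three", "four", "five"], workers=2))
```

Whether this pays off in practice depends on how much of the workload runs inside GIL-releasing torch ops; separate processes are another option at the cost of loading the model once per worker.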

@XiaoqingNLP

Is there any way to speed this up? I am on the latest version, running NER with the commands documented in the README.

@pommedeterresautee
Contributor

It depends on the representation you use. Representations based on BiLSTMs are slow on CPU. Maybe others like BytePairEmbeddings would work better? (They will certainly be faster.)
https://github.com/zalandoresearch/flair/blob/master/resources/docs/embeddings/BYTE_PAIR_EMBEDDINGS.md

@XiaoqingNLP

I'm following the instructions below to extract entities; how can I speed this up? @pommedeterresautee

from flair.data import Sentence
from flair.models import SequenceTagger

# make a sentence
sentence = Sentence('I love Berlin .')

# load the NER tagger
tagger = SequenceTagger.load('ner')

# run NER over sentence
tagger.predict(sentence)

# print the tagged sentence
print(sentence.to_tagged_string())

@alanakbik
Collaborator

@PlayDeep you can use the -fast variants of the models, i.e.

# load the NER tagger
tagger = SequenceTagger.load('ner-fast')

Also, you should not predict on sentences one by one, but always pass lists of sentences and set the mini_batch_size to a value that works on your machine.

sentences = tagger.predict(list_of_sentences, mini_batch_size=16)
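A minimal sketch of the mini-batching idea in pure Python. `run_model` below is just a stand-in for `tagger.predict` (it is not the flair API), and batch size 16 is simply the example value from the comment above:

```python
def minibatches(sentences, batch_size=16):
    """Yield consecutive slices of `sentences` of at most `batch_size` items."""
    for start in range(0, len(sentences), batch_size):
        yield sentences[start : start + batch_size]

def run_model(batch):
    # Stand-in for tagger.predict(batch); here we return token counts.
    return [len(s.split()) for s in batch]

sentences = ["I love Berlin .", "Flair is fast ."] * 20
results = []
for batch in minibatches(sentences, batch_size=16):
    results.extend(run_model(batch))

print(len(results))  # one result per input sentence
```

Batching amortizes per-call overhead and lets the underlying tensor ops process many sentences at once, which is where most of the CPU speedup comes from.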

@XiaoqingNLP

Does ner-fast affect the F1 score?

@pommedeterresautee
Contributor

It probably does; the representations it uses are lighter.

@alanakbik
Collaborator

Yes, slightly, the evaluation numbers are listed here:

https://github.com/zalandoresearch/flair/blob/master/resources/docs/TUTORIAL_2_TAGGING.md#list-of-pre-trained-sequence-tagger-models

Interestingly, ner-ontonotes-fast scores a bit better than ner-ontonotes, so here the fast version is also more accurate :)
