Add better sentiment analysis model #1503

Closed · 3 tasks done · Fixed by #1613
alanakbik opened this issue Apr 1, 2020 · 7 comments

alanakbik (Collaborator) commented Apr 1, 2020:

With the upcoming #1492 we will add fine-tunable transformers to Flair, yielding much improved classification performance. We will use the opportunity to replace the current sentiment analysis model in Flair with a better one, trained over more data and with a BERT-style architecture. To add this model, we should:

  • Add new sentiment analysis datasets to Flair. Currently we have the IMDB and SentEval datasets, but we should add datasets for domains besides movie reviews (see the loading sketch after this list).
  • Train a strong model over the aggregated sentiment datasets.
  • Add the model to Flair for download.
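
For context, loading one of the sentiment datasets used later in this thread is a one-liner (a minimal sketch; `filter_if_longer_than` drops data points longer than the given number of tokens):

```python
from flair.datasets import SENTEVAL_SST_BINARY

# load the SentEval SST binary sentiment corpus, dropping
# data points longer than 50 tokens
corpus = SENTEVAL_SST_BINARY(filter_if_longer_than=50)
print(corpus)  # prints the train/dev/test split sizes
```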
alanakbik (Collaborator) commented:

I trained a few models for sentiment analysis in different ways:

  1. FastText word embeddings + LSTM, trained with SGD and learning-rate annealing for at most 150 epochs
  2. Transformers (BERT and RoBERTa), fine-tuned with Adam for 5 epochs

In both cases, the best model is selected using holdout DEV data and evaluated on holdout TEST data.
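
For illustration, a minimal sketch of the transformer fine-tuning setup (variant 2) in Flair; the exact hyperparameters and output path here are assumptions, not the actual training script:

```python
import torch
from flair.embeddings import TransformerDocumentEmbeddings
from flair.models import TextClassifier
from flair.trainers import ModelTrainer

# `corpus` is the MultiCorpus defined further below in this thread
label_dict = corpus.make_label_dictionary()

# fine-tunable transformer document embeddings
embeddings = TransformerDocumentEmbeddings('roberta-base', fine_tune=True)
classifier = TextClassifier(embeddings, label_dictionary=label_dict)

# fine-tune with Adam for 5 epochs (variant 2 above)
trainer = ModelTrainer(classifier, corpus, optimizer=torch.optim.Adam)
trainer.train('resources/classifiers/sentiment',
              learning_rate=3e-5,   # small learning rate, typical for fine-tuning
              mini_batch_size=16,
              max_epochs=5)
```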

Here are the results:

| Model                      | Test F1 |
| -------------------------- | ------- |
| FastText + LSTM            | 92.59   |
| BERT (base, uncased)       | 93.14   |
| BERT (base, cased)         | 93.06   |
| DistilBERT (base, uncased) | 92.82   |
| DistilBERT (base, cased)   | 92.69   |
| RoBERTa (base)             | 93.43   |
| DistilRoBERTa (base)       | 93.03   |

alanakbik (Collaborator) commented:

The models were trained over a combination of 5 different sentiment analysis corpora:

```python
from flair.data import MultiCorpus
from flair.datasets import (AMAZON_REVIEWS, IMDB, SENTEVAL_MR,
                            SENTEVAL_SST_BINARY, SENTIMENT_140)

corpus = MultiCorpus([
    IMDB(filter_if_longer_than=50),
    SENTEVAL_SST_BINARY(filter_if_longer_than=50),
    SENTEVAL_MR(filter_if_longer_than=50),
    SENTIMENT_140().downsample(0.1, downsample_test=False),
    AMAZON_REVIEWS(filter_if_longer_than=50, memory_mode='partial'),
])
```

resulting in a very large training corpus of some 400,000 text data points. There were three classes: POSITIVE, NEGATIVE and NEUTRAL. However, the NEUTRAL class gets the lowest scores; see the output of the RoBERTa model:

```
MICRO_AVG: acc 0.9562032772030507 - f1-score 0.934304915804576
MACRO_AVG: acc 0.9562032772030507 - f1-score 0.7438650886414663
NEGATIVE   tp: 4457 - fp: 665 - fn: 392 - tn: 20972 - precision: 0.8702 - recall: 0.9192 - accuracy: 0.9601 - f1-score: 0.8940
NEUTRAL    tp: 302 - fp: 191 - fn: 831 - tn: 25162 - precision: 0.6126 - recall: 0.2665 - accuracy: 0.9614 - f1-score: 0.3715
POSITIVE   tp: 19987 - fp: 884 - fn: 517 - tn: 5098 - precision: 0.9576 - recall: 0.9748 - accuracy: 0.9471 - f1-score: 0.9661
```

This is likely because the NEUTRAL class exists in only one of the datasets (Amazon reviews with 3 stars). I'll rerun with 3-star reviews mapped to NEGATIVE.
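
For illustration, the intended remapping for the rerun (a hypothetical helper; the actual relabeling happens inside the dataset loader):

```python
# hypothetical helper: fold the former NEUTRAL (3-star) class into
# NEGATIVE, leaving only two classes for the rerun
def star_rating_to_label(stars: int) -> str:
    return 'NEGATIVE' if stars <= 3 else 'POSITIVE'
```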

alanakbik (Collaborator) commented:

New models were trained with the following multi corpus:

```python
from flair.data import MultiCorpus
from flair.datasets import AMAZON_REVIEWS, IMDB, SENTEVAL_MR, SENTEVAL_SST_BINARY

corpus = MultiCorpus([
    IMDB(filter_if_longer_than=50),
    SENTEVAL_SST_BINARY(filter_if_longer_than=50),
    SENTEVAL_MR(filter_if_longer_than=50),
    AMAZON_REVIEWS(filter_if_longer_than=50, memory_mode='partial',
                   split_max=50000, fraction_of_5_star_reviews=12),
])
```

This balances the positive and negative reviews across the Amazon corpus and uses only reviews with 1- and 2-star ratings as NEGATIVE and 5-star ratings as POSITIVE, to get a better signal.

We package a transformer-based model (distilbert) and an RNN-based model (fasttext) trained over this data.
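
Once packaged, loading and using such a model would look roughly like this (a sketch; the model names 'sentiment' and 'sentiment-fast' are taken from later Flair releases and are an assumption here):

```python
from flair.data import Sentence
from flair.models import TextClassifier

# 'sentiment' = the packaged DistilBERT model; 'sentiment-fast' = the RNN model
classifier = TextClassifier.load('sentiment')

sentence = Sentence('I really enjoyed this movie!')
classifier.predict(sentence)
print(sentence.labels)  # e.g. [POSITIVE (0.99)]
```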

djstrong (Contributor) commented:

@alanakbik Have you tested whether filter_if_longer_than=50 gives better scores? Or is it only for training speed?

alanakbik (Collaborator) commented May 14, 2020:

It's for consistency (and training speed) since some datasets like IMDB have data points of very different lengths, but I haven't tested other lengths.

elderpinzon commented:

@alanakbik why did you end up leaving the RoBERTa model out? I see it here https://nlp.informatik.hu-berlin.de/resources/models/sentiment-curated-roberta/ but it's not available for selection in flair.models.TextClassifier.

I downloaded it and loaded it directly but got the error below when trying to predict labels. Any ideas? Thank you!

```
---------------------------------------------------------------------------
ModuleAttributeError                      Traceback (most recent call last)
<ipython-input> in <module>
1 s = flair.data.Sentence('This is a neutral comment. I have no strong opinion otherwise')
----> 2 roberta_sentiment.predict(s)
3 total_sentiment = s.labels
4 total_sentiment

~/.local/share/virtualenvs/elder.pinzon-m94fUwcj/lib/python3.7/site-packages/flair/models/text_classification_model.py in predict(self, sentences, mini_batch_size, multi_class_prob, verbose, label_name, return_loss, embedding_storage_mode)
221 continue
222
--> 223 scores = self.forward(batch)
224
225 if return_loss:

~/.local/share/virtualenvs/elder.pinzon-m94fUwcj/lib/python3.7/site-packages/flair/models/text_classification_model.py in forward(self, sentences)
97 def forward(self, sentences):
98
---> 99 self.document_embeddings.embed(sentences)
100
101 embedding_names = self.document_embeddings.get_names()

~/.local/share/virtualenvs/elder.pinzon-m94fUwcj/lib/python3.7/site-packages/flair/embeddings/base.py in embed(self, sentences)
58
59 if not everything_embedded or not self.static_embeddings:
---> 60 self._add_embeddings_internal(sentences)
61
62 return sentences

~/.local/share/virtualenvs/elder.pinzon-m94fUwcj/lib/python3.7/site-packages/flair/embeddings/document.py in _add_embeddings_internal(self, sentences)
94
95 for batch in sentence_batches:
---> 96 self._add_embeddings_to_sentences(batch)
97
98 return sentences

~/.local/share/virtualenvs/elder.pinzon-m94fUwcj/lib/python3.7/site-packages/flair/embeddings/document.py in _add_embeddings_to_sentences(self, sentences)
143 # put encoded batch through transformer model to get all hidden states of all encoder layers
144 hidden_states = self.model(input_ids, attention_mask=mask)[-1] if len(sentences) > 1
--> 145 else self.model(input_ids)[-1]
146
147 # iterate over all subtokenized sentences

~/.local/share/virtualenvs/elder.pinzon-m94fUwcj/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
720 result = self._slow_forward(*input, **kwargs)
721 else:
--> 722 result = self.forward(*input, **kwargs)
723 for hook in itertools.chain(
724 _global_forward_hooks.values(),

~/.local/share/virtualenvs/elder.pinzon-m94fUwcj/lib/python3.7/site-packages/transformers/modeling_bert.py in forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, output_attentions, output_hidden_states)
760 encoder_attention_mask=encoder_extended_attention_mask,
761 output_attentions=output_attentions,
--> 762 output_hidden_states=output_hidden_states,
763 )
764 sequence_output = encoder_outputs[0]

~/.local/share/virtualenvs/elder.pinzon-m94fUwcj/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
720 result = self._slow_forward(*input, **kwargs)
721 else:
--> 722 result = self.forward(*input, **kwargs)
723 for hook in itertools.chain(
724 _global_forward_hooks.values(),

~/.local/share/virtualenvs/elder.pinzon-m94fUwcj/lib/python3.7/site-packages/transformers/modeling_bert.py in forward(self, hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask, output_attentions, output_hidden_states)
414 all_hidden_states = all_hidden_states + (hidden_states,)
415
--> 416 if getattr(self.config, "gradient_checkpointing", False):
417
418 def create_custom_forward(module):

~/.local/share/virtualenvs/elder.pinzon-m94fUwcj/lib/python3.7/site-packages/torch/nn/modules/module.py in __getattr__(self, name)
    770             return modules[name]
    771         raise ModuleAttributeError("'{}' object has no attribute '{}'".format(
--> 772             type(self).__name__, name))
    773
    774     def __setattr__(self, name: str, value: Union[Tensor, 'Module']) -> None:

ModuleAttributeError: 'BertEncoder' object has no attribute 'config'
```

alanakbik (Collaborator) commented:

The model was trained with a pre-release version of Flair, and there were still some problems with serializing the embeddings, so unfortunately it doesn't work. We packaged a DistilBERT model instead, since the resource requirements of the RoBERTa model were quite high and DistilBERT is more realistic for most setups.
