Add better sentiment analysis model #1503

Closed · 3 tasks done · Fixed by #1613
alanakbik opened this issue Apr 1, 2020 · 7 comments

alanakbik (Collaborator) commented Apr 1, 2020:

With the upcoming #1492 we will add fine-tunable transformers to Flair, yielding much improved classification performance. We will use the opportunity to replace the current sentiment analysis model in Flair with a better one, trained over more data and with a BERT-style architecture. To add this model, we should:

  • Add new sentiment analysis datasets to Flair. Currently we have the IMDB and SentEval datasets, but we should add datasets for domains besides movie reviews (see the loading sketch after this list).
  • Train a strong model over the aggregated sentiment datasets.
  • Add the model to Flair for download.
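
For context, loading one of the sentiment datasets used later in this thread is a one-liner (a minimal sketch; `filter_if_longer_than` drops data points longer than the given number of tokens):

```python
from flair.datasets import SENTEVAL_SST_BINARY

# load the SentEval SST binary sentiment corpus, dropping
# data points longer than 50 tokens
corpus = SENTEVAL_SST_BINARY(filter_if_longer_than=50)
print(corpus)  # prints the train/dev/test split sizes
```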
alanakbik (Collaborator) commented:

I trained a few models for sentiment analysis in different ways:

  1. FastText word embeddings + LSTM, trained with SGD and learning-rate annealing for at most 150 epochs
  2. Transformers (BERT and RoBERTa), fine-tuned with Adam for 5 epochs

In both cases, the best model is selected using holdout DEV data and evaluated on holdout TEST data.
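
For illustration, a minimal sketch of the transformer fine-tuning setup (variant 2) in Flair; the exact hyperparameters and output path here are assumptions, not the actual training script:

```python
import torch
from flair.embeddings import TransformerDocumentEmbeddings
from flair.models import TextClassifier
from flair.trainers import ModelTrainer

# `corpus` is the MultiCorpus defined further below in this thread
label_dict = corpus.make_label_dictionary()

# fine-tunable transformer document embeddings
embeddings = TransformerDocumentEmbeddings('roberta-base', fine_tune=True)
classifier = TextClassifier(embeddings, label_dictionary=label_dict)

# fine-tune with Adam for 5 epochs (variant 2 above)
trainer = ModelTrainer(classifier, corpus, optimizer=torch.optim.Adam)
trainer.train('resources/classifiers/sentiment',
              learning_rate=3e-5,   # small learning rate, typical for fine-tuning
              mini_batch_size=16,
              max_epochs=5)
```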

Here are the results:

| Model                      | Test F1 |
| -------------------------- | ------- |
| FastText + LSTM            | 92.59   |
| BERT (base, uncased)       | 93.14   |
| BERT (base, cased)         | 93.06   |
| DistilBERT (base, uncased) | 92.82   |
| DistilBERT (base, cased)   | 92.69   |
| RoBERTa (base)             | 93.43   |
| DistilRoBERTa (base)       | 93.03   |

alanakbik (Collaborator) commented:

The models were trained over a combination of 5 different sentiment analysis corpora:

```python
from flair.data import MultiCorpus
from flair.datasets import (AMAZON_REVIEWS, IMDB, SENTEVAL_MR,
                            SENTEVAL_SST_BINARY, SENTIMENT_140)

corpus = MultiCorpus([
    IMDB(filter_if_longer_than=50),
    SENTEVAL_SST_BINARY(filter_if_longer_than=50),
    SENTEVAL_MR(filter_if_longer_than=50),
    SENTIMENT_140().downsample(0.1, downsample_test=False),
    AMAZON_REVIEWS(filter_if_longer_than=50, memory_mode='partial'),
])
```

resulting in a very large training corpus of some 400,000 text data points. There were three classes: POSITIVE, NEGATIVE and NEUTRAL. However, the NEUTRAL class gets the lowest scores; see the output of the RoBERTa model:

```
MICRO_AVG: acc 0.9562032772030507 - f1-score 0.934304915804576
MACRO_AVG: acc 0.9562032772030507 - f1-score 0.7438650886414663
NEGATIVE   tp: 4457 - fp: 665 - fn: 392 - tn: 20972 - precision: 0.8702 - recall: 0.9192 - accuracy: 0.9601 - f1-score: 0.8940
NEUTRAL    tp: 302 - fp: 191 - fn: 831 - tn: 25162 - precision: 0.6126 - recall: 0.2665 - accuracy: 0.9614 - f1-score: 0.3715
POSITIVE   tp: 19987 - fp: 884 - fn: 517 - tn: 5098 - precision: 0.9576 - recall: 0.9748 - accuracy: 0.9471 - f1-score: 0.9661
```

This is likely because the NEUTRAL class exists in only one of the datasets (Amazon reviews with 3 stars). I'll rerun with 3-star reviews mapped to NEGATIVE.
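
For illustration, the intended remapping for the rerun (a hypothetical helper; the actual relabeling happens inside the dataset loader):

```python
# hypothetical helper: fold the former NEUTRAL (3-star) class into
# NEGATIVE, leaving only two classes for the rerun
def star_rating_to_label(stars: int) -> str:
    return 'NEGATIVE' if stars <= 3 else 'POSITIVE'
```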

alanakbik (Collaborator) commented:

New models were trained with the following multi corpus:

```python
from flair.data import MultiCorpus
from flair.datasets import AMAZON_REVIEWS, IMDB, SENTEVAL_MR, SENTEVAL_SST_BINARY

corpus = MultiCorpus([
    IMDB(filter_if_longer_than=50),
    SENTEVAL_SST_BINARY(filter_if_longer_than=50),
    SENTEVAL_MR(filter_if_longer_than=50),
    AMAZON_REVIEWS(filter_if_longer_than=50, memory_mode='partial',
                   split_max=50000, fraction_of_5_star_reviews=12),
])
```

This balances the positive and negative reviews across the Amazon corpus and uses only reviews with 1- and 2-star ratings as NEGATIVE and 5-star ratings as POSITIVE, to get a better signal.

We package a transformer-based model (distilbert) and an RNN-based model (fasttext) trained over this data.
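
Once packaged, loading and using such a model would look roughly like this (a sketch; the model names 'sentiment' and 'sentiment-fast' are taken from later Flair releases and are an assumption here):

```python
from flair.data import Sentence
from flair.models import TextClassifier

# 'sentiment' = the packaged DistilBERT model; 'sentiment-fast' = the RNN model
classifier = TextClassifier.load('sentiment')

sentence = Sentence('I really enjoyed this movie!')
classifier.predict(sentence)
print(sentence.labels)  # e.g. [POSITIVE (0.99)]
```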

djstrong (Contributor) commented:

@alanakbik Have you tested whether filter_if_longer_than=50 gives better scores? Or is it only for training speed?

alanakbik (Collaborator) commented May 14, 2020:

It's for consistency (and training speed) since some datasets like IMDB have data points of very different lengths, but I haven't tested other lengths.

elderpinzon commented:

@alanakbik why did you end up leaving the RoBERTa model out? I see it here https://nlp.informatik.hu-berlin.de/resources/models/sentiment-curated-roberta/ but it's not available for selection in flair.models.TextClassifier.

I downloaded it and loaded it directly but got the error below when trying to predict labels. Any ideas? Thank you!

```
---------------------------------------------------------------------------
ModuleAttributeError                      Traceback (most recent call last)
<ipython-input> in <module>
1 s = flair.data.Sentence('This is a neutral comment. I have no strong opinion otherwise')
----> 2 roberta_sentiment.predict(s)
3 total_sentiment = s.labels
4 total_sentiment

~/.local/share/virtualenvs/elder.pinzon-m94fUwcj/lib/python3.7/site-packages/flair/models/text_classification_model.py in predict(self, sentences, mini_batch_size, multi_class_prob, verbose, label_name, return_loss, embedding_storage_mode)
221 continue
222
--> 223 scores = self.forward(batch)
224
225 if return_loss:

~/.local/share/virtualenvs/elder.pinzon-m94fUwcj/lib/python3.7/site-packages/flair/models/text_classification_model.py in forward(self, sentences)
97 def forward(self, sentences):
98
---> 99 self.document_embeddings.embed(sentences)
100
101 embedding_names = self.document_embeddings.get_names()

~/.local/share/virtualenvs/elder.pinzon-m94fUwcj/lib/python3.7/site-packages/flair/embeddings/base.py in embed(self, sentences)
58
59 if not everything_embedded or not self.static_embeddings:
---> 60 self._add_embeddings_internal(sentences)
61
62 return sentences

~/.local/share/virtualenvs/elder.pinzon-m94fUwcj/lib/python3.7/site-packages/flair/embeddings/document.py in _add_embeddings_internal(self, sentences)
94
95 for batch in sentence_batches:
---> 96 self._add_embeddings_to_sentences(batch)
97
98 return sentences

~/.local/share/virtualenvs/elder.pinzon-m94fUwcj/lib/python3.7/site-packages/flair/embeddings/document.py in _add_embeddings_to_sentences(self, sentences)
143 # put encoded batch through transformer model to get all hidden states of all encoder layers
144 hidden_states = self.model(input_ids, attention_mask=mask)[-1] if len(sentences) > 1
--> 145 else self.model(input_ids)[-1]
146
147 # iterate over all subtokenized sentences

~/.local/share/virtualenvs/elder.pinzon-m94fUwcj/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
720 result = self._slow_forward(*input, **kwargs)
721 else:
--> 722 result = self.forward(*input, **kwargs)
723 for hook in itertools.chain(
724 _global_forward_hooks.values(),

~/.local/share/virtualenvs/elder.pinzon-m94fUwcj/lib/python3.7/site-packages/transformers/modeling_bert.py in forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, output_attentions, output_hidden_states)
760 encoder_attention_mask=encoder_extended_attention_mask,
761 output_attentions=output_attentions,
--> 762 output_hidden_states=output_hidden_states,
763 )
764 sequence_output = encoder_outputs[0]

~/.local/share/virtualenvs/elder.pinzon-m94fUwcj/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
720 result = self._slow_forward(*input, **kwargs)
721 else:
--> 722 result = self.forward(*input, **kwargs)
723 for hook in itertools.chain(
724 _global_forward_hooks.values(),

~/.local/share/virtualenvs/elder.pinzon-m94fUwcj/lib/python3.7/site-packages/transformers/modeling_bert.py in forward(self, hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask, output_attentions, output_hidden_states)
414 all_hidden_states = all_hidden_states + (hidden_states,)
415
--> 416 if getattr(self.config, "gradient_checkpointing", False):
417
418 def create_custom_forward(module):

~/.local/share/virtualenvs/elder.pinzon-m94fUwcj/lib/python3.7/site-packages/torch/nn/modules/module.py in __getattr__(self, name)
    770             return modules[name]
    771         raise ModuleAttributeError("'{}' object has no attribute '{}'".format(
--> 772             type(self).__name__, name))
    773
    774     def __setattr__(self, name: str, value: Union[Tensor, 'Module']) -> None:

ModuleAttributeError: 'BertEncoder' object has no attribute 'config'
```

alanakbik (Collaborator) commented:

The model was trained with a pre-release version of Flair, and there were still some problems with serializing the embeddings, so unfortunately it doesn't work. We packaged a DistilBERT model instead, since the resource requirements of the RoBERTa model were quite high and DistilBERT is more realistic for most setups.
