Add better sentiment analysis model #1503
I trained a few models for sentiment analysis in different ways:
In both cases, the best model is selected using holdout DEV data and evaluated on holdout TEST data. Here are the results:
The models were trained over a combination of 5 different sentiment analysis corpora:

```python
corpus = MultiCorpus([
    IMDB(filter_if_longer_than=50),
    SENTEVAL_SST_BINARY(filter_if_longer_than=50),
    SENTEVAL_MR(filter_if_longer_than=50),
    SENTIMENT_140().downsample(0.1, downsample_test=False),
    AMAZON_REVIEWS(filter_if_longer_than=50, memory_mode='partial')
])
```

This results in a very large training corpus of some 400,000 text data points. There were three classes: POSITIVE, NEGATIVE and NEUTRAL. However, the NEUTRAL class gets the lowest scores, see the output of the RoBERTa model:

```
MICRO_AVG: acc 0.9562032772030507 - f1-score 0.934304915804576
MACRO_AVG: acc 0.9562032772030507 - f1-score 0.7438650886414663
NEGATIVE   tp: 4457 - fp: 665 - fn: 392 - tn: 20972 - precision: 0.8702 - recall: 0.9192 - accuracy: 0.9601 - f1-score: 0.8940
NEUTRAL    tp: 302 - fp: 191 - fn: 831 - tn: 25162 - precision: 0.6126 - recall: 0.2665 - accuracy: 0.9614 - f1-score: 0.3715
POSITIVE   tp: 19987 - fp: 884 - fn: 517 - tn: 5098 - precision: 0.9576 - recall: 0.9748 - accuracy: 0.9471 - f1-score: 0.9661
```

This is likely because the NEUTRAL class exists only in one of the datasets (Amazon reviews with 3 stars). I'll rerun with 3-star reviews mapped to NEGATIVE.
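For reference, the NEUTRAL scores follow directly from the counts reported above; a quick recomputation shows how the very low recall pulls down the F1:

```python
# recompute the NEUTRAL metrics from the tp/fp/fn counts listed above
tp, fp, fn = 302, 191, 831

precision = tp / (tp + fp)                           # 302 / 493  ≈ 0.6126
recall = tp / (tp + fn)                              # 302 / 1133 ≈ 0.2665
f1 = 2 * precision * recall / (precision + recall)   # ≈ 0.3715

print(f"NEUTRAL precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```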
GH-1503: change Amazon reviews sentiment preset
GH-1503: add tokenization presets to ClassificationCorpus
New models were trained with the following multi corpus:

```python
corpus = MultiCorpus([
    IMDB(filter_if_longer_than=50),
    SENTEVAL_SST_BINARY(filter_if_longer_than=50),
    SENTEVAL_MR(filter_if_longer_than=50),
    AMAZON_REVIEWS(filter_if_longer_than=50, memory_mode='partial', split_max=50000, fraction_of_5_star_reviews=12)
])
```

This balances the positive and negative reviews across the Amazon corpus and only uses reviews with 1+2 star ratings as NEGATIVE and 5 star ratings as POSITIVE to get a better signal. We package a transformer-based model (distilbert) and an RNN-based model (fasttext) trained over this data.
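For context, a minimal training sketch over such a multi corpus, using the Flair API of that era, might look roughly like the following; the embedding name and all hyperparameters are illustrative assumptions, not the exact recipe used for the packaged models:

```python
from flair.data import MultiCorpus
from flair.datasets import IMDB, SENTEVAL_SST_BINARY, SENTEVAL_MR, AMAZON_REVIEWS
from flair.embeddings import TransformerDocumentEmbeddings
from flair.models import TextClassifier
from flair.trainers import ModelTrainer

# the multi corpus from the comment above
corpus = MultiCorpus([
    IMDB(filter_if_longer_than=50),
    SENTEVAL_SST_BINARY(filter_if_longer_than=50),
    SENTEVAL_MR(filter_if_longer_than=50),
    AMAZON_REVIEWS(filter_if_longer_than=50, memory_mode='partial',
                   split_max=50000, fraction_of_5_star_reviews=12),
])

# document-level embeddings; distilbert stands in for the packaged transformer model
# (fine_tune=True and the hyperparameters below are assumptions)
document_embeddings = TransformerDocumentEmbeddings('distilbert-base-uncased', fine_tune=True)

# build the label dictionary (POSITIVE / NEGATIVE) from the corpus and train
label_dictionary = corpus.make_label_dictionary()
classifier = TextClassifier(document_embeddings, label_dictionary=label_dictionary)

trainer = ModelTrainer(classifier, corpus)
trainer.train(
    'resources/classifiers/sentiment',  # output directory (placeholder)
    learning_rate=3e-5,
    mini_batch_size=16,
    max_epochs=5,
)
```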
@alanakbik Have you tested that filter_if_longer_than=50 is a good cutoff, or tried other lengths?
It's for consistency (and training speed), since some datasets like IMDB have data points of very different lengths, but I haven't tested other lengths.
@alanakbik why did you end up leaving the RoBERTa model out? I see it here https://nlp.informatik.hu-berlin.de/resources/models/sentiment-curated-roberta/ but it's not available for selection in Flair. I downloaded it and loaded it directly, but got the error below when trying to predict labels. Any ideas? Thank you!

```
---------------------------------------------------------------------------
~/.local/share/virtualenvs/elder.pinzon-m94fUwcj/lib/python3.7/site-packages/flair/models/text_classification_model.py in predict(self, sentences, mini_batch_size, multi_class_prob, verbose, label_name, return_loss, embedding_storage_mode)
~/.local/share/virtualenvs/elder.pinzon-m94fUwcj/lib/python3.7/site-packages/flair/models/text_classification_model.py in forward(self, sentences)
~/.local/share/virtualenvs/elder.pinzon-m94fUwcj/lib/python3.7/site-packages/flair/embeddings/base.py in embed(self, sentences)
~/.local/share/virtualenvs/elder.pinzon-m94fUwcj/lib/python3.7/site-packages/flair/embeddings/document.py in _add_embeddings_internal(self, sentences)
~/.local/share/virtualenvs/elder.pinzon-m94fUwcj/lib/python3.7/site-packages/flair/embeddings/document.py in _add_embeddings_to_sentences(self, sentences)
~/.local/share/virtualenvs/elder.pinzon-m94fUwcj/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
~/.local/share/virtualenvs/elder.pinzon-m94fUwcj/lib/python3.7/site-packages/transformers/modeling_bert.py in forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, output_attentions, output_hidden_states)
~/.local/share/virtualenvs/elder.pinzon-m94fUwcj/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
~/.local/share/virtualenvs/elder.pinzon-m94fUwcj/lib/python3.7/site-packages/transformers/modeling_bert.py in forward(self, hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask, output_attentions, output_hidden_states)
~/.local/share/virtualenvs/elder.pinzon-m94fUwcj/lib/python3.7/site-packages/torch/nn/modules/module.py in getattr(self, name)

ModuleAttributeError: 'BertEncoder' object has no attribute 'config'
```
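The failing call was presumably something along these lines; the local checkpoint path and the example sentence are placeholders:

```python
from flair.data import Sentence
from flair.models import TextClassifier

# load the downloaded RoBERTa classifier from a local checkpoint (path is a placeholder)
classifier = TextClassifier.load('sentiment-curated-roberta/final-model.pt')

sentence = Sentence('A surprisingly engaging film.')
classifier.predict(sentence)  # this is where the ModuleAttributeError is raised
```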
The model was trained with a pre-release version of Flair and there were still some problems with serializing the embeddings, so unfortunately it doesn't work. We packaged a distilbert model instead, since the requirements of the RoBERTa model were quite high and distilbert is more realistic for most setups.
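For anyone trying the packaged models, loading and prediction should look roughly like this; the 'sentiment' shorthand is an assumption about how the distilbert model is registered, so treat this as a sketch rather than a confirmed model ID:

```python
from flair.data import Sentence
from flair.models import TextClassifier

# load the packaged sentiment classifier ('sentiment' is assumed to resolve to the distilbert model)
classifier = TextClassifier.load('sentiment')

# run prediction on a single sentence and inspect the predicted label
sentence = Sentence('The movie was not bad at all.')
classifier.predict(sentence)
print(sentence.labels)
```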
With the upcoming #1492 we will add fine-tuneable transformers to Flair, yielding much improved classification performance. We will use the opportunity to replace the current sentiment analysis model in Flair with a better one, trained over more data and with a BERT-style architecture. To add this model, we should: