
Multi-label classification not working? #678

Closed
gwohlgen opened this issue Apr 20, 2019 · 23 comments
Labels
bug Something isn't working wontfix This will not be worked on

Comments

@gwohlgen

gwohlgen commented Apr 20, 2019

No description provided.

@gwohlgen gwohlgen added the bug Something isn't working label Apr 20, 2019
@gwohlgen
Author

gwohlgen commented Apr 20, 2019

Hi,
I am trying to make multi-label classification work with the dataset used in this fasttext tutorial: https://fasttext.cc/docs/en/supervised-tutorial.html.

The problem is that no matter which embeddings and hyperparameters I use, training quickly goes toward 0.000 F1 / acc:
2019-04-20 23:31:15,587 EPOCH 4 done: loss 0.0009 - lr 0.1000 - bad epochs 0
2019-04-20 23:31:30,078 DEV : loss 0.00073111 - f-score 0.0000 - acc 0.0000
2019-04-20 23:31:44,417 TEST : loss 0.00073590 - f-score 0.0000 - acc 0.0000

Maybe the problem is that the dataset has a high number of labels, some with a frequency as low as 1?

Full code and logs here: https://github.com/gwohlgen/misc/blob/master/classifier__multi-label.ipynb

I split it into train/dev/test, e.g.:

$ head cooking.train
__label__sauce __label__cheese how much does potato starch affect a cheese sauce recipe ? 
__label__food-safety __label__acidity dangerous pathogens capable of growing in acidic environments
__label__cast-iron __label__stove how do i cover up the white spots on my cast iron stove ? 
__label__restaurant michelin three star restaurant; but if the chef is not there
__label__knife-skills __label__dicing without knife skills ,  how can i quickly and accurately dice vegetables ? 
__label__storage-method __label__equipment __label__bread what ' s the purpose of a bread box ? 
.....

Looks fine.

Then created corpus etc:

from flair.data_fetcher import NLPTaskDataFetcher
from flair.embeddings import WordEmbeddings, FlairEmbeddings, DocumentLSTMEmbeddings, CharacterEmbeddings
from flair.models import TextClassifier
from flair.trainers import ModelTrainer
from pathlib import Path

data_path = '/home/wohlg/itmo/misc/cooking_classification/preprocessed'
corpus = NLPTaskDataFetcher.load_classification_corpus(Path(data_path), 
                                                       test_file='cooking.test', 
                                                       dev_file='cooking.valid', 
                                                       train_file='cooking.train')

word_embeddings = [WordEmbeddings('glove'), 
                   FlairEmbeddings('news-forward-fast'), 
                   FlairEmbeddings('news-backward-fast')]

document_embeddings = DocumentLSTMEmbeddings(word_embeddings, 
                                             hidden_size=512, 
                                             reproject_words=True, 
                                             reproject_words_dimension=256)

Still looks good:

print(corpus.obtain_statistics())
TaggedCorpus: 12404 train + 1500 dev + 1500 test sentences
{
    "TRAIN": {
        "dataset": "TRAIN",
        "total_number_of_documents": 12404,
        "number_of_documents_per_class": {
            "sauce": 332,
            "cheese": 235,
            "food-safety": 967,
            "acidity": 33,
            "cast-iron": 111,
....
[all other stats, also for test and dev]

Finally training:

classifier = TextClassifier(document_embeddings, 
                            label_dictionary=corpus.make_label_dictionary(), 
                            multi_label=True)

trainer = ModelTrainer(classifier, corpus)

trainer.train('/tmp', max_epochs=20)

In training, the loss improves, but acc / F1 quickly goes to 0.000.
And finally, predicting with the learned model doesn't work; it just returns an empty set of labels [],
so my guess is that flair for some reason learns to predict an empty label set -- but why?

Did anyone else try to train on the fasttext tutorial dataset? With success?


@gwohlgen
Author

In order to make sure that flair is not overwhelmed by many low-frequency classes I made a simplified dataset for multi-label classification with only the 30 most frequent classes, and re-did the experiments, see here: https://github.com/gwohlgen/misc/blob/master/classifier__multi-label-simple.ipynb

But the same problem persists. :(

@stefan-it
Member

@gwohlgen Have you used the latest master of flair (recently, there was a softmax bug fix there) 🤔

@gwohlgen
Author

gwohlgen commented Apr 21, 2019

@stefan-it Hello Stefan, I used the latest pip version (0.4.1). Does the softmax bug still exist in that version?

@gwohlgen
Author

@stefan-it Just cloned the latest version from GitHub, but the problem persists.
Did anyone try multi-label classification with flair? Is there a working example somewhere? That would help a lot in finding the problem.

@alanakbik
Collaborator

Hello @gwohlgen - thanks for reporting this and thanks in particular for sharing all details to reproduce the experiment. I unfortunately get the same results so something does not seem to be working.

We use multi-label classification on a set of internal problems - to double-check I've just rerun the training on one of our multi-label datasets with the current master branch and everything seems to be working. So somehow it does not work on the cooking dataset whereas it works on ours. I'll take a closer look and let you know if I find anything. Please also let us know should you find out anything else.

@abishekk92

I ran into the same issue while using flair for a multi-label classification task; the empty labels seem to be due to the confidence value check. It would be good if somebody has a fix; otherwise I can attempt a patch.
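To make the suspected failure mode concrete, here is a minimal pure-Python sketch of that kind of confidence check (the 0.5 threshold and the function names are assumptions for illustration, not flair's actual code): labels whose sigmoid score stays below the threshold are dropped, so a model whose scores hover under 0.5 for every class returns an empty label list.

```python
import math

def predict_labels(logits, labels, threshold=0.5):
    """Keep only the labels whose sigmoid confidence clears the threshold."""
    scores = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [lab for lab, s in zip(labels, scores) if s > threshold]

# A model biased toward "no label" keeps every score just under 0.5,
# so the predicted label set comes back empty.
print(predict_labels([-0.2, -1.3, -0.05], ["sauce", "cheese", "bread"]))  # -> []
```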

@gwohlgen
Author

@alanakbik @abishekk92 I would highly appreciate any attempt to solve the problem :)

@alanakbik
Collaborator

@abishekk92 @gwohlgen the confidence value check does not seem to be the problem in this case, although we want to change the check for the next version. But even when removing the check a model trained on the cooking dataset does not predict anything well. We are still looking into why this is the case. It could be a bug, or even general inapplicability of this type of model to this type of task.

@prabhatM

Hi,
I have struggled with the same problems for the last 2 months. Today, I realized I am not the only one. I was losing confidence in myself!!!

@prabhatM

I was feeling really bad because I had painstakingly developed a big multi-label dataset for our domain, and it was a real letdown when, after days of training, I started getting an F1 of 0.

I am glad you guys have started looking at the issue proactively.

@gwohlgen
Author

Hi @prabhatM .. yes, I also hope it will be fixed soon; I am curious to see how well flair works on multi-label classification ..

@alanakbik
Collaborator

Just a quick update: we are still looking into this and some other classification-related issues (see #709). Unfortunately we haven't found the error yet, but fixed a bunch of smaller things and implemented more baselines (PRs coming soon). Hopefully we find out what the problem is soon.

@collinpu

I am having the same problem!

@collinpu

I may have a hacky fix. I changed the loss function from BCELoss to BCEWithLogitsLoss and used a large positive pos_weight vector to bias the model away from predicting all nulls. My intuition is that there is a huge class imbalance between the labels seen in each sample and the labels not seen in each sample, with the latter being much larger, so the model may have been converging to a local minimum that just always predicted no labels. At least this is the case in my data; not sure if this will help everyone.
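To make that intuition concrete, here is a pure-Python sketch of the single-element BCEWithLogitsLoss formula from the PyTorch docs: pos_weight multiplies only the positive term, so a missed true label is penalized more heavily than a spurious negative (the example values are made up).

```python
import math

def bce_with_logits(logit, target, pos_weight=1.0):
    """Per-element BCEWithLogitsLoss: -[p * y * log(s) + (1 - y) * log(1 - s)]."""
    s = 1.0 / (1.0 + math.exp(-logit))  # sigmoid of the raw logit
    return -(pos_weight * target * math.log(s) + (1 - target) * math.log(1 - s))

# Missing a true label (target=1 but a low score) hurts 5x more with
# pos_weight=5, which pushes the model away from the "predict nothing" minimum.
plain    = bce_with_logits(-2.0, 1.0)                  # ~2.13
weighted = bce_with_logits(-2.0, 1.0, pos_weight=5.0)  # ~10.63
print(plain, weighted)
```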

@alanakbik
Collaborator

Hello @collinpu, that's interesting - could you provide more details? How/where did you provide the pos_weight vector? Perhaps we could try this for these problems.

@collinpu

You initialize BCEWithLogitsLoss with the pos_weight vector you want to use. See https://pytorch.org/docs/stable/nn.html.

Something to note: by biasing the model in this way, you need to be careful not to make the pos_weight values too large, or they will over-bias the model and cause it to over-predict the existence of labels. You'll see this if your recall is very high but your precision is low.
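One common heuristic for picking those values (an assumption on my part, not something confirmed in this thread) is to set each class's pos_weight to negatives/positives in the training set, capped so that very rare classes don't over-bias the model. A sketch, using class counts from the corpus statistics posted above (12404 training docs):

```python
def pos_weights_from_counts(label_counts, total_docs, cap=50.0):
    """Per-class pos_weight = negatives / positives, capped to limit over-biasing."""
    return {lab: min((total_docs - n) / n, cap) for lab, n in label_counts.items()}

# "acidity" appears only 33 times, so its raw ratio (~375) gets capped at 50;
# "food-safety" is common enough to keep its raw ratio (~11.8).
print(pos_weights_from_counts({"food-safety": 967, "acidity": 33}, 12404))
```

If precision collapses with these weights, lower the cap and retrain.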

@bealjm823

Hi everyone. I was wondering if anyone is still having the issues described above with the multi-label data. I'm having the same issues as described by @gwohlgen. I see there was a merge by @alanakbik toward classification improvements. Do we need to make the change suggested by @collinpu manually? Thanks beforehand for any guidance.

@tombburnell

I'm getting an empty list of labels [] too when using multiple tags - particularly with lots of tags and a short body.
With a longer body I do get results, but typically the results are not good and all score just above 0.5.

@paragkr007

Hello everyone,

I am also facing the same issue and getting a score of 0.0 for multi_label=True classification.
Hoping that it will be fixed soon.

MICRO_AVG: acc 0.0 - f1-score 0.0
MACRO_AVG: acc 0.0 - f1-score 0.0

Thanks.

@stale

stale bot commented Apr 29, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Apr 29, 2020
@stale stale bot closed this as completed May 6, 2020
@alanrios2001

I'm having this problem when training with torch's Adam optimizer; using MADGRAD, the f1-score works just fine...

@None-Such

@alanrios2001 - Would be most grateful if you could post a code snippet illustrating how to set up training with the MADGRAD optimizer instead of torch's Adam.

Best Regards,
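The thread never got a reply, but the swap might look something like the untested sketch below. It assumes the `madgrad` pip package (pip install madgrad) and that your flair version's ModelTrainer.train accepts a torch-style optimizer class via an `optimizer` argument; check the signature of your installed version, since the trainer API has changed across releases.

```python
def train_with_madgrad(classifier, corpus, out_dir="/tmp/cooking-madgrad"):
    """Hypothetical helper: train a flair TextClassifier with MADGRAD instead of Adam."""
    from madgrad import MADGRAD          # pip install madgrad
    from flair.trainers import ModelTrainer

    trainer = ModelTrainer(classifier, corpus)
    # Pass the optimizer class; flair instantiates it with the model parameters.
    trainer.train(out_dir,
                  optimizer=MADGRAD,
                  learning_rate=1e-2,
                  max_epochs=20)
```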
