Multi-label classification not working? #678
Hi, the problem is that no matter which embeddings and which hyperparameters are used, training always quickly goes towards 0.000 F1 / accuracy. Maybe the problem is that the dataset has a high number of labels, some with low frequency (1)? Full code and logs here: https://github.com/gwohlgen/misc/blob/master/classifier__multi-label.ipynb I split it into train/dev/test, e.g.
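(The exact split code is in the linked notebook; the sketch below is only illustrative, assuming the FastText cooking dataset with one `__label__x __label__y text...` line per example, and made-up file names and split ratios.)

```python
# Illustrative train/dev/test split for a FastText-format file;
# file names and ratios are assumptions, not the notebook's exact code.
import random

with open("cooking.stackexchange.txt", encoding="utf-8") as f:
    lines = [line.strip() for line in f if line.strip()]

random.seed(42)
random.shuffle(lines)

n = len(lines)
splits = {
    "train.txt": lines[: int(0.8 * n)],          # 80% train
    "dev.txt": lines[int(0.8 * n): int(0.9 * n)],  # 10% dev
    "test.txt": lines[int(0.9 * n):],            # 10% test
}
for name, subset in splits.items():
    with open(name, "w", encoding="utf-8") as f:
        f.write("\n".join(subset) + "\n")
```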
Looks fine. Then I created the corpus etc.:
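(Roughly like the following sketch; exact API names vary across flair releases - in 0.4.x this step used `NLPTaskDataFetcher.load_classification_corpus`, while newer versions use `ClassificationCorpus` as shown.)

```python
# Hedged sketch of corpus creation from the split files above.
from flair.datasets import ClassificationCorpus

corpus = ClassificationCorpus(
    "./",                    # folder containing the three split files
    train_file="train.txt",
    dev_file="dev.txt",
    test_file="test.txt",
)
# newer flair versions require a label_type argument here,
# e.g. corpus.make_label_dictionary(label_type="class")
label_dict = corpus.make_label_dictionary()
print(corpus)                # sanity check: sentence counts per split
```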
Still looks good:
Finally training:
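(Along these lines; the embedding choice and hyperparameters below are placeholders, not the notebook's exact values, and argument names have shifted across flair releases.)

```python
# Hedged sketch of the multi-label training setup.
from flair.embeddings import WordEmbeddings, DocumentRNNEmbeddings
from flair.models import TextClassifier
from flair.trainers import ModelTrainer

word_embeddings = [WordEmbeddings("glove")]
document_embeddings = DocumentRNNEmbeddings(word_embeddings, hidden_size=256)

classifier = TextClassifier(
    document_embeddings,
    label_dictionary=label_dict,
    multi_label=True,        # the setting this issue is about
)

trainer = ModelTrainer(classifier, corpus)
trainer.train(
    "resources/cooking",
    learning_rate=0.1,
    mini_batch_size=32,
    max_epochs=10,
)
```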
During training the loss improves, but accuracy / F1 quickly goes to 0.000. Did anyone else try to train on the fasttext tutorial dataset? With success? Full code and logs here: https://github.com/gwohlgen/misc/blob/master/classifier__multi-label.ipynb |
To make sure that flair is not overwhelmed by many low-frequency classes, I made a simplified dataset for multi-label classification with only the 30 most frequent classes and re-ran the experiments, see here: https://github.com/gwohlgen/misc/blob/master/classifier__multi-label-simple.ipynb But the same problem persists. :( |
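(A sketch of that kind of filtering - assumed logic for illustration, not the notebook's exact code:)

```python
# Keep only examples that carry at least one of the 30 most frequent labels.
from collections import Counter

with open("cooking.stackexchange.txt", encoding="utf-8") as f:
    lines = [line.strip() for line in f if line.strip()]

label_counts = Counter(
    tok for line in lines for tok in line.split() if tok.startswith("__label__")
)
top_labels = {label for label, _ in label_counts.most_common(30)}

simplified = []
for line in lines:
    tokens = line.split()
    labels = [t for t in tokens if t.startswith("__label__") and t in top_labels]
    text = [t for t in tokens if not t.startswith("__label__")]
    if labels:  # drop examples left with no frequent label
        simplified.append(" ".join(labels + text))

with open("cooking_simple.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(simplified) + "\n")
```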
@gwohlgen Have you used the latest master version from GitHub? |
@stefan-it Hello Stefan, I used the latest pip version (0.4.1). Does the softmax bug still exist in that version? |
@stefan-it I just cloned the latest version from GitHub, but the problem persists. |
Hello @gwohlgen - thanks for reporting this and thanks in particular for sharing all the details to reproduce the experiment. I unfortunately get the same results, so something does not seem to be working. We use multi-label classification on a set of internal problems - to double-check, I've just rerun the training on one of our multi-label datasets with the current master branch and everything seems to be working. So somehow it does not work on the cooking dataset whereas it works on ours. I'll take a closer look and let you know if I find anything. Please also let us know should you find out anything else. |
I ran into the same issue while using flair for a multi-label classification task; the empty labels seem to be due to the confidence value check. It would be good if somebody has a fix; otherwise I can attempt a patch. |
@alanakbik @abishekk92 I would highly appreciate any attempt to solve the problem :) |
@abishekk92 @gwohlgen the confidence value check does not seem to be the problem in this case, although we want to change the check for the next version. But even when removing the check a model trained on the cooking dataset does not predict anything well. We are still looking into why this is the case. It could be a bug, or even general inapplicability of this type of model to this type of task. |
Hi, |
I was feeling really bad because I had painstakingly developed a big multi-label dataset for our domain, and it was a real letdown when, after days of training, I started getting an F1 of 0. I am glad you have started looking at the issue proactively. |
Hi @prabhatM, yes, I also hope it will be fixed soon; I am curious to see how well flair works on multi-label classification. |
Just a quick update: we are still looking into this and some other classification-related issues (see #709). Unfortunately we haven't found the error yet, but fixed a bunch of smaller things and implemented more baselines (PRs coming soon). Hopefully we find out what the problem is soon. |
I am having the same problem! |
I may have a hacky fix. I changed the loss function from BCELoss to BCEWithLogitsLoss and used a large positive pos_weight vector to bias the model away from predicting all nulls. My intuition is that there is a huge class imbalance between the labels seen in each sample and the labels not seen in each sample, with the latter being much, much larger, so the model may have been converging to a local minimum that just always predicted no labels. At least this is the case in my data; not sure if this will help everyone. |
Hello @collinpu, that's interesting - could you provide more details? How/where did you provide the pos_weight vector? Perhaps we could try this for these problems. |
You initialize BCEWithLogitsLoss with the pos_weight vector you want to use; see https://pytorch.org/docs/stable/nn.html. Something to note: by biasing the model in this way, you need to be careful not to make the pos_weight values too large, or they will over-bias the model and cause it to over-predict the existence of labels. You'll see this if your recall is very high but your precision is low. |
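(A standalone PyTorch illustration of the workaround described above; the weight values are arbitrary examples, and note that BCEWithLogitsLoss expects raw logits, so applying it inside flair would also mean removing the sigmoid that previously fed BCELoss:)

```python
# BCEWithLogitsLoss with a per-label pos_weight vector: positive targets
# are weighted up, pushing the model away from the all-nulls minimum.
import torch
import torch.nn as nn

num_labels = 30
# e.g. roughly (num_negatives / num_positives) per label, capped so the
# model is not over-biased toward predicting every label
pos_weight = torch.full((num_labels,), 10.0)

loss_fn = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.randn(8, num_labels)                      # raw scores, no sigmoid
targets = torch.randint(0, 2, (8, num_labels)).float()   # multi-hot labels
loss = loss_fn(logits, targets)
```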
GH-678: Classification improvements
Hi everyone. I was wondering if anyone was still having the issues described above with multi-label data. I'm having the same issues as described by @gwohlgen. I see there was a merge by @alanakbik toward classification improvements. Do we need to make the change suggested by @collinpu manually? Thanks in advance for any guidance. |
I'm also getting an empty list of labels ([]) when using multiple tags - particularly with lots of tags and a short body. |
Hello everyone, I am also facing the same issue and getting a score of 0.0 for multi_label=True classification: MICRO_AVG: acc 0.0 - f1-score 0.0. Thanks. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
I'm having this problem when training with torch's Adam optimizer; using MADGRAD, the f1-score works just fine... |
@alanrios2001 - I would be most grateful if you could post a code snippet illustrating how to set up training with MADGRAD instead of torch's Adam optimizer. Best regards, |
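(An untested sketch of what that swap might look like, reusing the classifier and corpus objects from the sketches above. MADGRAD comes from the separate `madgrad` package (`pip install madgrad`), and depending on the flair version the optimizer class is passed to the ModelTrainer constructor, as here, or to train():)

```python
# Hedged sketch: swapping the optimizer class used by flair's trainer.
from madgrad import MADGRAD
from torch.optim import Adam
from flair.trainers import ModelTrainer

# variant reported in this thread to collapse to f1-score 0.0:
trainer = ModelTrainer(classifier, corpus, optimizer=Adam)

# variant reported to work fine:
trainer = ModelTrainer(classifier, corpus, optimizer=MADGRAD)

trainer.train(
    "resources/cooking",
    learning_rate=1e-2,      # placeholder value
    mini_batch_size=32,
    max_epochs=10,
)
```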