[Question]: Anomaly Detection / One Class Classification #3411

Open
quantarb opened this issue Feb 27, 2024 · 1 comment
Labels
question Further information is requested

Comments

@quantarb

Question

Can Flair be used to train a classifier on data from only one class, so that it predicts the likelihood that new text belongs to that class? I currently use a two-class classifier that distinguishes my target documents from a random assortment of Wikipedia articles, which serves as the second class. However, this method seems wrong, as it requires generating an exhaustive list of counterexamples. Would modeling this as an anomaly detection problem be more appropriate?

@quantarb quantarb added the question Further information is requested label Feb 27, 2024
@helpmefindaname
Member

Hi @quantarb
Flair doesn't support an anomaly detection model. I think the 2-class approach is already a good solution if you combine it with a sampling strategy:

  • train a classifier with all the positive examples you have, plus a few negatives that you have chosen by hand
  • predict on the whole corpus, or on a subset that is large enough. Sort by the model's confidence (highest confidence for anomaly first) and manually label the first N (I would take around 100) as anomaly/not-anomaly.
  • if the newly labeled examples contain too many not-anomalies, start again at step 1.
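
The steps above can be sketched roughly as the loop below. This is only an illustration, not Flair code: `train_scorer` is a toy word-overlap stand-in for whatever 2-class classifier you train, and `refine`, `ask_human`, `top_n`, `max_false_alarms` and `max_rounds` are hypothetical names. It also assumes that the newly labeled examples are added to the training data before retraining, which the steps imply but do not state.

```python
from collections import Counter
from typing import Callable, List, Tuple


def train_scorer(targets: List[str], anomalies: List[str]) -> Callable[[str], float]:
    """Toy stand-in for a 2-class text classifier (assumption, not Flair).

    Returns a function that scores how anomalous a text looks; higher means
    more anomalous. Swap in a real classifier's anomaly confidence here.
    """
    target_words = Counter(w for t in targets for w in t.lower().split())
    anomaly_words = Counter(w for t in anomalies for w in t.lower().split())

    def score(text: str) -> float:
        words = text.lower().split()
        if not words:
            return 0.0
        # +1 for each word seen in anomalies, -1 for each word seen in targets
        hits = sum(int(w in anomaly_words) - int(w in target_words) for w in words)
        return hits / len(words)

    return score


def refine(
    targets: List[str],
    anomalies: List[str],
    unlabeled: List[str],
    ask_human: Callable[[str], bool],  # hypothetical: True if text is an anomaly
    top_n: int = 100,
    max_false_alarms: float = 0.2,
    max_rounds: int = 5,
) -> Tuple[Callable[[str], float], List[str], List[str]]:
    targets, anomalies, pool = list(targets), list(anomalies), list(unlabeled)
    for _ in range(max_rounds):
        # Step 1: train on all targets plus the anomalies labeled so far.
        score = train_scorer(targets, anomalies)
        # Step 2: rank the pool, most anomalous first, and review the top N.
        ranked = sorted(pool, key=score, reverse=True)
        batch, pool = ranked[:top_n], ranked[top_n:]
        if not batch:
            break
        false_alarms = 0
        for text in batch:
            if ask_human(text):
                anomalies.append(text)
            else:
                targets.append(text)
                false_alarms += 1
        # Step 3: stop once few reviewed items turn out to be not-anomalies.
        if false_alarms / len(batch) <= max_false_alarms:
            break
    # Retrain once more so the returned scorer includes the latest labels.
    return train_scorer(targets, anomalies), targets, anomalies
```

In practice `ask_human` would be a manual review step, and the threshold for "too many not-anomalies" is a judgment call that depends on the corpus.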

However, if you don't find that sufficient, I suppose you will be happier with approaches that are not supported here, and you might need to do more research.
