tdhd/dssg2017

Data Science for Social Good 2017

March 2017 DSSG hack for the Deutsche Krebsgesellschaft (DKG).

We aim to build a multi-label model that predicts the labels for a given RIS article from features such as:

  • abstract
  • title
  • authors
  • ...
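As a rough illustration of how these features could be pulled out of a RIS file, here is a minimal sketch using only the Python standard library. It relies on the standard RIS tags (TI/T1 for title, AB for abstract, AU for authors, ER for end of record). Reading the labels from the KW tag is an assumption, and the names parse_ris and to_features are hypothetical, not taken from this repository.

```python
import re
from collections import defaultdict

# A RIS line looks like "TI  - Some title": two-character tag, two spaces, hyphen, value.
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")


def parse_ris(text):
    """Split RIS text into records; each record maps a tag to its list of values."""
    records, current = [], defaultdict(list)
    for line in text.splitlines():
        match = TAG_LINE.match(line.rstrip())
        if not match:
            continue  # this sketch ignores blank and continuation lines
        tag, value = match.groups()
        if tag == "ER":  # "ER" marks the end of a record
            if current:
                records.append(dict(current))
            current = defaultdict(list)
        else:
            current[tag].append(value.strip())
    if current:  # keep a trailing record that lacks an ER line
        records.append(dict(current))
    return records


def to_features(record):
    """Map one parsed record to the features listed above (hypothetical layout)."""
    return {
        "title": " ".join(record.get("TI", []) or record.get("T1", [])),
        "abstract": " ".join(record.get("AB", [])),
        "authors": record.get("AU", []),
        "labels": record.get("KW", []),  # assumption: labels are stored as keywords
    }


SAMPLE = (
    "TY  - JOUR\n"
    "TI  - Example title\n"
    "AU  - Doe, Jane\n"
    "AU  - Roe, Richard\n"
    "AB  - Example abstract.\n"
    "KW  - label-a\n"
    "ER  - \n"
)

articles = [to_features(record) for record in parse_ris(SAMPLE)]
```

Each entry in articles is a plain dictionary, so it can be handed to whatever text vectorizer and multi-label classifier the model uses.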

Webservice

We've built a Django webservice that allows the DKG to interact with our model via RIS file uploads.

The service has the following features at the moment:

  • upload training RIS file - triggers model selection on given data.
  • upload test RIS file - produces keyword predictions for the given articles.

For each prediction, a user can give positive or negative feedback, i.e. add a missing label or remove a wrongly predicted one.
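The effect of this feedback on a single article can be sketched as a set operation. The function name apply_feedback and the label names are hypothetical and only illustrate the idea; the service may store feedback differently.

```python
def apply_feedback(predicted, added=(), removed=()):
    """Return the corrected label set: predicted labels plus additions, minus removals."""
    return (set(predicted) | set(added)) - set(removed)


# Illustrative labels only: the user adds "label-c" and rejects "label-b".
corrected = apply_feedback(
    predicted={"label-a", "label-b"},
    added={"label-c"},
    removed={"label-b"},
)
```

The corrected label sets can then serve as additional training examples.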

Active learning

To improve the multi-label model, the service can receive feedback from a user.

We've implemented several prioritization strategies for this active learning setting. See this article for a survey of active learning.
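One common prioritization strategy is uncertainty sampling, where the articles the model is least sure about are shown to the user first. The sketch below is only an example of such a strategy and is not necessarily one of those implemented in this repository. The per-label probabilities, article IDs, and function names are assumptions made for illustration.

```python
def uncertainty(probabilities):
    """Mean closeness to 0.5 over all labels: 1.0 is maximally uncertain, 0.0 fully confident."""
    if not probabilities:
        return 0.0
    scores = (1.0 - 2.0 * abs(p - 0.5) for p in probabilities.values())
    return sum(scores) / len(probabilities)


def prioritize(predictions):
    """Order article IDs so that the most uncertain predictions come first."""
    return sorted(predictions, key=lambda article: uncertainty(predictions[article]), reverse=True)


# Hypothetical per-label probabilities for three unlabeled articles.
predictions = {
    "article-1": {"label-a": 0.95, "label-b": 0.05},
    "article-2": {"label-a": 0.55, "label-b": 0.40},
    "article-3": {"label-a": 0.80, "label-b": 0.50},
}

queue = prioritize(predictions)
```

Here article-2 is asked about first because both of its label probabilities are close to 0.5, while article-1 comes last because the model is confident about both labels.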

Files from the hack weekend

The module cleaning_classification_labels implements a cleaning pipeline for the classification labels, which should be applied to rectify the labels before training.

There is also a notebook, features.ipynb, which we added at the beginning of the hack. It examines different attributes of the features and performs an initial classification on useful labels.
