Data Science for Social Good 2017

March 2017 DSSG hack for the Deutsche Krebsgesellschaft (DKG).

We aim to build a multi-label model that predicts the labels for a given RIS article from features such as:

abstract
title
authors
...
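As a rough illustration of how these features could be read from a RIS file, here is a minimal sketch using only the standard library. The tag mapping (TI/T1 for title, AB/N2 for abstract, AU/A1 for authors, KW for labels) follows common RIS usage and is an assumption about the DKG data, not something taken from this repository. The function name `parse_ris` is hypothetical.

```python
import re

# A RIS line looks like "TI  - Some title": two-character tag, two spaces, hyphen, value.
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])  -\s?(.*)$")

# Assumed mapping from RIS tags to feature names; adjust to the actual export.
FEATURE_TAGS = {
    "TI": "title", "T1": "title",
    "AB": "abstract", "N2": "abstract",
    "AU": "authors", "A1": "authors",
    "KW": "labels",
}
LIST_FEATURES = {"authors", "labels"}  # tags that may repeat within one record


def parse_ris(text):
    """Return one dict per RIS record with title, abstract, authors and labels."""
    records = []
    current = None
    for line in text.splitlines():
        match = TAG_LINE.match(line)
        if match is None:
            continue  # blank or continuation line; ignored in this sketch
        tag, value = match.group(1), match.group(2).strip()
        if tag == "TY":  # start of a record
            current = {"title": "", "abstract": "", "authors": [], "labels": []}
        elif tag == "ER":  # end of a record
            if current is not None:
                records.append(current)
            current = None
        elif current is not None and tag in FEATURE_TAGS:
            feature = FEATURE_TAGS[tag]
            if feature in LIST_FEATURES:
                current[feature].append(value)
            else:
                current[feature] = value
    return records
```

Each returned dict can then be turned into model input, for example by concatenating title and abstract into one text field.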

Webservice

We've built a Django webservice that lets the DKG interact with our model via RIS file uploads.

The service has the following features at the moment:

upload training RIS file - triggers model selection on given data.
upload test RIS file - produces keyword predictions for the given articles.

A user can give positive or negative feedback on each prediction, for example by adding a missing label or by removing a predicted label.
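The effect of such feedback on one article's label set can be sketched as follows. This is only an illustration of the idea; the function name `apply_feedback` and its parameters are hypothetical and do not come from the service code.

```python
def apply_feedback(predicted, added=(), removed=()):
    """Return the corrected label set for one article.

    predicted: labels proposed by the model
    added:     labels the user added (positive feedback)
    removed:   predicted labels the user rejected (negative feedback)
    """
    corrected = set(predicted)
    corrected |= set(added)    # keep everything the user added
    corrected -= set(removed)  # drop everything the user rejected
    return corrected
```

The corrected set could then be stored as a new training example for the next round of model selection.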

Active learning

To improve the multi-label model, the service accepts user feedback on its predictions.

We've implemented different strategies of prioritization for this active learning setting. See this article for a survey of active learning.
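One common prioritization strategy is uncertainty sampling: articles whose predicted label probabilities lie closest to 0.5 are shown to the user first. The sketch below illustrates that idea under the assumption that the model outputs one probability per label. It is not necessarily one of the strategies implemented here, and the names `uncertainty` and `prioritize` are hypothetical.

```python
def uncertainty(label_probabilities):
    """Score in [0, 1]; 1.0 means some label has probability exactly 0.5."""
    if not label_probabilities:
        return 0.0
    closest = min(abs(p - 0.5) for p in label_probabilities.values())
    return 1.0 - 2.0 * closest


def prioritize(predictions):
    """Order article ids from most to least uncertain.

    predictions maps an article id to a dict of {label: probability}.
    """
    return sorted(
        predictions,
        key=lambda article_id: uncertainty(predictions[article_id]),
        reverse=True,
    )
```

Asking for feedback on the most uncertain articles first tends to give the model more information per labelled example than a random order would.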

Files from the hack weekend

truncate_classifications - removes noisy or underrepresented labels from the dataset.
top_level_labels - extracts top-level label lists from the CSV strings.
multilabel_cancer_classification - pipeline for multi-label classification.
multiabel_test - interactive pipeline for multi-label classification, including truncated classifications.
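The idea of removing underrepresented labels can be sketched as below. This is a minimal illustration, assuming each article's labels arrive as one comma-separated string. The helper names and the `min_count` threshold are hypothetical and not taken from truncate_classifications.py.

```python
from collections import Counter


def split_labels(csv_string, sep=","):
    """Split one comma-separated label string into a list of trimmed labels."""
    return [part.strip() for part in csv_string.split(sep) if part.strip()]


def truncate_labels(label_lists, min_count=2):
    """Drop every label that occurs in fewer than min_count articles."""
    # Count each label at most once per article.
    counts = Counter(label for labels in label_lists for label in set(labels))
    keep = {label for label, n in counts.items() if n >= min_count}
    return [[label for label in labels if label in keep] for labels in label_lists]
```

For example, with the rows "breast, lung", "lung" and "lung, rare" and `min_count=2`, only "lung" survives, since the other labels each occur in a single article.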

The module cleaning_classification_labels implements a cleaning pipeline for classifications, which should be applied to correct the labels.
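What a label-cleaning step might look like is sketched below. The concrete rules (collapsing whitespace, lower-casing, dropping empty entries and duplicates) are assumptions chosen for illustration. The actual rules in cleaning_classification_labels may differ, and the function name `clean_labels` is hypothetical.

```python
def clean_labels(labels):
    """Normalize a list of labels and drop empties and duplicates, keeping order."""
    seen = set()
    cleaned = []
    for label in labels:
        # Collapse runs of whitespace and compare case-insensitively.
        normalized = " ".join(label.split()).lower()
        if normalized and normalized not in seen:
            seen.add(normalized)
            cleaned.append(normalized)
    return cleaned
```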

There is also a notebook from the beginning of the hack, features.ipynb, which looks at different attributes of the features and runs an initial classification on useful labels.

