Mapping labs with papers accepted at international conferences on speech and signal processing
This repository contains code to automatically extract authors' affiliations from the PDF files of conference papers. The extracted data is then used to produce a map, available here.
- The `data` folder contains:
  - `data/conferences/{conference_name}`: links to the papers' PDF files hosted online, each associated with an identifier
  - `data/papers/{conference_name}.json`: manually transcribed authors' affiliations
Papers are manually transcribed when the automatic extraction fails.
Automatically recognized failures are stored in `output/{conference_name}/errors.json`.
A glance at the data can also help identifying
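To get a quick overview of how much of a conference was transcribed by hand and how many failures were recognized automatically, the two JSON files mentioned above can be inspected with a few lines of Python. This is only a minimal sketch: the functions `load_json` and `summarize` are hypothetical helpers, not part of this repository, and the code assumes that each file holds a JSON list or object at its top level (the actual schema is not described here).

```python
import json
from pathlib import Path


def load_json(path):
    """Return the parsed JSON content of `path`, or None if the file is absent."""
    path = Path(path)
    if not path.is_file():
        return None
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


def summarize(conference_name, root="."):
    """Count top-level entries in the transcription and error files.

    Assumption: both files contain a JSON list or object, so `len` gives
    the number of entries. A missing file counts as zero entries.
    """
    root = Path(root)
    transcribed = load_json(root / "data" / "papers" / f"{conference_name}.json")
    errors = load_json(root / "output" / conference_name / "errors.json")
    return {
        "transcribed": len(transcribed) if transcribed is not None else 0,
        "errors": len(errors) if errors is not None else 0,
    }
```

Calling `summarize("{conference_name}")` from the repository root, with the placeholder replaced by an actual conference folder name, returns a dictionary with the two counts.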
- Propose alternative methods to extract information from papers:
  - Methods should be as broad as possible, whether or not they rely on external sources of information;
  - Methods should be machine-learning free, to keep the carbon footprint of this project as low as possible.
- Manually review the data:
  - Correcting coordinates in the `locations.csv` file, or correcting authors' affiliations.
- Propose new ways to visualize the data.
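As an illustration of what a machine-learning-free extraction method could look like, the sketch below collects e-mail domains from the text of a paper's first page, which often hint at the authors' institutions. It is not the method used in this repository: the function `email_domains` is hypothetical, the text is assumed to have already been extracted from the PDF, and a domain is only a hint that still has to be mapped to a lab.

```python
import re

# Captures the domain that follows an "@", which also covers grouped
# addresses such as {jane,john}@lab.example.org.
DOMAIN_AFTER_AT = re.compile(r"@([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)")


def email_domains(first_page_text):
    """Return the distinct e-mail domains in the text, in order of appearance."""
    seen = []
    for domain in DOMAIN_AFTER_AT.findall(first_page_text):
        domain = domain.lower()
        if domain not in seen:
            seen.append(domain)
    return seen
```

A rule of this kind relies only on a regular expression, so it stays cheap to run, but it will miss papers that print no e-mail addresses and cannot tell labs apart when they share a domain.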
- [ ] Add a filterable table with papers below the map
- [ ] Add papers from ICASSP
- [ ] Add papers from LREC
- [ ] Add papers from CALLING
- [ ] Add a statistical analysis of paper collaborations