Commit

update readme
trieuhl committed Aug 17, 2020
1 parent 62f35dc commit 32f85ea
Showing 1 changed file with 46 additions and 36 deletions.
82 changes: 46 additions & 36 deletions README.md
@@ -1,19 +1,19 @@
# DeepEventMine
A model to predict nested events from biomedical texts using our pretrained models.
A deep learning model to predict named entities, triggers, and nested events from biomedical texts using our pretrained models.

- The model and results are reported in our paper: [DeepEventMine: End-to-end Neural Nested Event Extraction from Biomedical Texts](https://doi.org/10.1093/bioinformatics/btaa540)
- Bioinformatics, 2020.

## Requirements
- Python 3.6.5
- PyTorch (torch==1.1.0 torchvision==0.3.0, cuda92)
- Install Python packages

```bash
pip install -r requirements.txt
```
## Features
- We provide our trained models for seven biomedical tasks
- Reproduce the results reported in our Bioinformatics paper
- Predict on new data given raw text input or a PubMed ID
- Visualize the predicted entities and events in brat

## Tasks

- DeepEventMine has been trained and evaluated on the following tasks (six BioNLP shared tasks and MLEE).

1. cg: [Cancer Genetics (CG), 2013](http://2013.bionlp-st.org/tasks/cancer-genetics)
2. ge11: [GENIA Event Extraction (GENIA), 2011](http://2011.bionlp-st.org/home/genia-event-extraction-genia)
3. ge13: [GENIA Event Extraction (GENIA), 2013](http://bionlp.dbcls.jp/projects/bionlp-st-ge-2013/wiki/Overview)
@@ -22,45 +22,60 @@ pip install -r requirements.txt
6. pc: [Pathway Curation (PC), 2013](http://2013.bionlp-st.org/tasks/pathway-curation)
7. mlee: [Multi-Level Event Extraction (MLEE)](http://nactem.ac.uk/MLEE/)

## How to run
## Our trained models and scores

### Prepare data
1. Download corpora
- Download the original data sets from the BioNLP shared tasks.
- [task] = cg, pc, ge11, etc.
- [Our trained models](https://b2share.eudat.eu/records/80d2de0c57d64419b722dc1afa375f28)
- [Our scores](https://b2share.eudat.eu/api/files/3cf6c1f4-5eed-4ee3-99c5-d99f5f011be3/scores.tar.gz)

```bash
sh download.sh bionlp [task]
```
# Before prediction
## Requirements
- Python 3.6.5
- PyTorch (torch==1.1.0 torchvision==0.3.0, cuda92)
- Install Python packages

2. Preprocess data
- Tokenize texts and prepare data for prediction
```bash
sh preprocess.sh bionlp
pip install -r requirements.txt
```

3. Download pre-trained BERT
## Download pre-trained BERT
- Download SciBERT model from PyTorch AllenNLP

```bash
sh download.sh bert
```

4. Download pre-trained DeepEventMine models
## Download pre-trained DeepEventMine models
- Download the pre-trained DeepEventMine model on a given task

```bash
sh download.sh deepeventmine [task]
```
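
The two download steps above can be combined when several tasks are needed. The following is a minimal sketch, not part of the repository: `download_models` is a hypothetical helper that only prints the `download.sh` commands shown above (one for BERT, then one per task), so the list can be inspected before anything is fetched.

```bash
# Hypothetical helper (not provided by the repository).
# Prints the download commands for BERT and for each task given as an argument.
download_models() {
    echo sh download.sh bert
    for task in "$@"; do
        echo sh download.sh deepeventmine "$task"
    done
}
```

To actually run the printed commands from the repository root, the output can be piped to a shell, for example `download_models cg pc | sh`.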

5. Generate configs
# Predict on the BioNLP tasks

## Prepare data
1. Download corpora
- Download the original data sets from the BioNLP shared tasks.
- [task] = cg, pc, ge11, etc.

```bash
sh download.sh bionlp [task]
```

2. Preprocess data
- Tokenize texts and prepare data for prediction
```bash
sh preprocess.sh bionlp
```

3. Generate configs
- If using GPU: [gpu] = 0, otherwise: [gpu] = -1
- [task] = cg, pc, etc
```bash
sh run.sh config [task] [gpu]
```
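
The three preparation steps above can be chained for a single task. This is a sketch under stated assumptions: `prepare_task` and the `DRY_RUN` variable are hypothetical names introduced here, not part of the repository. By default the function only prints the commands in order; setting `DRY_RUN=0` runs them and stops at the first failure.

```bash
# Hypothetical helper (not provided by the repository).
# Usage: prepare_task <task> <gpu>   where gpu is 0 for GPU or -1 for CPU.
prepare_task() {
    task="$1"
    gpu="$2"
    if [ -z "$task" ] || [ -z "$gpu" ]; then
        echo "usage: prepare_task <task> <gpu>" >&2
        return 1
    fi
    # DRY_RUN defaults to 1, so the commands are printed rather than executed.
    if [ "${DRY_RUN:-1}" = "0" ]; then
        run=""
    else
        run="echo"
    fi
    # Same order as the numbered steps: download, preprocess, generate configs.
    $run sh download.sh bionlp "$task" &&
        $run sh preprocess.sh bionlp &&
        $run sh run.sh config "$task" "$gpu"
}
```

For example, `prepare_task cg -1` lists the commands for the CG task on CPU, and `DRY_RUN=0 prepare_task cg 0` would execute them for a GPU run from the repository root.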

### Predict (BioNLP shared tasks)
## Predict

1. For development and test sets (given gold entities)
- CG task: [task] = cg
@@ -77,7 +92,7 @@ experiments/[task]/predict-gold-dev/
experiments/[task]/predict-gold-test/
```

### Evaluate (BioNLP shared tasks)
## Evaluate

1. Retrieve the original offsets and create zip format
```bash
@@ -103,24 +118,19 @@ sh run.sh offset [task] gold test
sh run.sh eval [task] gold dev sp
```

4. Supplementary data

- [Our trained models](https://b2share.eudat.eu/records/80d2de0c57d64419b722dc1afa375f28)
- [Our scores](https://b2share.eudat.eu/api/files/3cf6c1f4-5eed-4ee3-99c5-d99f5f011be3/scores.tar.gz)

## Predict (with raw text)
# Predict given raw text

- You can prepare the raw text on your own, or retrieve the text for a given PubMed ID.

### Prepare your own raw text
## Prepare your own raw text

- To predict on your own raw text using our trained model for a task ([task] = cg, pc, ge11, etc.), place your raw text files at the following path

```bash
data/raw-text/[task]/PMID-*.txt
```
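
Placing a file at that path can be sketched as follows. `stage_raw_text` is a hypothetical helper, not part of the repository, and it assumes that the `*` in `PMID-*.txt` stands for an identifier of your choice; the README does not say which identifiers are accepted.

```bash
# Hypothetical helper (not provided by the repository).
# Usage: stage_raw_text <task> <text-file> <id>
# Copies <text-file> to data/raw-text/<task>/PMID-<id>.txt
stage_raw_text() {
    task="$1"
    src="$2"
    doc_id="$3"
    if [ -z "$task" ] || [ ! -f "$src" ] || [ -z "$doc_id" ]; then
        echo "usage: stage_raw_text <task> <text-file> <id>" >&2
        return 1
    fi
    dest_dir="data/raw-text/$task"
    # Create the task directory if needed, then copy under the expected name.
    mkdir -p "$dest_dir" && cp "$src" "$dest_dir/PMID-$doc_id.txt"
}
```

Run from the repository root, `stage_raw_text cg abstract.txt 0001` would create `data/raw-text/cg/PMID-0001.txt`.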

### Get text from PubMed ID
## Get text from PubMed ID

1. Installation

@@ -131,7 +141,7 @@ sh install.sh pubmed
2. Prepare data


### Predict
## Predict

1. Preprocess raw text

@@ -156,7 +166,7 @@ sh run.sh offset [task] raw text
experiments/[task]/predict-raw-text/ev-last/[task]-brat
```

## Visualization
# Visualization

- Visualize the output using [brat](http://brat.nlplab.org)

@@ -195,7 +205,7 @@ sh run.sh brat [task] gold test
brat/brat-v1.3_Crunchy_Frog/data/[task]-brat
```

## Acknowledgements
# Acknowledgements
This work is based on results obtained from a project commissioned by the New Energy and Industrial Technology Development Organization (NEDO).
This work is also supported by PRISM (Public/Private R&D Investment Strategic Expansion PrograM).
