OpenNRE is a sub-project of OpenSKL, providing an Open-source Neural Relation Extraction toolkit for extracting structured knowledge from plain text, with ATT as a key feature for taking relation-associated text information into account.
OpenNRE is an open-source and extensible toolkit that provides a unified framework to implement relation extraction models. We unify the input and output interfaces of different relation extraction models and provide scalable options for each model. The toolkit covers both supervised and distantly supervised settings, and is compatible with both conventional neural networks and pre-trained language models.
Relation extraction is a natural language processing (NLP) task aiming at extracting relations (e.g., founder of) between entities (e.g., Bill Gates and Microsoft). For example, from the sentence Bill Gates founded Microsoft, we can extract the relation triple (Bill Gates, founder of, Microsoft).
Relation extraction is a crucial technique in automatic knowledge graph construction. By using relation extraction, we can incrementally extract new relation facts and expand the knowledge graph, which, as a way for machines to understand the human world, has many downstream applications such as question answering, recommender systems, and search engines. If you want to learn more about neural relation extraction, visit another project of ours (NREPapers).
It's our honor to help you better explore relation extraction with our OpenNRE toolkit! You can refer to our document for more details about this project.
In this toolkit, we support CNN-based relation extraction models including standard CNN and our proposed CNN+ATT. We also implement methods based on pre-trained language models (BERT).
To validate the effectiveness of this toolkit, we employ the Bag-Level Relation Extraction task for evaluation.
We use the NYT10 dataset, a distantly supervised collection derived from the New York Times corpus and Freebase. We mainly experiment on the CNN-ATT model, which employs instance-level attention and outperforms vanilla CNN.
We report the AUC and F1 scores of the two models. The two rightmost columns, marked with (*), show the results reported by Gao et al. (2021) and Lin et al. (2016). The results show that our implementation of the CNN-ATT model is slightly better than the original paper, and also confirm that CNN-ATT outperforms the standard CNN model.
Model | AUC | F1 | AUC(Paper *) | F1(Paper *) |
---|---|---|---|---|
CNN | - | - | 0.212 | 0.318 |
CNN-ATT | 0.333 | 0.397 | 0.318 | 0.380 |
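As a rough illustration of what the AUC and F1 columns measure, the sketch below ranks predicted facts by confidence, builds a precision-recall curve, and reports the area under it together with the best F1 along the curve. It is a simplified sketch under stated assumptions, not necessarily the exact evaluation code in OpenNRE: the function names, the toy predictions, and the use of trapezoidal integration are all assumptions made for illustration.

```python
def pr_curve(scored, total_positives):
    """Precision/recall after each prediction, ranked by confidence.

    scored: list of (confidence, is_correct) pairs, one per predicted fact.
    total_positives: number of gold facts in the test set.
    """
    ranked = sorted(scored, key=lambda item: item[0], reverse=True)
    precisions, recalls = [], []
    correct = 0
    for rank, (_, is_correct) in enumerate(ranked, start=1):
        correct += int(is_correct)
        precisions.append(correct / rank)
        recalls.append(correct / total_positives)
    return precisions, recalls


def auc_trapezoid(precisions, recalls):
    """Area under the precision-recall curve (trapezoidal rule, an assumption)."""
    area = 0.0
    for i in range(1, len(recalls)):
        width = recalls[i] - recalls[i - 1]
        area += width * (precisions[i] + precisions[i - 1]) / 2
    return area


def max_f1(precisions, recalls):
    """Best F1 over all cut-off points on the curve."""
    return max(
        (2 * p * r / (p + r) for p, r in zip(precisions, recalls) if p + r > 0),
        default=0.0,
    )


# Toy predictions (hypothetical): two gold facts, four ranked guesses.
toy = [(0.9, True), (0.8, False), (0.7, True), (0.6, False)]
precisions, recalls = pr_curve(toy, total_positives=2)
auc = auc_trapezoid(precisions, recalls)
f1 = max_f1(precisions, recalls)
print(f"AUC={auc:.3f}  max F1={f1:.3f}")
```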
We are now working on deploying OpenNRE as a Python package. Coming soon!
Clone the repository from our GitHub page (don't forget to star us!):
git clone https://github.com/thunlp/OpenNRE.git
If it is too slow, you can try
git clone https://github.com/thunlp/OpenNRE.git --depth 1
Then install all the requirements:
pip install -r requirements.txt
Note: Please choose an appropriate PyTorch version for your machine (it depends on your CUDA version). For details, refer to https://pytorch.org/.
Then install the package with
python setup.py install
If you also want to modify the code, run this:
python setup.py develop
Note that we have excluded all data and pretrain files for fast deployment. You can manually download them by running the scripts in the benchmark and pretrain folders. For example, if you want to download the FewRel dataset, you can run
bash benchmark/download_fewrel.sh
You can go into the benchmark folder and download datasets using our scripts. We also list some information about the datasets in this document. We provide two distantly supervised datasets with human-annotated test sets, NYT10m and Wiki20m. Check the datasets section for details.
Make sure you have installed OpenNRE as instructed above. Then import our package and load pre-trained models.
>>> import opennre
>>> model = opennre.get_model('wiki80_cnn_softmax')
Note that it may take a few minutes to download the checkpoint and data the first time. Then use infer to do sentence-level relation extraction:
>>> model.infer({'text': 'He was the son of Máel Dúin mac Máele Fithrich, and grandson of the high king Áed Uaridnach (died 612).', 'h': {'pos': (18, 46)}, 't': {'pos': (78, 91)}})
('father', 0.5108704566955566)
You will get the relation result and its confidence score.
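The 'pos' values in the input above appear to be character offsets into the text (start inclusive, end exclusive) for the head ('h') and tail ('t') entity mentions. A minimal sketch for building such an input from plain strings is shown below; char_span and build_infer_input are hypothetical helper names, not part of OpenNRE, and the lookup assumes each mention occurs exactly once in the sentence.

```python
def char_span(text, mention):
    """Return (start, end) character offsets of the first occurrence of mention.

    Hypothetical helper: end is exclusive, matching the example input above.
    """
    start = text.find(mention)
    if start < 0:
        raise ValueError(f"{mention!r} not found in text")
    return (start, start + len(mention))


def build_infer_input(text, head, tail):
    """Assemble a dictionary in the shape used by the infer example above."""
    return {
        'text': text,
        'h': {'pos': char_span(text, head)},
        't': {'pos': char_span(text, tail)},
    }


example = build_infer_input('Bill Gates founded Microsoft', 'Bill Gates', 'Microsoft')
print(example)
```

The resulting dictionary could then be passed to model.infer(example). If a mention occurs more than once in the sentence, the first-occurrence lookup may pick the wrong span, so the offsets should be supplied explicitly in that case.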
If you want to use the model on your GPU, just run
>>> model = model.cuda()
before calling the inference function.
For now, we have the following available models:
- wiki80_cnn_softmax: trained on the wiki80 dataset with a CNN encoder.
- wiki80_bert_softmax: trained on the wiki80 dataset with a BERT encoder.
- wiki80_bertentity_softmax: trained on the wiki80 dataset with a BERT encoder (using entity representation concatenation).
- tacred_bert_softmax: trained on the TACRED dataset with a BERT encoder.
- tacred_bertentity_softmax: trained on the TACRED dataset with a BERT encoder (using entity representation concatenation).
You can train your own models on your own data with OpenNRE. In the example folder, we provide example training code for supervised RE models and bag-level RE models. You can use either our provided datasets or your own. For example, you can use the following script to train a PCNN-ATT bag-level model on the NYT10 dataset with a manually annotated test set. The ATT algorithm is a typical method for combining a bag of sentences when extracting relations between entities.
python example/train_bag_cnn.py \
--metric auc \
--dataset nyt10m \
--batch_size 160 \
--lr 0.1 \
--weight_decay 1e-5 \
--max_epoch 100 \
--max_length 128 \
--seed 42 \
--encoder pcnn \
--aggr att
Or use the following script to train a BERT model on the Wiki80 dataset:
python example/train_supervised_bert.py \
--pretrain_path bert-base-uncased \
--dataset wiki80
We provide many options in the example training code and you can check them out for detailed instructions.
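To give an intuition for the ATT aggregation mentioned above, the sketch below combines the sentence representations in a bag into a single bag representation by weighting each sentence according to how well it matches a relation query vector. It is a simplified illustration and not OpenNRE's implementation: the plain dot-product score, the function names, and the toy vectors are all assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_aggregate(sentence_vecs, relation_query):
    """Weight each sentence vector by its match with the relation query.

    Assumption: the match score is a plain dot product.
    Returns the weighted-sum bag vector and the attention weights.
    """
    scores = [
        sum(x * q for x, q in zip(vec, relation_query))
        for vec in sentence_vecs
    ]
    weights = softmax(scores)
    dim = len(relation_query)
    bag_vec = [
        sum(w * vec[d] for w, vec in zip(weights, sentence_vecs))
        for d in range(dim)
    ]
    return bag_vec, weights


# Toy bag (hypothetical): three sentence vectors mentioning the same entity pair.
bag = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
bag_vec, weights = attention_aggregate(bag, query)
print(weights)
print(bag_vec)
```

Sentences that match the query poorly (the second one here) receive a small weight, which is how attention reduces the influence of noisy sentences in a distantly supervised bag.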
If you find OpenNRE useful for your research, please consider citing the following paper:
@inproceedings{han-etal-2019-opennre,
title = "{O}pen{NRE}: An Open and Extensible Toolkit for Neural Relation Extraction",
author = "Han, Xu and Gao, Tianyu and Yao, Yuan and Ye, Deming and Liu, Zhiyuan and Sun, Maosong",
booktitle = "Proceedings of EMNLP-IJCNLP: System Demonstrations",
year = "2019",
url = "https://www.aclweb.org/anthology/D19-3029",
doi = "10.18653/v1/D19-3029",
pages = "169--174"
}
This package is mainly contributed by Tianyu Gao, Xu Han, Shulian Cao, Lumin Tang, Yankai Lin, and Zhiyuan Liu.
The OpenSKL project aims to harness the power of both structured knowledge and natural languages via representation learning. All sub-projects of OpenSKL, under the categories of Algorithm, Resource and Application, are as follows.
- Algorithm:
- OpenKE
- ERNIE
- An effective and efficient toolkit for augmenting pre-trained language models with knowledge graph representations.
- OpenNE
- An effective and efficient toolkit for representing nodes in large-scale graphs as embeddings, with TADW as a key feature for incorporating text attributes of nodes.
- OpenNRE
- Resource:
- The embeddings of large-scale knowledge graphs pre-trained by OpenKE, covering three typical large-scale knowledge graphs: Wikidata, Freebase, and XLORE. The embeddings are free to use under the MIT license. Please click the following links to submit download requests.
- OpenKE-Wikidata
- Wikidata is a free and collaborative database, collecting structured data to provide support for Wikipedia. The original Wikidata contains 20,982,733 entities, 594 relations and 68,904,773 triplets. In particular, Wikidata-5M is the core subgraph of Wikidata, containing 5,040,986 high-frequency entities from Wikidata with their corresponding 927 relations and 24,267,796 triplets.
- TransE version: Knowledge embeddings of Wikidata pre-trained by OpenKE.
- TransR version of Wikidata-5M: Knowledge embeddings of Wikidata-5M pre-trained by OpenKE.
- OpenKE-Freebase
- Freebase was a large collaborative knowledge base consisting of data composed mainly by its community members. It was an online collection of structured data harvested from many sources. Freebase contains 86,054,151 entities, 14,824 relations and 338,586,276 triplets.
- TransE version: Knowledge embeddings of Freebase pre-trained by OpenKE.
- OpenKE-XLORE
- XLORE is one of the most popular Chinese knowledge graphs developed by THUKEG. XLORE contains 10,572,209 entities, 138,581 relations and 35,954,249 triplets.
- TransE version: Knowledge embeddings of XLORE pre-trained by OpenKE.
- Application:
- Knowledge-Plugin
- An effective and efficient toolkit of plug-and-play knowledge injection for pre-trained language models. Knowledge-Plugin is general for all kinds of knowledge graph embeddings mentioned above. In the toolkit, we plug the TransR version of Wikidata-5M into BERT as an example of applications. With the TransR embedding, we enhance the knowledge ability of BERT without fine-tuning the original model, e.g., up to 8% improvement on question answering.