Relation Extraction (RE) and refactoring of interfaces #2333

alanakbik · 2021-07-09T10:56:59Z

This PR adds datasets and a model architecture for relation extraction to Flair. This is the first of many new model types that will be added to extend Flair to new NLP tasks.

In the course of adding this model and preparing the other models, we are refactoring the interfaces and abstractions in Flair to make it easier to add new models and to cut down on code redundancies. This is an ongoing process and this PR is only the first step.

Specifically, this PR adds:

Relation extraction datasets for CoNLL-04, SemEval 2010 Task 8 and TACRED
A baseline RelationExtractor model (still being worked on but functional for now)
A refactoring of the label logic: there are now "complex" labels that carry additional information beyond the label value and confidence. An example is the RelationLabel, which can be added to a Sentence like any other label but also holds pointers to the two Spans involved in the relation
A refactoring of the evaluation logic: all Flair classification models now inherit from flair.nn.Classifier and so automatically share the entire evaluation logic. This greatly reduces redundancy and creates a single evaluation routine shared by all models
the beta parameter is no longer a field of the models. It never really belonged there; instead, it will become a parameter passed to the unified evaluation routine in a future PR
all Flair models now need to implement the label_type property to indicate the type of label they predict
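
To illustrate the design described in the list above, here is a minimal plain-Python sketch. It is not Flair's actual code: the classes Span, Label, RelationLabel, Sentence, Classifier and ToyRelationExtractor below are simplified, hypothetical stand-ins, and their fields and method signatures are assumptions made for this example. The sketch shows a label that carries two span pointers, a base class whose single evaluate routine is shared by every subclass, a label_type property that each model must implement, and beta passed to the evaluation routine instead of being stored on the model.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for a token span such as an entity mention."""
    start: int
    end: int
    text: str


@dataclass(frozen=True)
class Label:
    """A plain label: a value plus a confidence score."""
    value: str
    score: float = 1.0


@dataclass(frozen=True)
class RelationLabel(Label):
    """A "complex" label that also points at the two spans it relates."""
    head: Optional[Span] = None
    tail: Optional[Span] = None


class Sentence:
    """Holds text, candidate spans, and labels grouped by label type."""

    def __init__(self, text: str, spans: List[Span]):
        self.text = text
        self.spans = spans
        self._labels: Dict[str, List[Label]] = {}

    def add_label(self, label_type: str, label: Label) -> None:
        self._labels.setdefault(label_type, []).append(label)

    def get_labels(self, label_type: str) -> List[Label]:
        return list(self._labels.get(label_type, []))


def _key(label: Label):
    # Compare labels by value and span pointers, ignoring the confidence.
    return (label.value, getattr(label, "head", None), getattr(label, "tail", None))


class Classifier(ABC):
    """Base class: subclasses supply label_type and predict, evaluate is shared."""

    @property
    @abstractmethod
    def label_type(self) -> str:
        ...

    @abstractmethod
    def predict(self, sentence: Sentence) -> List[Label]:
        ...

    def evaluate(self, sentences: List[Sentence], beta: float = 1.0) -> Dict[str, float]:
        # beta is an argument of the evaluation routine, not a model field.
        tp = fp = fn = 0
        for sentence in sentences:
            gold = {_key(l) for l in sentence.get_labels(self.label_type)}
            pred = {_key(l) for l in self.predict(sentence)}
            tp += len(gold & pred)
            fp += len(pred - gold)
            fn += len(gold - pred)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = beta ** 2 * precision + recall
        f_score = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
        return {"precision": precision, "recall": recall, "f_score": f_score}


class ToyRelationExtractor(Classifier):
    """Rule-based toy model: links first and last span if the text says 'works'."""

    @property
    def label_type(self) -> str:
        return "relation"

    def predict(self, sentence: Sentence) -> List[Label]:
        if "works" in sentence.text and len(sentence.spans) >= 2:
            return [RelationLabel("works_for", 0.9, sentence.spans[0], sentence.spans[-1])]
        return []


alice, acme = Span(0, 1, "Alice"), Span(3, 4, "Acme")
bob, berlin = Span(0, 1, "Bob"), Span(3, 4, "Berlin")

s1 = Sentence("Alice works for Acme", [alice, acme])
s1.add_label("relation", RelationLabel("works_for", 1.0, alice, acme))
s2 = Sentence("Bob lives in Berlin", [bob, berlin])
s2.add_label("relation", RelationLabel("lives_in", 1.0, bob, berlin))

sentences = [s1, s2]
model = ToyRelationExtractor()
print(model.evaluate(sentences))            # F1 by default
print(model.evaluate(sentences, beta=0.5))  # precision-weighted F-score
```

Because evaluate lives only in the base class, a new model type in this sketch needs nothing beyond label_type and predict to be scored the same way as every other model.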

This PR includes many contributions by @melvelet and @ChristophAlt - thanks a lot!!!


melvelet and others added 30 commits June 16, 2021 10:43

add conll04 dataset

9b4e5b8

change connl04 to conll_04

85e38e9

add commented line to fix columns order (currently breaks dataset import)

4cd769f

add extra blank lines to source file, fix dataset import

80e2330

add conll_04 to documentation

daf5f50

make sure that blank lines are only added once

a5427d3

create Relation list in Sentence (unfinished)

3ca5dd9

fix/improve tests

2fc7e26

remove print, improve str conversion

b29b528

formatting

2f0391a

add conll04 dataset

70ab085

change connl04 to conll_04

bef5574

add commented line to fix columns order (currently breaks dataset import)

d0f25e2

add extra blank lines to source file, fix dataset import

025b3cb

make sure that blank lines are only added once

b04537a

create Relation list in Sentence (unfinished)

4d1624d

remove print, improve str conversion

4c392fd

add relation_extraction_model, adjust forward method

affc3e0

fix and simplify forward function

89e3ab7

change _calculate_cost function to relation extraction

931a631

change _obtain_labels, evaluate & predict

1e09abc

rm test and print lines

01a2101

build relations in corpus object

87f82b5

remove temporary tags, refactor function

d8dd893

make _get_relations_from_tags compatible with non-RE dataset

b515e55

deactivate forward test

0de62fd

Integrate SemEval2010_RE dataset

2a6a5ba

initial commit

d49ba83

fix capitalization

22f0499

fix capitalization

823ae94

alanakbik added 27 commits June 30, 2021 12:57

more evaluation fixes

aae7de5

add dropout

bdb241e

Refactor evaluation interface

7d18f57

Implement augmentation

d4f4fd7

Make dropout parameterizable

6020f12

Make dropout parameterizable

ee2e2bb

Correct evaluation report

c1f2025

Record sentence ID

b9cae93

Handle no frame in UP_ENGLISH

500c6bc

Correct handling of macro-scores if class not in test

d28dee0

Prepare evaluation refactoring

56d67fe

Refactor abstractions

7641c70

further modification of interfaces

97f947e

label names

0567da5

Remove old evaluate methods

d156ead

Remove old evaluate methods

e9c2e7c

Fix unit tests

274dc8e

Fix unit tests

087a6e6

Remove unused

97310ea

Make evaluation robust to errors in corpus

dd6c200

Adapt simple tagger to new interface

fd8c077

Add file outputs to evaluation

b1d9042

Rename to RelationExtractor

11769f4

Rename to RelationExtractor

28460ec

Rename to RelationExtractor

3c17a68

Rename to RelationExtractor

1517eba

Rename to RelationExtractor

ff6e1ef

alanakbik merged commit 8a2946e into master Jul 9, 2021

alanakbik deleted the relation_classification_script branch July 22, 2021 09:13

alanakbik mentioned this pull request Jul 25, 2021

Major refactoring of Model classes (Step 2) #2351

Merged
