Major refactoring of internal label logic #2645
Merged
This PR refactors Flair's internal label logic.
In detail:
- `SpanLabel`, `RelationLabel` etc. are removed in favor of a single `Label` class for all types of label.
- `Label` now has a pointer to the data point to which it belongs. This means that labels cannot be instantiated without a `DataPoint` object.
- `Token`, `Span` and `Relation` data points now inherit from `_PartOfSentence`, a new special `DataPoint` subtype. They now require a pointer to the `Sentence` object from which they stem. With the new logic, all labels added to a `_PartOfSentence` also get registered to the `Sentence`. So, unlike previously, you can now just get a span from the sentence and add a label to it directly. It will get registered on the sentence as well.
- The `forward_pass` method of `DefaultClassifier` now returns 3 instead of 4 arguments (sentences are no longer needed). It also does away with the unintuitive `spawn` logic we no longer need.
- `start_position` and `end_position` of data points, their `text`, their `tag` and `score` (if they have only one tag) and an `unlabeled_identifier`
- `get_tag` and `add_tag` have been removed from `Token` in favor of the `get_label` and `add_label` methods of the parent `DataPoint` class.
- The `get_spans` method of `Sentence` is back, and a similar `get_relations` method has been added.
- `Tokenizer` classes no longer return lists of `Token`, but rather lists of strings that the `Sentence` object converts to tokens, centralizing the offset and `whitespace_after` detection in one place.
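
To make the data model described in the list above easier to picture, here is a minimal, self-contained toy sketch. It is not Flair's actual implementation: the class bodies, the `annotation_layers` dictionary, the `get_labels` helper, the slicing shortcut for building a span, and the whitespace tokenizer are all assumptions made for illustration. Only the names `Label`, `DataPoint`, `Token`, `Span`, `Sentence`, `add_label`, `get_spans`, `start_position`, `end_position` and `whitespace_after` come from the description, and `PartOfSentence` stands in for `_PartOfSentence`.

```python
from __future__ import annotations


class Label:
    """A single label class; it always points back to its data point."""

    def __init__(self, data_point: DataPoint, value: str, score: float = 1.0):
        self.data_point = data_point
        self.value = value
        self.score = score


class DataPoint:
    def __init__(self) -> None:
        # Hypothetical storage: label type -> list of labels.
        self.annotation_layers: dict[str, list[Label]] = {}

    def add_label(self, typename: str, value: str, score: float = 1.0) -> Label:
        label = Label(self, value, score)
        self.annotation_layers.setdefault(typename, []).append(label)
        return label

    def get_labels(self, typename: str) -> list[Label]:
        return self.annotation_layers.get(typename, [])


class PartOfSentence(DataPoint):
    """Stand-in for `_PartOfSentence`: needs a pointer to its sentence."""

    def __init__(self, sentence: Sentence) -> None:
        super().__init__()
        self.sentence = sentence

    def add_label(self, typename: str, value: str, score: float = 1.0) -> Label:
        label = super().add_label(typename, value, score)
        # Register the same label object on the parent sentence as well.
        self.sentence.annotation_layers.setdefault(typename, []).append(label)
        return label


class Token(PartOfSentence):
    def __init__(self, sentence: Sentence, text: str, start: int, whitespace_after: bool):
        super().__init__(sentence)
        self.text = text
        self.start_position = start
        self.end_position = start + len(text)
        self.whitespace_after = whitespace_after


class Span(PartOfSentence):
    def __init__(self, tokens: list[Token]) -> None:
        super().__init__(tokens[0].sentence)
        self.tokens = tokens
        self.start_position = tokens[0].start_position
        self.end_position = tokens[-1].end_position

    @property
    def text(self) -> str:
        return self.sentence.text[self.start_position:self.end_position]


class Sentence(DataPoint):
    def __init__(self, text: str, tokenizer=str.split) -> None:
        super().__init__()
        self.text = text
        self.tokens: list[Token] = []
        cursor = 0
        # The tokenizer only returns strings; offsets and whitespace_after
        # are derived here, in one place.
        for word in tokenizer(text):
            start = text.index(word, cursor)
            cursor = start + len(word)
            whitespace_after = cursor < len(text) and text[cursor].isspace()
            self.tokens.append(Token(self, word, start, whitespace_after))

    def __getitem__(self, subscript):
        if isinstance(subscript, slice):
            return Span(self.tokens[subscript])
        return self.tokens[subscript]

    def get_spans(self, typename: str) -> list[Span]:
        return [
            label.data_point
            for label in self.get_labels(typename)
            if isinstance(label.data_point, Span)
        ]


sentence = Sentence("George Washington went to Washington")
person = sentence[0:2]
person_label = person.add_label("ner", "PER")
sentence[4:5].add_label("ner", "LOC")

for span in sentence.get_spans("ner"):
    print(span.text, span.start_position, span.end_position)
```

In this sketch the label added to the span is the same object that the sentence sees, so `person_label.data_point` leads back to the span and `sentence.get_spans("ner")` finds it without any separate registration step.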