Seqeval TorchMetrics

Metric description

This is an implementation of seqeval in TorchMetrics.

seqeval is a Python framework for sequence labeling evaluation. seqeval can evaluate the performance of chunking tasks such as named-entity recognition, part-of-speech tagging, semantic role labeling and so on.

How to use

Seqeval computes labeling scores, along with their supporting counts, by comparing predicted tag sequences against reference tag sequences. Both inputs are lists of sentences, where each sentence is a list of tags.

It takes one mandatory argument:

labels: a list of entity types (tag names without the IOB prefix), for example ["LOC", "PER", "ORG"].
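If the entity types are only known from the data, the labels list can be collected from the reference sequences. The helper below is a hypothetical sketch, not part of this package, and assumes prefix-style tags such as "B-PER" (suffix=False).

```python
# Hypothetical helper (not part of this package): collect entity types
# from prefix-style IOB tags such as "B-PER" or "I-MISC" (suffix=False).
def entity_types(tag_sequences):
    types = set()
    for sequence in tag_sequences:
        for tag in sequence:
            if tag != "O":
                # "B-PER" -> "PER"; split once so types containing "-" survive
                types.add(tag.split("-", 1)[1])
    return sorted(types)


references = [["O", "O", "O", "B-MISC", "I-MISC", "I-MISC", "O"], ["B-PER", "I-PER", "O"]]
print(entity_types(references))  # ['MISC', 'PER']
```

The result can then be passed as Seqeval(labels=entity_types(references)).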

It can also take several optional arguments:

suffix (boolean): True if the IOB tag is a suffix (after type) instead of a prefix (before type), False otherwise. The default value is False, i.e. the IOB tag is a prefix (before type).
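A minimal sketch of what the suffix flag changes: it decides which end of the tag holds the boundary letter. The function split_tag is hypothetical and only illustrates the two layouts; it is not the parser used by seqeval.

```python
# Hypothetical illustration of prefix vs suffix tag layouts.
def split_tag(tag, suffix=False):
    """Split an IOB tag into (boundary, entity_type)."""
    if tag == "O":
        return "O", None
    if suffix:
        # Suffix layout: "PER-B" -> ("B", "PER")
        entity_type, boundary = tag.rsplit("-", 1)
    else:
        # Prefix layout (default): "B-PER" -> ("B", "PER")
        boundary, entity_type = tag.split("-", 1)
    return boundary, entity_type


print(split_tag("B-PER"))               # ('B', 'PER')
print(split_tag("PER-B", suffix=True))  # ('B', 'PER')
```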

scheme: the target tagging scheme, which can be one of [IOB1, IOB2, IOE1, IOE2, IOBES, BILOU]. The default value is None.

mode: whether entities with the correct type but incorrect boundary tags (for example, I- where B- is expected) are counted as true positives. To count only exact matches, pass mode="strict" together with a specific scheme value. The default value is None.

stage: a prefix for the keys in the output dict, for example "test". The default value is None.
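The difference between the default and strict modes is easiest to see on a malformed IOB2 sequence that opens an entity with I- instead of B-. The chunks function below is a simplified, hypothetical sketch of the two behaviours (conlleval-style chunking versus strict IOB2 validation); seqeval's own implementation covers more schemes and edge cases, so treat it as an illustration only.

```python
# Hypothetical sketch: lenient (conlleval-style) vs strict IOB2 chunking.
def _split(tag):
    # "B-PER" -> ("B", "PER"); "O" -> ("O", None)
    return ("O", None) if tag == "O" else tuple(tag.split("-", 1))


def chunks(tags, strict=False):
    """Return (type, start, end) chunks with an inclusive end index."""
    found, start, current = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" closes the last chunk
        boundary, etype = _split(tag)
        continues = boundary == "I" and etype == current
        if current is not None and not continues:
            found.append((current, start, i - 1))
            current = None
        # Lenient: B- or a non-continuing I- opens a chunk.
        # Strict IOB2: only B- may open a chunk; a stray I- is ignored.
        opens = boundary == "B" or (boundary == "I" and not continues and not strict)
        if opens:
            start, current = i, etype
    return found


tags = ["I-PER", "I-PER", "O", "B-LOC", "I-LOC"]
print(chunks(tags))               # [('PER', 0, 1), ('LOC', 3, 4)]
print(chunks(tags, strict=True))  # [('LOC', 3, 4)]
```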

>>> from metric.seqeval_metric import Seqeval
>>> metric = Seqeval(labels=["MISC", "PER"])
>>> predictions = [['O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'I-MISC', 'O'], ['B-PER', 'I-PER', 'O']]
>>> references = [['O', 'O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'O'], ['B-PER', 'I-PER', 'O']]
>>> results = metric(predictions, references)
>>> results
    {'MISC_precision': tensor(0.), 'PER_precision': tensor(1.), 'MISC_recall': tensor(0.), 
    'PER_recall': tensor(1.), 'MISC_f1': tensor(0.), 'PER_f1': tensor(1.), 
    'MISC_number': tensor(1), 'PER_number': tensor(1), 'overall_precision': tensor(0.5000), 
    'overall_recall': tensor(0.5000), 'overall_f1': tensor(0.5000)}
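The zero MISC scores come from entity-level matching: a predicted entity counts as correct only if its type, start, and end all match a reference entity. The sketch below extracts (sentence, type, start, end) entities from the example using simplified, conlleval-style chunking for prefix IOB tags; the helper names spans and entity_set are hypothetical.

```python
# Hypothetical sketch: extract entities from the example and compare them.
def spans(tags):
    """Return (type, start, end) chunks, end inclusive, from prefix IOB tags.

    Simplified rule: a chunk opens on B-X, or on an I-X that does not continue
    a chunk of type X; any other tag closes the open chunk.
    """
    found, start, current = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" closes the last chunk
        boundary, etype = ("O", None) if tag == "O" else tag.split("-", 1)
        continues = boundary == "I" and etype == current
        if current is not None and not continues:
            found.append((current, start, i - 1))
            current = None
        if boundary in ("B", "I") and not continues:
            start, current = i, etype
    return found


def entity_set(sequences):
    # The sentence index keeps identical spans in different sentences distinct.
    return {(s, *span) for s, tags in enumerate(sequences) for span in spans(tags)}


predictions = [['O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'I-MISC', 'O'], ['B-PER', 'I-PER', 'O']]
references = [['O', 'O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'O'], ['B-PER', 'I-PER', 'O']]
pred_entities = entity_set(predictions)
true_entities = entity_set(references)
print(sorted(pred_entities))          # [(0, 'MISC', 2, 5), (1, 'PER', 0, 1)]
print(sorted(true_entities))          # [(0, 'MISC', 3, 5), (1, 'PER', 0, 1)]
print(pred_entities & true_entities)  # {(1, 'PER', 0, 1)}
```

The predicted MISC entity starts one token too early (index 2 instead of 3), so it matches nothing, while the PER entity matches exactly.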

Output values

This metric returns a dictionary with overall and per-type scores:

Overall:

precision: the micro-averaged precision, on a scale between 0.0 and 1.0.

recall: the micro-averaged recall, on a scale between 0.0 and 1.0.

f1: the micro-averaged F1 score, which is the harmonic mean of the precision and recall. It also has a scale of 0.0 to 1.0.
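As a worked sketch of the micro-averaged scores, assuming entities are represented as hashable (sentence, type, start, end) tuples (a representation chosen here for illustration), the overall values in the usage example follow from one true positive out of two predicted and two reference entities. Zero denominators return 0, in line with the zero-division behaviour noted under Limitations and bias.

```python
# Hypothetical sketch of micro-averaged precision, recall and F1 over entity sets.
def micro_scores(pred_entities, true_entities):
    tp = len(pred_entities & true_entities)  # exact type + boundary matches
    precision = tp / len(pred_entities) if pred_entities else 0.0
    recall = tp / len(true_entities) if true_entities else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Entities from the usage example: (sentence, type, start, end)
pred_entities = {(0, "MISC", 2, 5), (1, "PER", 0, 1)}
true_entities = {(0, "MISC", 3, 5), (1, "PER", 0, 1)}
print(micro_scores(pred_entities, true_entities))  # (0.5, 0.5, 0.5)
print(micro_scores(set(), true_entities))          # (0.0, 0.0, 0.0)
```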

Per type (e.g. MISC, PER, LOC,...):

precision: precision, on a scale between 0.0 and 1.0.

recall: recall, on a scale between 0.0 and 1.0.

f1: F1 score, on a scale between 0.0 and 1.0.

number: the number of entities of that type in the references (the support).
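The per-type values are the same computation restricted to one entity type. The sketch below reuses the illustrative (sentence, type, start, end) tuples and the key naming visible in the example output, but returns plain floats rather than tensors; it is a hypothetical illustration, not this package's implementation.

```python
# Hypothetical sketch of per-type scores, keyed like the example output.
def per_type_scores(pred_entities, true_entities):
    report = {}
    for etype in sorted({e[1] for e in pred_entities | true_entities}):
        pred_t = {e for e in pred_entities if e[1] == etype}
        true_t = {e for e in true_entities if e[1] == etype}
        tp = len(pred_t & true_t)
        precision = tp / len(pred_t) if pred_t else 0.0
        recall = tp / len(true_t) if true_t else 0.0
        denom = precision + recall
        report[f"{etype}_precision"] = precision
        report[f"{etype}_recall"] = recall
        report[f"{etype}_f1"] = 2 * precision * recall / denom if denom else 0.0
        report[f"{etype}_number"] = len(true_t)  # reference entities of this type
    return report


pred_entities = {(0, "MISC", 2, 5), (1, "PER", 0, 1)}
true_entities = {(0, "MISC", 3, 5), (1, "PER", 0, 1)}
report = per_type_scores(pred_entities, true_entities)
print(report["MISC_f1"], report["PER_f1"])          # 0.0 1.0
print(report["MISC_number"], report["PER_number"])  # 1 1
```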

Limitations and bias

seqeval supports the following IOB formats (short for inside, outside, beginning): IOB1, IOB2, IOE1, IOE2, IOBES (only in strict mode), and BILOU (only in strict mode).
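To make the schemes concrete, the hypothetical encode function below writes the same entities, "Paris" and "New York City" in "She moved from Paris to New York City", in IOB2, IOE2, IOBES, and BILOU. IOB1 and IOE1 are left out because they only use B- (or E-) when two entities of the same type are adjacent, which this simple per-span rule does not model.

```python
# Hypothetical sketch: encode (type, start, end) spans in several tagging schemes.
# Boundary letters per scheme: (single-token, begin, inside, end)
SCHEMES = {
    "IOB2": ("B", "B", "I", "I"),
    "IOE2": ("E", "I", "I", "E"),
    "IOBES": ("S", "B", "I", "E"),
    "BILOU": ("U", "B", "I", "L"),
}


def encode(length, spans, scheme):
    single, begin, inside, last = SCHEMES[scheme]
    tags = ["O"] * length
    for etype, start, end in spans:  # end index is inclusive
        for i in range(start, end + 1):
            if start == end:
                letter = single
            elif i == start:
                letter = begin
            elif i == end:
                letter = last
            else:
                letter = inside
            tags[i] = f"{letter}-{etype}"
    return tags


tokens = "She moved from Paris to New York City".split()
entities = [("LOC", 3, 3), ("LOC", 5, 7)]
for scheme in SCHEMES:
    print(scheme, encode(len(tokens), entities, scheme))
```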

For more information about IOB formats, refer to the Wikipedia page and the description of the CoNLL-2000 shared task.

When a score would require division by zero (for example, precision for an entity type that is never predicted), its value is set to 0.
