Added class-based metrics #164
Conversation
Great idea!
Force-pushed from db0acd4 to 66f7cde.
Thanks for the feedback!
A few more suggestions.
flair/training_utils.py (outdated)
    def tp(self):
        self._tp += 1
    def tp(self, cls=None):
I would like to rename cls to something more self-explaining, maybe class_name or metric_name. What do you think?
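For illustration, a minimal sketch of what the renamed counter method could look like (the per-class dictionary and its exact name are assumptions, not code from this PR):

    from collections import defaultdict

    class MetricSketch:
        # Hypothetical counter layout, only to illustrate the rename.
        def __init__(self, name):
            self.name = name
            self._tps = defaultdict(int)  # true positives, keyed by class

        def tp(self, class_name=None):
            # class_name (instead of cls) makes it obvious the argument is a label
            # such as 'PER' or 'LOC'; None counts towards the overall metric.
            self._tps[class_name] += 1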
flair/training_utils.py (outdated)
    def print(self):
        log.info(self)

    @staticmethod
    def tsv_header(prefix=None):
        if prefix:
            return '{0}_TP\t{0}_TN\t{0}_FP\t{0}_FN\t{0}_PRECISION\t{0}_RECALL\t{0}_F-SCORE\t{0}_ACCURACY'.format(prefix)
        return 'CLS\t{0}_TP\t{0}_TN\t{0}_FP\t{0}_FN\t{0}_PRECISION\t{0}_RECALL\t{0}_F-SCORE\t{0}_ACCURACY'.format(
The first column of the header CLS\t should be removed.
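A sketch of the header builder without that extra leading column (same eight columns as in the snippet above; the un-prefixed branch is assumed to simply drop the prefix, which may differ from the final PR code):

    def tsv_header(prefix=None):
        # Would live on the Metric class as a @staticmethod; shown standalone here.
        # Emits the same eight columns in both cases, with no leading CLS column.
        columns = ['TP', 'TN', 'FP', 'FN', 'PRECISION', 'RECALL', 'F-SCORE', 'ACCURACY']
        if prefix:
            return '\t'.join('{}_{}'.format(prefix, c) for c in columns)
        return '\t'.join(columns)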
flair/training_utils.py (outdated)
    all_classes = self.get_classes()
    all_lines = [
        '{0:<10}\ttp: {1} - fp: {2} - fn: {3} - tn: {4} - precision: {5:.4f} - recall: {6:.4f} - accuracy: {7:.4f} - f1-score: {8:.4f}'.format(
            MICRO_AVG_METRIC if cls == None else cls,
I would take self.name if cls is None, as the metric can be used in many different ways and it does not need to be the micro average of some classes.
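A tiny sketch of that suggestion (standalone function for illustration; in the PR this would happen inside Metric, where self.name is available):

    def row_label(metric_name, cls=None):
        # Reviewer's suggestion: when no class is given, label the row with the
        # metric's own name rather than a fixed MICRO_AVG_METRIC constant.
        return metric_name if cls is None else cls

    print(row_label('TEST'))         # TEST
    print(row_label('TEST', 'PER'))  # PER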
            self._fns.keys()])))

    all_classes.sort(key=lambda x: (x is not None, x))
    return all_classes
I would love to see methods to calculate the micro and macro average for all classes. Do you think you can add those? Otherwise I'll do it later on.
They're added as methods to Metric
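For reference, micro and macro averaging over per-class counts can be computed along these lines (a sketch with assumed counter dictionaries _tps, _fps and _fns; the methods actually added to Metric in this PR may be organised differently):

    class MetricSketch:
        # Hypothetical per-class counters; not the code from the PR.
        def __init__(self):
            self._tps, self._fps, self._fns = {}, {}, {}

        def f_score(self, cls):
            tp = self._tps.get(cls, 0)
            fp = self._fps.get(cls, 0)
            fn = self._fns.get(cls, 0)
            precision = tp / (tp + fp) if tp + fp > 0 else 0.0
            recall = tp / (tp + fn) if tp + fn > 0 else 0.0
            return 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0

        def macro_avg_f_score(self):
            # Macro average: unweighted mean of the per-class F-scores.
            classes = set(self._tps) | set(self._fps) | set(self._fns)
            classes.discard(None)
            return sum(self.f_score(c) for c in classes) / len(classes) if classes else 0.0

        def micro_avg_f_score(self):
            # Micro average: pool all counts first, then compute one F-score.
            tp, fp, fn = (sum(d.values()) for d in (self._tps, self._fps, self._fns))
            precision = tp / (tp + fp) if tp + fp > 0 else 0.0
            recall = tp / (tp + fn) if tp + fn > 0 else 0.0
            return 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0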
@@ -233,8 +229,7 @@ def evaluate(self, sentences: List[Sentence], eval_class_metrics: bool = False,
    batches = [sentences[x:x + mini_batch_size] for x in
               range(0, len(sentences), mini_batch_size)]

    y_pred = []
    y_true = []
    metric = Metric('')
A metric should get a meaningful name. I would suggest either MICRO_AVG or the data set type (e.g. TEST, DEV or TRAIN).
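In other words, the call site shown above would become something along these lines (hypothetical choice of label; any of the suggested names works):

    from flair.training_utils import Metric

    # Name the metric after the data set split being evaluated
    # instead of passing an empty string.
    metric = Metric('TEST')  # or 'DEV', 'TRAIN', or a micro-average label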
        metric.fn(tag)
    else:
        metric.tn()
        metric.tn(tag)
The metrics per tag are collected but not printed at all. Do you want to add a print of the test metric after training is done?
True, I added appropriate logging
Thanks for the feedback! Took me a while to address the issues, but I tackled them all.
Looks good! Just one minor change is required before we can merge.
        dataset_name, metric.f_score(), metric.accuracy(), metric.get_tp(), metric.get_fp(),
        metric.get_fn(), metric.get_tn()))
    for cls in metric.get_classes():
        log.info("{0:<4}: f-score {1:.4f} - acc {2:.4f} - tp {3} - fp {4} - fn {5} - tn {6}".format(
As metric.get_classes() returns a list of all classes containing None, this currently fails when training a sequence tagger with

TypeError: unsupported format string passed to NoneType.__format__

I would remove the None class from metric.get_classes(). That way you also don't log the results twice. Alternatively, you can use dataset_name instead of cls as the name in case cls is None.

Fixing this should also fix the failing test.
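A sketch of the first option, assuming get_classes() gathers the keys of the per-class counter dictionaries (as self._fns.keys() in the earlier snippet suggests; the other dictionary names are assumptions):

    def get_classes(self):
        # Collect every class seen in any counter, but drop the None entry used
        # for the overall metric so callers only iterate over real classes.
        all_classes = set(self._tps) | set(self._fps) | set(self._fns) | set(self._tns)
        all_classes.discard(None)
        return sorted(all_classes)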
I removed the None class from get_classes(), as it seems more consistent! Thanks!
Force-pushed from 5112cf8 to f745aeb.
Looks great! As we just updated the master branch and improved also our tests, would you mind updating your branch with the current master changes? Just to be sure that all the tests are passing. Thanks!
Force-pushed from f745aeb to 924df22.
Did a full rebase on the master branch, everything worked fine!
Great! Thanks a lot for improving our metric class!
@fsonntag thanks for your help!
You're welcome, thanks for flair :)
With the changes in GH-75, the use of the CoNLL evaluation script was removed. Nevertheless I found it very useful for evaluating class-based predictions, and GH-75 didn't replace this functionality. So I added this functionality to the Metric object.

Please feel free to suggest any changes and improvements.