
BUG: eval crashes for learncurve when labelmap has multi-char label, because eval re-maps the labelmap #664

Closed · NickleDave opened this issue Jun 5, 2023 · 0 comments · Fixed by #665

@NickleDave (Collaborator):

In eval we currently re-map the labels so that each label is a single character, so that we do not artificially inflate or deflate the edit distance; see #373.

https://github.com/vocalpy/vak/blob/a6c43bc1587e248e5002123270be29dd5f6a4c7a/src/vak/core/eval.py#LL148C1-L151C62

```python
labelmap_keys = [lbl for lbl in labelmap.keys() if lbl != 'unlabeled']
if any([len(label) > 1 for label in labelmap_keys]):  # only re-map if necessary
    # (to minimize chance of knock-on bugs)
    labelmap = multi_char_labels_to_single_char(labelmap)
```
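As a rough sketch of what that re-mapping does (this is an illustrative stand-in, not vak's actual `multi_char_labels_to_single_char`):

```python
import string


def multi_char_labels_to_single_char_sketch(labelmap):
    """Illustrative sketch: map each multi-character key of ``labelmap``
    to a single character, keeping its integer value.

    Assumes 'unlabeled' is a special key that should be left alone.
    """
    single_chars = iter(string.ascii_letters)
    new_map = {}
    for key, val in labelmap.items():
        if key == 'unlabeled' or len(key) == 1:
            new_map[key] = val  # already safe for per-character edit distance
        else:
            new_map[next(single_chars)] = val  # e.g. 'syl1' -> 'a'
    return new_map


labelmap = {'unlabeled': 0, 'syl1': 1, 'syl2': 2}
remapped = multi_char_labels_to_single_char_sketch(labelmap)
print(remapped)  # {'unlabeled': 0, 'a': 1, 'b': 2}
```

The point of the re-mapping is that an edit distance computed on strings would otherwise count a substitution of `'syl1'` for `'syl2'` as multiple character edits.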

But this now causes a crash when we try to load the validation set a few lines later:

```python
val_dataset = VocalDataset.from_csv(
```

It happens because we try to make a "temporary" batch so that we can dynamically determine the dataset's `shape` attribute from the source returned in the batch:

```python
tmp_item = self.__getitem__(tmp_x_ind)
```

(Do we still need to do this?)
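The "temporary batch" pattern in question could be sketched like this (class name and item keys here are illustrative, not vak's actual `VocalDataset`):

```python
import numpy as np


class SketchDataset:
    """Illustrative sketch of a dataset that infers its ``shape``
    attribute by fetching one item in ``__init__``."""

    def __init__(self, spects):
        self.spects = spects
        # grab one item up front solely to record the dataset's shape
        tmp_x_ind = 0
        tmp_item = self.__getitem__(tmp_x_ind)
        self.shape = tmp_item['source'].shape

    def __getitem__(self, ind):
        return {'source': self.spects[ind]}


ds = SketchDataset([np.zeros((257, 100))])
print(ds.shape)  # (257, 100)
```

The downside, as this bug shows, is that any error in `__getitem__` (here, a labelmap lookup) surfaces at dataset construction time.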

```pytb
  File "/home/ildefonso/Documents/repos/vocalpy/vak-vocalpy/src/vak/datasets/vocal_dataset.py", line 80, in __init__
    tmp_item = self.__getitem__(tmp_x_ind)
  File "/home/ildefonso/Documents/repos/vocalpy/vak-vocalpy/src/vak/datasets/vocal_dataset.py", line 94, in __getitem__
    lbls_int = [self.labelmap[lbl] for lbl in annot.seq.labels]
  File "/home/ildefonso/Documents/repos/vocalpy/vak-vocalpy/src/vak/datasets/vocal_dataset.py", line 94, in <listcomp>
    lbls_int = [self.labelmap[lbl] for lbl in annot.seq.labels]
```
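A minimal repro of the crash, with a hypothetical labelmap (not vak code): once the keys are re-mapped to single characters, the original multi-character annotation labels are no longer valid keys.

```python
# What the dataset was built with vs. what eval re-mapped it to:
labelmap = {'unlabeled': 0, 'syl1': 1, 'syl2': 2}
remapped = {'unlabeled': 0, 'a': 1, 'b': 2}

# Annotations still use the original multi-character labels.
annot_labels = ['syl1', 'syl2', 'syl1']

# Original labelmap: works fine.
lbls_int = [labelmap[lbl] for lbl in annot_labels]
print(lbls_int)  # [1, 2, 1]

# Re-mapped labelmap: KeyError, the crash seen in the traceback above.
try:
    [remapped[lbl] for lbl in annot_labels]
except KeyError as e:
    print(f"KeyError: {e}")  # KeyError: 'syl1'
```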

I'm not sure why we didn't hit this bug before, e.g. with canary song labeled with integers.

We don't actually want to change the mapping for this purpose. We only want to do it inside validation_step, when we convert labeled timebins to labels (not when we convert labels to a vector of labeled timebins, as we do when loading samples for this dataset class). That is what the to_labels transform, currently an attribute of the class, is for.

So I think the right fix for this is as follows, adhering to the principle of least surprise:

  • don't remap inside core.eval, to avoid breaking VocalDataset
  • instead validate the labelmap inside of WindowedFrameClassificationModel.
    • should we do this in the class method or in the init? My first thought was the class method, to avoid doing validation in a verbose init, but that would make it possible to instantiate a WindowedFrameClassificationModel "by hand" that calculates a different segment error rate, which we don't want. So validate in the init
    • Add a logger and log that we are doing this re-mapping of labelmap
  • we want to make it explicit that we have a separate labelmap for eval, so we'll do what we currently do inside core.eval, but store the result as a separate attribute (labelmap_eval), and then use that labelmap with the to_labels transform, which we'll rename to to_labelmap_eval to be extra explicit
  • also document what's been done inside the WindowedFrameClassificationModel docstrings
NickleDave changed the title from "BUG: eval crashes for learncurve when labelmap has multi-char because it changes labelmap" to "BUG: eval crashes for learncurve when labelmap has multi-char label, because eval re-maps the labelmap" on Jun 5, 2023.

NickleDave added a commit that referenced this issue on Jun 5, 2023: …Model only, fix #664 (#665)

* Remove re-mapping of labelmap inside core.eval

* Fix WindowedFrameClassificationModel to re-map labelmap for eval

* Add pytest marker 'slow' to ini_options in pyproject.toml

* Mark tests in test_train as slow

* Mark tests in test_core/test_learncurve as slow

* Add logic in tests/conftest.py to sort tests so 'slow' marks run last, and cli option to turn this on

* Fix TEST_INIT_ARGVALS in test_models/test_tweetynet so we don't create spurious error with integer keys in a labelmap