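In eval we currently re-map the labels so that they all have single characters, so that we do not artificially inflate or deflate the edit distance; see #373 and https://github.com/vocalpy/vak/blob/a6c43bc1587e248e5002123270be29dd5f6a4c7a/src/vak/core/eval.py#LL148C1-L151C62

But this now causes a crash when we try to load the validation set a few lines later (`vak/src/vak/core/eval.py`, line 163 in a6c43bc). It happens because we make a "temporary" batch so that we can dynamically determine the dataset's `shape` attribute from the `source` returned in the batch (`vak/src/vak/datasets/vocal_dataset.py`, line 80 in a6c43bc). (Do we still need to do this?)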
File "/home/ildefonso/Documents/repos/vocalpy/vak-vocalpy/src/vak/datasets/vocal_dataset.py", line 80, in __init__
tmp_item = self.__getitem__(tmp_x_ind)
File "/home/ildefonso/Documents/repos/vocalpy/vak-vocalpy/src/vak/datasets/vocal_dataset.py", line 94, in __getitem__
lbls_int = [self.labelmap[lbl] for lbl in annot.seq.labels]
File "/home/ildefonso/Documents/repos/vocalpy/vak-vocalpy/src/vak/datasets/vocal_dataset.py", line 94, in <listcomp>
lbls_int = [self.labelmap[lbl] for lbl in annot.seq.labels]
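A minimal sketch reproducing this failure mode, assuming a toy dataset: `ToyVocalDataset` is hypothetical and greatly simplified, not vak's `VocalDataset`. Like the code in the traceback above, it calls `__getitem__` inside `__init__` to determine its shape, and `__getitem__` looks up each annotation label in `labelmap`. Once the multi-character keys of the labelmap have been swapped for single characters, the original labels are no longer keys.

```python
class ToyVocalDataset:
    """Hypothetical, heavily simplified stand-in for a dataset that, like
    VocalDataset, determines its shape by loading a "temporary" item in __init__."""

    def __init__(self, annotations, labelmap):
        self.annotations = annotations  # one list of string labels per file
        self.labelmap = labelmap  # maps string label -> integer class
        tmp_item = self.__getitem__(0)  # the crash surfaces here, as in the traceback
        self.shape = (len(tmp_item["lbls_int"]),)  # simplified "shape"

    def __getitem__(self, ind):
        labels = self.annotations[ind]
        # raises KeyError for any label that is not a key in self.labelmap
        lbls_int = [self.labelmap[lbl] for lbl in labels]
        return {"lbls_int": lbls_int}


annotations = [["iab", "c", "iab"]]
labelmap = {"iab": 1, "c": 2}
print(ToyVocalDataset(annotations, labelmap).shape)  # (3,)

# a labelmap whose multi-character key "iab" was swapped for "a" for eval
labelmap_remapped = {"a": 1, "c": 2}
try:
    ToyVocalDataset(annotations, labelmap_remapped)
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'iab'
```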
I'm not sure why we didn't hit this bug before, e.g. with canary song labeled with integers.
We don't actually want to change the mapping for this purpose. We only want to re-map inside `validation_step`, when we convert labeled timebins to labels (not when we convert labels to a vector of labeled timebins, as we do when loading samples for this dataset class). That conversion is what the `to_labels` transform, currently an attribute of the model class, is used for.
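A sketch of why the re-mapping matters only on the labels side, under simplifying assumptions: `frame_labels_to_string` is a hypothetical stand-in for the `to_labels` transform, label sequences are compared as joined strings, and `levenshtein` is a plain dynamic-programming edit distance. With a multi-character label, one wrong segment can cost several character edits.

```python
from itertools import groupby


def levenshtein(a, b) -> int:
    """Edit distance between two sequences via the standard dynamic-programming
    table, keeping only the previous row."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(
                min(
                    prev[j] + 1,  # delete x
                    curr[j - 1] + 1,  # insert y
                    prev[j - 1] + (x != y),  # substitute (free if equal)
                )
            )
        prev = curr
    return prev[-1]


def frame_labels_to_string(frame_labels, labelmap, unlabeled="unlabeled"):
    """Hypothetical simplified to_labels: collapse runs of identical frame labels
    into segments, drop the unlabeled class, and join segment labels into a string."""
    inverse = {idx: lbl for lbl, idx in labelmap.items()}
    segment_classes = [cls for cls, _ in groupby(frame_labels)]
    return "".join(inverse[cls] for cls in segment_classes if inverse[cls] != unlabeled)


labelmap = {"unlabeled": 0, "iab": 1, "c": 2}
labelmap_eval = {"unlabeled": 0, "a": 1, "c": 2}  # single-character stand-ins
y_true = [1, 1, 0, 2, 2, 0, 1, 1]
y_pred = [2, 2, 0, 2, 2, 0, 1, 1]  # only the first segment is wrong

print(levenshtein(frame_labels_to_string(y_true, labelmap),
                  frame_labels_to_string(y_pred, labelmap)))  # 3
print(levenshtein(frame_labels_to_string(y_true, labelmap_eval),
                  frame_labels_to_string(y_pred, labelmap_eval)))  # 1
```

With the original labelmap the single wrong segment costs 3 edits ("iabciab" vs "cciab"); with single-character labels it costs 1 ("aca" vs "cca"). Loading samples never goes through this conversion, so it can keep using the original labelmap.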
So I think the right fix for this is as follows, adhering to the principle of least surprise:
- Don't re-map inside `core.eval`, to avoid breaking `VocalDataset`.
- Instead, validate the labelmap inside `WindowedFrameClassificationModel`.
  - Should we do this in the class method or in `__init__`? My first thought is the class method, to avoid doing validation in a verbose `__init__`, but that would make it possible to instantiate a `WindowedFrameClassificationModel` "by hand" that calculates a different segment error rate, which we don't want. So validate in `__init__`.
- Add a logger and log that we are re-mapping the labelmap.
- Make it explicit that we have a separate labelmap for eval: do what we currently do inside `core.eval`, but store the result as a separate attribute (`labelmap_eval`), then use that labelmap with the `to_labels` transform, which we'll rename `to_labelmap_eval` to be extra explicit.
- Also document what's been done in the `WindowedFrameClassificationModel` docstrings.
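A minimal sketch of the proposed `__init__` logic. The class `WindowedFrameClassificationModelSketch`, the helper `multi_char_to_single_char`, the `skip` parameter, and leaving the `"unlabeled"` class untouched are all assumptions for illustration; vak's actual implementation may differ. Doing this in `__init__` means every instance, however it is constructed, gets the same `labelmap_eval`.

```python
import logging
import string

logger = logging.getLogger(__name__)


def multi_char_to_single_char(labelmap: dict, skip=("unlabeled",)) -> dict:
    """Hypothetical helper: replace each multi-character label with an unused
    single character, keeping the integer class each label maps to."""
    used = {lbl for lbl in labelmap if len(lbl) == 1}
    # unused single characters in a fixed order; next() raises StopIteration if exhausted
    available = (ch for ch in string.ascii_letters + string.digits if ch not in used)
    labelmap_eval = {}
    for lbl, idx in labelmap.items():
        if len(lbl) > 1 and lbl not in skip:
            labelmap_eval[next(available)] = idx
        else:
            labelmap_eval[lbl] = idx
    return labelmap_eval


class WindowedFrameClassificationModelSketch:
    """Hypothetical, stripped-down stand-in for WindowedFrameClassificationModel
    that only shows the proposed labelmap handling in __init__."""

    def __init__(self, labelmap: dict, skip=("unlabeled",)):
        self.labelmap = labelmap  # left unchanged, so datasets keep working
        if any(len(lbl) > 1 for lbl in labelmap if lbl not in skip):
            # only shown if logging is configured, e.g. logging.basicConfig(level=logging.INFO)
            logger.info(
                "labelmap has multi-character labels; "
                "re-mapping to single characters for eval as labelmap_eval"
            )
            self.labelmap_eval = multi_char_to_single_char(labelmap, skip=skip)
        else:
            self.labelmap_eval = dict(labelmap)


model = WindowedFrameClassificationModelSketch({"unlabeled": 0, "iab": 1, "c": 2, "a": 3})
print(model.labelmap_eval)  # {'unlabeled': 0, 'b': 1, 'c': 2, 'a': 3}
```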
NickleDave changed the title from "BUG: eval crashes for learncurve when labelmap has multi-char because it changes labelmap" to "BUG: eval crashes for learncurve when labelmap has multi-char label, because eval re-maps the labelmap" on Jun 5, 2023.
…Model only, fix #664 (#665)
* Remove re-mapping of labelmap inside core.eval
* Fix WindowedFrameClassificationModel to re-map labelmap for eval
* Add pytest marker 'slow' to ini_options in pyproject.toml
* Mark tests in test_train as slow
* Mark tests in test_core/test_learncurve as slow
* Add logic in tests/conftest.py to sort tests so 'slow' marks run last, and cli option to turn this on
* Fix TEST_INIT_ARGVALS in test_models/test_tweetynet so we don't create spurious error with integer keys in a labelmap
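The `tests/conftest.py` change in the commit above could look roughly like the following sketch. It assumes the `slow` marker is already registered under `[tool.pytest.ini_options]` in `pyproject.toml`, as the commit describes; the flag name `--slow-last` is hypothetical, since the commit only mentions a "cli option". `pytest_addoption` and `pytest_collection_modifyitems` are standard pytest hooks, and the latter may reorder `items` in place.

```python
# tests/conftest.py (sketch)


def pytest_addoption(parser):
    # hypothetical flag name; off by default so collection order is unchanged
    parser.addoption(
        "--slow-last",
        action="store_true",
        default=False,
        help="run tests marked 'slow' after all other tests",
    )


def pytest_collection_modifyitems(config, items):
    if not config.getoption("--slow-last"):
        return
    # False sorts before True, so unmarked tests come first; sort() is stable,
    # so tests keep their collection order within each group
    items.sort(key=lambda item: item.get_closest_marker("slow") is not None)
```

Running `pytest --slow-last` would then push the slow `test_train` and `test_learncurve` tests to the end of the run.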