I got this error when training STFPM on the pill category.
It occurred when training from the benchmarking script.
Error
output = self.trainer.call_hook('validation_step_end', *args, **kwargs)
File "/home/ashwin/miniconda3/envs/anomalib/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1230, in call_hook
output = hook_fx(*args, **kwargs)
File "/home/ashwin/anomalib/anomalib/core/model/anomaly_module.py", line 105, in validation_step_end
self.pixel_metrics(val_step_outputs["anomaly_maps"].flatten(), val_step_outputs["mask"].flatten().int())
File "/home/ashwin/miniconda3/envs/anomalib/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ashwin/miniconda3/envs/anomalib/lib/python3.8/site-packages/torchmetrics/collections.py", line 110, in forward
return {k: m(*args, **m._filter_kwargs(**kwargs)) for k, m in self.items()}
File "/home/ashwin/miniconda3/envs/anomalib/lib/python3.8/site-packages/torchmetrics/collections.py", line 110, in <dictcomp>
return {k: m(*args, **m._filter_kwargs(**kwargs)) for k, m in self.items()}
File "/home/ashwin/miniconda3/envs/anomalib/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ashwin/miniconda3/envs/anomalib/lib/python3.8/site-packages/torchmetrics/metric.py", line 205, in forward
self._forward_cache = self.compute()
File "/home/ashwin/miniconda3/envs/anomalib/lib/python3.8/site-packages/torchmetrics/metric.py", line 367, in wrapped_func
self._computed = compute(*args, **kwargs)
File "/home/ashwin/anomalib/anomalib/core/metrics/optimal_f1.py", line 38, in compute
precision, recall, thresholds = self.precision_recall_curve.compute()
File "/home/ashwin/miniconda3/envs/anomalib/lib/python3.8/site-packages/torchmetrics/metric.py", line 367, in wrapped_func
self._computed = compute(*args, **kwargs)
File "/home/ashwin/miniconda3/envs/anomalib/lib/python3.8/site-packages/torchmetrics/classification/precision_recall_curve.py", line 148, in compute
return _precision_recall_curve_compute(preds, target, self.num_classes, self.pos_label)
File "/home/ashwin/miniconda3/envs/anomalib/lib/python3.8/site-packages/torchmetrics/functional/classification/precision_recall_curve.py", line 260, in _precision_recall_curve_compute
return _precision_recall_curve_compute_single_class(preds, target, pos_label, sample_weights)
File "/home/ashwin/miniconda3/envs/anomalib/lib/python3.8/site-packages/torchmetrics/functional/classification/precision_recall_curve.py", line 140, in _precision_recall_curve_compute_single_class
fps, tps, thresholds = _binary_clf_curve(
File "/home/ashwin/miniconda3/envs/anomalib/lib/python3.8/site-packages/torchmetrics/functional/classification/precision_recall_curve.py", line 36, in _binary_clf_curve
desc_score_indices = torch.argsort(preds, descending=True)
RuntimeError: CUDA out of memory. Tried to allocate 5.82 GiB (GPU 0; 23.70 GiB total capacity; 9.28 GiB already allocated; 2.61 GiB free; 19.41 GiB reserved in total by PyTorch)
Relevant Config
Image size = 256
Looks like the issue that @blakshma is having with the CFlow implementation is quite similar to this, if not the same. @djdameln is trying to move the metric computation to the CPU, which would hopefully resolve the issue.
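For illustration, here is a minimal sketch of what moving the metric inputs to the CPU could look like. It is not the actual anomalib change. The helper name `move_outputs_to_cpu` and the example batch are hypothetical; only the `"anomaly_maps"` and `"mask"` keys and the `torch.argsort` call come from the traceback above.

```python
import torch


def move_outputs_to_cpu(val_step_outputs):
    """Hypothetical helper: flatten predictions and masks and copy them to host memory.

    The "anomaly_maps" and "mask" keys mirror the ones used in the traceback.
    """
    preds = val_step_outputs["anomaly_maps"].detach().flatten().cpu()
    target = val_step_outputs["mask"].detach().flatten().int().cpu()
    return preds, target


# Made-up batch of 4 images at 256 x 256, placed on the GPU when one is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
outputs = {
    "anomaly_maps": torch.rand(4, 1, 256, 256, device=device),
    "mask": (torch.rand(4, 1, 256, 256, device=device) > 0.99).float(),
}

preds, target = move_outputs_to_cpu(outputs)

# The sort that ran out of GPU memory in the traceback now uses host memory instead.
desc_score_indices = torch.argsort(preds, descending=True)
```

With this approach the metric object itself would presumably also need to stay on the CPU, so that the accumulated pixel-level predictions and the final sort do not end up back on the GPU.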