Diverged metrics in PatchCore: dropping from 0.99 to 0.44 and 0.03 is rather critical? #74
@sequoiagrove, @blakshma worked on the PatchCore results and managed to improve the performance to the following:
To reproduce the numbers you could use this branch, to be merged into development soon after this PR. Here are the qualitative results for the ones you shared above (screw/test/good/009.png).
I get those numbers too now, but I wonder if the computation of the numbers is also wrong, because when I look through the result images, it only detects about 1/4 to 1/2 of the defects in the different categories. The pixel F1 = 0.35 seems to describe the actual performance best, though I know that it is low partly because correctly detected defects don't have to overlap the ground truth perfectly to still be a good result.
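To make the overlap point concrete, here is a minimal sketch of how pixel-level F1 could be computed from binary masks (`pred_mask` and `gt_mask` are hypothetical arrays, not anomalib's internal names); a defect that is found but shifted already loses a lot of F1:

```python
import numpy as np

def pixel_f1(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """Pixel-level F1: harmonic mean of per-pixel precision and recall."""
    pred, gt = pred_mask.astype(bool), gt_mask.astype(bool)
    tp = np.sum(pred & gt)   # defect pixels correctly marked
    fp = np.sum(pred & ~gt)  # normal pixels marked as defect
    fn = np.sum(~pred & gt)  # defect pixels missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

gt = np.zeros((8, 8), dtype=bool); gt[2:6, 2:6] = True      # 4x4 ground-truth defect
pred = np.zeros((8, 8), dtype=bool); pred[3:7, 3:7] = True  # detection shifted by one pixel
print(pixel_f1(pred, gt))  # 0.5625 -- the defect is found, yet F1 is already mediocre
```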
@sequoiagrove unfortunately, the classification results are independent of the segmentation results. Hence, the algorithm might have a very good classification result while the segmentation results are poor in some cases, as you have pointed out. We will investigate this.
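A rough sketch of why the two can diverge: the image-level score is typically a reduction of the anomaly heatmap (a plain max here; the exact reduction anomalib uses may differ), so a single confident pixel can classify the image correctly even when the predicted mask barely overlaps the defect:

```python
import numpy as np

anomaly_map = np.random.rand(224, 224)  # hypothetical H x W heatmap, scores in [0, 1]

image_score = anomaly_map.max()  # image-level score: one number per image
pred_mask = anomaly_map >= 0.5   # pixel-level mask: thresholding the same map

# One hot pixel is enough for a correct image-level decision, which is why
# image AUROC/F1 can stay high while the segmentation mask is poor.
```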
A pixel AUROC of 0.97 sounds like good performance, but looking at the masks it is really not useful in a real system, and AUC is a bad metric for quantifying performance.
Yeah, AUC is widely used in academia, but it is usually not a good metric for industrial applications since it can be misleading. Finding the best threshold from the AUC is not easy even though we implemented an adaptive thresholding mechanism. This is why we also added the F1 score to our evaluations. Regarding always looking at the pixel F1 score for evaluation, there is room for improvement there: we haven't optimised the heatmaps from which the predicted masks are generated. Once we do, I agree, pixel F1 would become the standard metric to evaluate performance.
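As a rough illustration of threshold selection (not anomalib's actual adaptive thresholding implementation), one can sweep the precision-recall curve and keep the threshold with the best F1:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def best_f1_threshold(scores: np.ndarray, labels: np.ndarray):
    """Return the score threshold that maximizes F1, plus that F1 value."""
    precision, recall, thresholds = precision_recall_curve(labels, scores)
    f1 = 2 * precision * recall / np.clip(precision + recall, 1e-12, None)
    idx = int(np.argmax(f1[:-1]))  # the last PR point has no threshold attached
    return thresholds[idx], f1[idx]
```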
diverged metrics: dropping from 0.99 to 0.44 and 0.03 is rather critical?

log images: nice :)
Also padim dropped in performance, but not as drastically.
Here's a patchcore result of "good" parts:
this is padim:

DATALOADER:0 TEST RESULTS
{'image_AUROC': 0.7589669823646545,
'image_F1': 0.8787878751754761,
'pixel_AUROC': 0.9781586527824402,
'pixel_F1': 0.22379672527313232}
Originally posted by @sequoiagrove in #67 (comment)
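For reference, numbers in this format could be reproduced with torchmetrics along these lines (the tensor names and shapes are made up for illustration; anomalib's internal metric setup may differ):

```python
import torch
from torchmetrics.classification import BinaryAUROC, BinaryF1Score

# Hypothetical predictions: scores in [0, 1], labels in {0, 1}.
image_scores, image_labels = torch.rand(33), torch.randint(0, 2, (33,))
pixel_scores, pixel_labels = torch.rand(33, 224, 224), torch.randint(0, 2, (33, 224, 224))

print("image_AUROC:", BinaryAUROC()(image_scores, image_labels).item())
print("image_F1:", BinaryF1Score()(image_scores, image_labels).item())
print("pixel_AUROC:", BinaryAUROC()(pixel_scores.flatten(), pixel_labels.flatten()).item())
print("pixel_F1:", BinaryF1Score()(pixel_scores.flatten(), pixel_labels.flatten()).item())
```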