
SAM2 degraded results compared to SAM #93

Open
omrastogi opened this issue Aug 1, 2024 · 5 comments

SAM

  • Version: vit_l
  • Input type: box input
  • Multimask Output: True

[attached images: viz_7_sam, viz_13_sam]

SAM2

  • Version: large
  • Input type: box input
  • Multimask Output: True

[attached images: viz_7_sam2, viz_13_sam2]
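
For reference, the setup above corresponds roughly to the following (a minimal sketch; the image path, checkpoint/config names, and the example box are placeholders, not the exact values used):

```python
# Rough sketch of the two configurations listed above (all paths/values are placeholders).
import numpy as np
from PIL import Image

image = np.array(Image.open("example.jpg").convert("RGB"))  # H x W x 3 uint8 RGB
box = np.array([100, 150, 400, 500])                        # x_min, y_min, x_max, y_max

# --- SAM (v1), vit_l, box input, multimask output ---
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_l"](checkpoint="sam_vit_l_0b3195.pth")
sam_predictor = SamPredictor(sam)
sam_predictor.set_image(image)
masks_v1, scores_v1, _ = sam_predictor.predict(box=box, multimask_output=True)

# --- SAM2, large, box input, multimask output ---
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

sam2 = build_sam2("sam2_hiera_l.yaml", "sam2_hiera_large.pt")
sam2_predictor = SAM2ImagePredictor(sam2)
sam2_predictor.set_image(image)
masks_v2, scores_v2, _ = sam2_predictor.predict(box=box, multimask_output=True)
```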


heyoeyo commented Aug 2, 2024

It may be that the box is defined backwards, i.e. the top-left/bottom-right coordinates are swapped? That might explain why the mask looks reversed. It's also worth checking the other masks (from the multi-mask output), since it may just be that one of them is giving this odd-looking result.
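
If you want to rule the box order out, a quick check is to sort the corners before passing the box in (just a sketch, assuming the usual (x1, y1, x2, y2) pixel convention and a predictor set up as in the first post):

```python
import numpy as np

def normalize_box(box):
    """Reorder a box to (x_min, y_min, x_max, y_max), i.e. top-left then bottom-right."""
    x1, y1, x2, y2 = box
    return np.array([min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2)])

box = normalize_box(box)
masks, scores, _ = predictor.predict(box=box, multimask_output=True)
```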

From what I've seen, the results from v2 are generally similar to v1, but a bit more prone to weird artifacts. However, the new models scale to larger image sizes using a lot less VRAM than the v1 models, so they can give cleaner/smoother outlines.

@WaterKnight1998

I am also seeing worse performance with points prediction


heyoeyo commented Aug 5, 2024

> I am also seeing worse performance with points prediction

From what I've seen, between the different sized SAMv2 models, there can be significant differences in which masks (i.e. whole object, sub-components of object etc.) end up in the different indexes of the multi-mask output.
For example, the 0-th index mask of the large model tends to pick the smallest sub-component around the point prompt, while the same 0-th mask of the base-plus model tends to pick the 'whole' object. So you might be able to get a better result by picking a different mask output.
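
As a sketch (using the standard predictor API; the point coordinates here are placeholders), you can inspect all of the multi-mask outputs and pick by predicted IoU or by area instead of hard-coding index 0:

```python
import numpy as np

masks, scores, _ = predictor.predict(
    point_coords=np.array([[px, py]]),  # placeholder point, (x, y) in image pixels
    point_labels=np.array([1]),         # 1 = foreground point
    multimask_output=True,
)

# Look at every candidate rather than trusting index 0, since which index holds
# the "whole object" vs. a sub-component differs between model sizes.
for i, (m, s) in enumerate(zip(masks, scores)):
    print(f"mask {i}: predicted IoU={s:.3f}, area={int(m.sum())} px")

best_mask = masks[np.argmax(scores)]  # or choose by area / visual inspection
```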


rdfong commented Dec 5, 2024

[attached screenshot: Screenshot from 2024-12-05 13-53-38]

It also tends to output strange, artifact-like results, as seen here, which was not an issue with SAM v1. This is using a point prompt.


heyoeyo commented Dec 6, 2024

It could be that the point coordinates don't match the size of the image given to the model, since the mask seems to be selecting a different area than the point (or maybe the point is just drawn in the wrong spot?).
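
A quick sanity check for that (a sketch; `point_xy` is a placeholder for the prompt point, expected as (x, y) pixels of the same image passed to set_image):

```python
h, w = image.shape[:2]  # the exact array given to predictor.set_image(...)
px, py = point_xy       # prompt point, expected as (x, y) in that image's pixel space

assert 0 <= px < w and 0 <= py < h, "point falls outside the image"

# Common sources of a mismatch:
#  - point taken from a resized/display copy instead of the original image
#  - (row, col) i.e. (y, x) order instead of (x, y)
#  - coordinates normalized to 0..1 instead of pixels
```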

If that point is correct, you can probably get a better result by using one of the other mask outputs. For example, if you're using the large model, that ground patch is cleanly segmented in the last-most mask (using multi-mask output):
[attached image: ground_patch_example]
The non-multi-mask + 3 multi-mask outputs are shown on the right. You can see in one of the masks (second-last of multi-mask) it gives a patchy mask a bit like the example you showed, so switching to a different one may fix the problem.
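
Something like this reproduces that kind of side-by-side view (a sketch; `pt`/`lbl` are placeholder point coordinates/labels for the same prompt):

```python
import matplotlib.pyplot as plt

# Single-mask output plus the 3 multi-mask outputs for the same point prompt
single_mask, _, _ = predictor.predict(point_coords=pt, point_labels=lbl, multimask_output=False)
multi_masks, multi_scores, _ = predictor.predict(point_coords=pt, point_labels=lbl, multimask_output=True)

all_masks = list(single_mask) + list(multi_masks)
titles = ["single"] + [f"multi {i} (IoU~{s:.2f})" for i, s in enumerate(multi_scores)]

fig, axes = plt.subplots(1, len(all_masks), figsize=(4 * len(all_masks), 4))
for ax, mask, title in zip(axes, all_masks, titles):
    ax.imshow(image)
    ax.imshow(mask, alpha=0.5)  # overlay the binary mask
    ax.set_title(title)
    ax.axis("off")
plt.show()
```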

The last-most mask isn't always the best though, for example the full car would be segmented by the second-last mask (and the last-most mask is patchy):
[attached image: car_example]

Which mask is best also depends on the model size that's being used (the examples above are only for the v2.1 large model, the results can be very different for other model sizes).
