Poor performance on multimodal data #19
Hi @tcourat, sorry for the long delay, I was out for a while.
Could you give a bit more information? How did you measure the performance? Can you post an example image, with its resolution and any other relevant information?
Yes. We do not discredit the use of context aggregation though; we mention in our paper that it might not be suitable in all use cases. We left the possibility of adding CA on top of SiLK as future work.
So, it depends on the use case. One thing to be aware of, regarding keypoint models, is that they are quite sensitive to hyper-parameter tuning. For example, you can see on the IMC challenge how people get a +30% boost on their model (e.g. LoFTR) simply by tuning hyper-parameters for the task. In the case of SiLK, since we have few parameters to tune, you can try playing with those and see if you get any boost (see the sketch below).
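As an illustration of what such a sweep can look like, here is a minimal sketch. The parameter values and the `evaluate_setting` callable are assumptions, not SiLK's actual API; the only point is that a small grid over the number of kept keypoints and the ratio-test threshold is cheap to run and can move results noticeably.

```python
import itertools
from typing import Callable, Iterable, Tuple

def grid_search(
    evaluate_setting: Callable[[int, float], float],
    top_k_values: Iterable[int] = (1000, 5000, 10000),
    ratio_thresholds: Iterable[float] = (0.8, 0.9, 0.95, 1.0),
) -> Tuple[float, int, float]:
    """Score every (top_k, ratio) combination and return the best one.

    `evaluate_setting(top_k, ratio)` is whatever task metric you care about,
    e.g. homography inlier rate on a small held-out set of image pairs.
    """
    best = None
    for top_k, ratio in itertools.product(top_k_values, ratio_thresholds):
        score = evaluate_setting(top_k, ratio)
        if best is None or score > best[0]:
            best = (score, top_k, ratio)
    return best
```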
Yes, but we don't have that component in the codebase right now. It shouldn't be too difficult to add that to the training pipeline though.
Thanks for your answer @gleize. Unfortunately, I can't share the data I used to give you an example. The issue is that it wasn't finding any keypoints at all (or only a few) after filtering them with the ratio test or the double softmax. I also tried keeping all the keypoints (no filtering) and filtering outliers with RANSAC (as the goal was to estimate a homography), but it did not find a good set of inliers, so I think there are simply no good matches at all. Could you point out where I can modify the code to adapt the training of the keypoints? I am not really at ease with the structure of your code since there are many modules. Thanks!
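For reference, the unfiltered-matches-plus-RANSAC setup described above looks roughly like the generic OpenCV sketch below (nothing here is tied to this repository). If the returned inlier mask covers only a handful of points, the matches are effectively random, which is consistent with the behaviour reported.

```python
import cv2
import numpy as np

def ransac_homography(pts_a: np.ndarray, pts_b: np.ndarray, reproj_thresh: float = 3.0):
    """Estimate a homography from (N, 2) arrays of matched keypoints; return it with the inlier mask."""
    if len(pts_a) < 4:
        return None, None  # a homography needs at least 4 correspondences
    H, inliers = cv2.findHomography(
        pts_a.astype(np.float32),
        pts_b.astype(np.float32),
        cv2.RANSAC,
        reproj_thresh,
    )
    return H, inliers
```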
Hi @tcourat,
Have you tried re-training on a dataset that looks "similar" to the images you're testing on? Based on your initial comment, I suspect there might be a domain gap in your specific case.
Yes. We've added some initial code for that, but haven't tested it with anything other than a simple model. The positional encoding can be modified here. Please be aware that this is very experimental code.
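To make the idea concrete, here is a small, generic PyTorch sketch of concatenating a fixed sinusoidal positional signal onto a dense feature map. This is not the experimental code linked above, only an illustration of the kind of position information a fully convolutional descriptor lacks by default.

```python
import math
import torch

def concat_positional_encoding(features: torch.Tensor, num_freqs: int = 4) -> torch.Tensor:
    """Append sin/cos encodings of normalised (x, y) coordinates to a (B, C, H, W) feature map."""
    b, _, h, w = features.shape
    device = features.device
    ys = torch.linspace(-1.0, 1.0, h, device=device)
    xs = torch.linspace(-1.0, 1.0, w, device=device)
    grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")  # each (H, W)
    channels = []
    for i in range(num_freqs):
        freq = (2.0 ** i) * math.pi
        for grid in (grid_x, grid_y):
            channels.append(torch.sin(freq * grid))
            channels.append(torch.cos(freq * grid))
    pe = torch.stack(channels, dim=0)            # (4 * num_freqs, H, W)
    pe = pe.unsqueeze(0).expand(b, -1, -1, -1)   # broadcast over the batch
    return torch.cat([features, pe], dim=1)      # output has C + 4 * num_freqs channels
```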
If I want to add a new dataset, what should I add/change? I suppose I have to create a new dataloader somewhere and also add a suitable .yaml config? Thanks
Hi @tcourat, Yes, you can add a new dataset here. You can look at the existing datasets (e.g. coco, megadepth) and structure yours similarly. The config file will have to point to an existing dataset class, accessible from Python (e.g. coco). Once your dataset class and config file are done, you can select it for training here.
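For orientation, a custom image dataset in PyTorch typically looks like the sketch below. The class name, folder layout, and glob pattern are placeholders; the real class should mirror the structure of the existing coco/megadepth dataset classes so the config file can point to it by its Python import path.

```python
from pathlib import Path

from PIL import Image
from torch.utils.data import Dataset

class MyImageDataset(Dataset):
    """Placeholder dataset: a flat folder of images, loaded one at a time."""

    def __init__(self, root: str, transform=None):
        self.paths = sorted(Path(root).glob("*.jpg"))  # adjust the pattern to your data
        self.transform = transform

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, index: int):
        image = Image.open(self.paths[index]).convert("RGB")
        if self.transform is not None:
            image = self.transform(image)
        return image
```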
Answers moved to FAQ. Closing now.
Thanks for your support and your wonderful work. I have encountered a similar issue. I've configured my dataset to resemble COCO by adjusting the names of images in the annotation files. Despite my efforts in tuning, the results are unsatisfactory. Could you please guide me on fine-tuning the model's weights, so that I can retain the pre-trained weights and simply fine-tune them on my dataset? Thanks in advance!
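In general terms (this is a generic PyTorch sketch, not the repository's actual training entry point), fine-tuning means loading the released checkpoint into the model before training starts and using a smaller learning rate. The `build_model()` constructor and the checkpoint path below are placeholders.

```python
import torch

# `build_model()` and the checkpoint path are placeholders; use the model class
# and released weights from the training pipeline you already have configured.
model = build_model()
checkpoint = torch.load("pretrained.ckpt", map_location="cpu")
state_dict = checkpoint.get("state_dict", checkpoint)     # unwrap Lightning-style checkpoints if needed
model.load_state_dict(state_dict, strict=False)           # strict=False tolerates renamed/missing heads

# Fine-tuning usually uses a learning rate well below the from-scratch setting.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
```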
First, thanks for your great work!
I tried matching images from different modalities (i.e., an image with a real texture and a corresponding image with a synthetic texture from a slightly different point of view) and it didn't work very well. On the other hand, LoFTR handled it quite decently.
I think this comes from the fact that SiLK does not use context aggregation, so the keypoints are not dependent on the image pair, and there is also no positional encoding as in transformers, so the keypoints are not position-aware. Furthermore, because the architecture is fully convolutional, it struggles to find global or long-range matches. So, SiLK locally matches good keypoints (when zooming in we can see they are similar), but globally we can see that they match two different things, and thus the matching is bad.
Do you see any improvements or tricks that would make SiLK better in this kind of situation? Or is it simply not an architecture suitable for this case?
I used the pre-trained model on COCO, so, so far, I haven't tried to fine-tune it on my use case. However, I am not sure how I can train the model with proper matching learning. Indeed, the current training procedure takes one image and transforms it in order to learn keypoint locations and matches. Instead, I'd like the matching to be learned on two images, so that it learns to match between the two modalities. Is that possible in some way, or should I stick to methods using CA like LoFTR?
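One way to frame that idea, as a sketch under the assumption that registered real/synthetic pairs with a known homography between them are available (file and field names below are made up for illustration), is a dataset that yields image pairs plus their ground-truth correspondence, so the matching loss can be computed across modalities instead of across an image and its random warp.

```python
import json
from pathlib import Path

import numpy as np
from PIL import Image
from torch.utils.data import Dataset

class CrossModalPairs(Dataset):
    """Placeholder paired dataset: real image, synthetic counterpart, and the homography relating them."""

    def __init__(self, root: str):
        self.root = Path(root)
        # pairs.json: [{"real": "...", "synthetic": "...", "H": [[...], [...], [...]]}, ...]
        self.pairs = json.loads((self.root / "pairs.json").read_text())

    def __len__(self) -> int:
        return len(self.pairs)

    def __getitem__(self, index: int):
        entry = self.pairs[index]
        real = Image.open(self.root / entry["real"]).convert("RGB")
        synthetic = Image.open(self.root / entry["synthetic"]).convert("RGB")
        H = np.asarray(entry["H"], dtype=np.float32)  # maps real-image coords to synthetic-image coords
        return real, synthetic, H
```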