Poor performance on multimodal data #19
Hi @tcourat, sorry for the long delay, I was out for a while.
Could you give a bit more information? How did you measure the performance? Can you post an example image, with its resolution and any other relevant information?
Yes. We do not discredit the use of context aggregation though; we mention in our paper that it might not be suitable in all use cases. We left the possibility of adding CA on top of SiLK as future work.
So, it depends on the use case. One thing to be aware of, regarding keypoint models, is that they are quite sensitive to hyper-parameter tuning. For example, you can see on the IMC challenge how people get a +30% boost on their model (e.g. LoFTR) simply by tuning hyper-parameters for the task. In the case of SiLK, since we have few parameters to tune, you can try playing with those and see if you get any boost (see the sketch below).
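As an illustration of what such a sweep can look like, here is a minimal sketch. The parameter values and the `evaluate_setting` callable are assumptions, not SiLK's actual API; the only point is that a small grid over the number of kept keypoints and the ratio-test threshold is cheap to run and can move results noticeably.

```python
import itertools
from typing import Callable, Iterable, Tuple

def grid_search(
    evaluate_setting: Callable[[int, float], float],
    top_k_values: Iterable[int] = (1000, 5000, 10000),
    ratio_thresholds: Iterable[float] = (0.8, 0.9, 0.95, 1.0),
) -> Tuple[float, int, float]:
    """Score every (top_k, ratio) combination and return the best one.

    `evaluate_setting(top_k, ratio)` is whatever task metric you care about,
    e.g. homography inlier rate on a small held-out set of image pairs.
    """
    best = None
    for top_k, ratio in itertools.product(top_k_values, ratio_thresholds):
        score = evaluate_setting(top_k, ratio)
        if best is None or score > best[0]:
            best = (score, top_k, ratio)
    return best
```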
Yes, but we don't have that component in the codebase right now. It shouldn't be too difficult to add that to the training pipeline though.
Thanks for your answer @gleize. Unfortunately, I can't share the data I used to give you an example. The issue is that it wasn't finding any keypoints at all (or only a few) after filtering them with the ratio test or the double softmax. I also tried keeping all the keypoints (no filtering) and filtering outliers with RANSAC (as the goal was to estimate a homography), but it did not find a good set of inliers, so I think there are simply no good matches at all. Could you point out where I can modify the code to adapt the training of the keypoints? I am not really at ease with the structure of your code since there are many modules. Thanks!
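For reference, the unfiltered-matches-plus-RANSAC setup described above looks roughly like the generic OpenCV sketch below (nothing here is tied to this repository). If the returned inlier mask covers only a handful of points, the matches are effectively random, which is consistent with the behaviour reported.

```python
import cv2
import numpy as np

def ransac_homography(pts_a: np.ndarray, pts_b: np.ndarray, reproj_thresh: float = 3.0):
    """Estimate a homography from (N, 2) arrays of matched keypoints; return it with the inlier mask."""
    if len(pts_a) < 4:
        return None, None  # a homography needs at least 4 correspondences
    H, inliers = cv2.findHomography(
        pts_a.astype(np.float32),
        pts_b.astype(np.float32),
        cv2.RANSAC,
        reproj_thresh,
    )
    return H, inliers
```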
Hi @tcourat,
Have you tried re-training on a dataset that looks "similar" to the images you're testing on? Based on your initial comment, I suspect there might be a domain gap in your specific case.
Yes. We've added some initial code for that, but haven't tested it with anything other than a simple model. The positional encoding can be modified here. Please be aware that this is very experimental code.
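To make the idea concrete, here is a small, generic PyTorch sketch of concatenating a fixed sinusoidal positional signal onto a dense feature map. This is not the experimental code linked above, only an illustration of the kind of position information a fully convolutional descriptor lacks by default.

```python
import math
import torch

def concat_positional_encoding(features: torch.Tensor, num_freqs: int = 4) -> torch.Tensor:
    """Append sin/cos encodings of normalised (x, y) coordinates to a (B, C, H, W) feature map."""
    b, _, h, w = features.shape
    device = features.device
    ys = torch.linspace(-1.0, 1.0, h, device=device)
    xs = torch.linspace(-1.0, 1.0, w, device=device)
    grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")  # each (H, W)
    channels = []
    for i in range(num_freqs):
        freq = (2.0 ** i) * math.pi
        for grid in (grid_x, grid_y):
            channels.append(torch.sin(freq * grid))
            channels.append(torch.cos(freq * grid))
    pe = torch.stack(channels, dim=0)            # (4 * num_freqs, H, W)
    pe = pe.unsqueeze(0).expand(b, -1, -1, -1)   # broadcast over the batch
    return torch.cat([features, pe], dim=1)      # output has C + 4 * num_freqs channels
```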
If I want to add a new dataset, what should I add/change? I suppose I have to create a new dataloader somewhere and also add a suitable .yaml config? Thanks
Hi @tcourat, Yes, you can add a new dataset here. You can look at the existing datasets (e.g. coco, megadepth) and structure yours similarly. The config file will have to point to an existing dataset class, accessible from Python (e.g. coco). Once your dataset class and config file are done, you can select it for training here.
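For orientation, a custom image dataset in PyTorch typically looks like the sketch below. The class name, folder layout, and glob pattern are placeholders; the real class should mirror the structure of the existing coco/megadepth dataset classes so the config file can point to it by its Python import path.

```python
from pathlib import Path

from PIL import Image
from torch.utils.data import Dataset

class MyImageDataset(Dataset):
    """Placeholder dataset: a flat folder of images, loaded one at a time."""

    def __init__(self, root: str, transform=None):
        self.paths = sorted(Path(root).glob("*.jpg"))  # adjust the pattern to your data
        self.transform = transform

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, index: int):
        image = Image.open(self.paths[index]).convert("RGB")
        if self.transform is not None:
            image = self.transform(image)
        return image
```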
Answers moved to FAQ. Closing now.
Thanks for your support and your wonderful work. I have encountered a similar issue. I've configured my dataset to resemble COCO by adjusting the names of images in the annotation files. Despite my efforts in tuning, the results are unsatisfactory. Could you please guide me on fine-tuning the model's weights, so that I can retain the pre-trained weights and simply fine-tune them on my dataset? Thanks in advance!
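In general terms (this is a generic PyTorch sketch, not the repository's actual training entry point), fine-tuning means loading the released checkpoint into the model before training starts and using a smaller learning rate. The `build_model()` constructor and the checkpoint path below are placeholders.

```python
import torch

# `build_model()` and the checkpoint path are placeholders; use the model class
# and released weights from the training pipeline you already have configured.
model = build_model()
checkpoint = torch.load("pretrained.ckpt", map_location="cpu")
state_dict = checkpoint.get("state_dict", checkpoint)     # unwrap Lightning-style checkpoints if needed
model.load_state_dict(state_dict, strict=False)           # strict=False tolerates renamed/missing heads

# Fine-tuning usually uses a learning rate well below the from-scratch setting.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
```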
First, thanks for your great work!
I tried matching images from different modalities (i.e., an image with a real texture and a corresponding image with a synthetic texture from a slightly different point of view) and it didn't work very well. On the other hand, LoFTR handled it quite decently.
I think this comes from the fact that SiLK does not use context aggregation, so the keypoints are not dependent on the image pair, and there is also no positional encoding as in transformers, so the keypoints are not position-aware. Furthermore, because the architecture is fully convolutional, it struggles to find global or long-range matches. So, SiLK locally matches good keypoints (when zooming in we can see they are similar), but globally we can see that they match two different things, and thus the matching is bad.
Do you see any improvements or tricks that would make SiLK better in this kind of situation? Or is it simply not an architecture suitable for this case?
I used the pre-trained model on COCO, so, so far, I haven't tried to fine-tune it on my use case. However, I am not sure how I can train the model with proper matching learning. Indeed, the current training procedure takes one image and transforms it in order to learn keypoint locations and matches. Instead, I'd like the matching to be learned on two images, so that it learns to match between the two modalities. Is that possible in some way, or should I stick to methods using CA like LoFTR?
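One way to frame that idea, as a sketch under the assumption that registered real/synthetic pairs with a known homography between them are available (file and field names below are made up for illustration), is a dataset that yields image pairs plus their ground-truth correspondence, so the matching loss can be computed across modalities instead of across an image and its random warp.

```python
import json
from pathlib import Path

import numpy as np
from PIL import Image
from torch.utils.data import Dataset

class CrossModalPairs(Dataset):
    """Placeholder paired dataset: real image, synthetic counterpart, and the homography relating them."""

    def __init__(self, root: str):
        self.root = Path(root)
        # pairs.json: [{"real": "...", "synthetic": "...", "H": [[...], [...], [...]]}, ...]
        self.pairs = json.loads((self.root / "pairs.json").read_text())

    def __len__(self) -> int:
        return len(self.pairs)

    def __getitem__(self, index: int):
        entry = self.pairs[index]
        real = Image.open(self.root / entry["real"]).convert("RGB")
        synthetic = Image.open(self.root / entry["synthetic"]).convert("RGB")
        H = np.asarray(entry["H"], dtype=np.float32)  # maps real-image coords to synthetic-image coords
        return real, synthetic, H
```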