About reimplement training code #12
Hi @Crazylov3,
The pseudo code doesn't enforce any implementation decision. It just represents correspondences, which could essentially be either float or integer. In our code, however, they encode integer indices of pixels in flattened images. For example:
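A minimal sketch of that encoding (the shapes and tensor values below are made up for illustration, not taken from the actual SiLK code):

```python
import torch

# Illustrative sketch: encoding integer pixel correspondences as indices
# into flattened images. Values and shapes are made up for the example.
H, W = 164, 164  # heatmap resolution

# (y, x) integer pixel positions in image 0 and their matches in image 1
yx_0 = torch.tensor([[10, 20], [55, 7]])
yx_1 = torch.tensor([[12, 22], [54, 9]])

# flatten each (y, x) position to a single index into the (H * W) flattened image
corr_0 = yx_0[:, 0] * W + yx_0[:, 1]  # tensor([1660, 9027])
corr_1 = yx_1[:, 0] * W + yx_1[:, 1]  # tensor([1990, 8865])
```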
It cannot be done, and it would be incorrect to interpolate, as I explained here. Why do you need it to be the same size?
We select the positions with the top-k heatmap scores, then add 0.5 to their integer indices to center them.
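As an illustration, a rough sketch of that selection step (`topk_keypoints` is a hypothetical helper written for this thread, not the actual SiLK function):

```python
import torch

def topk_keypoints(heatmap: torch.Tensor, k: int) -> torch.Tensor:
    """Select the k highest-scoring positions of a (H, W) heatmap.

    Hypothetical helper for illustration; not the actual SiLK code.
    """
    H, W = heatmap.shape
    scores, flat_idx = heatmap.flatten().topk(k)
    y = torch.div(flat_idx, W, rounding_mode="floor")
    x = flat_idx % W
    # add 0.5 so each coordinate points at the pixel center,
    # with (0, 0) being the top-left corner of the top-left pixel
    return torch.stack([y, x], dim=1).float() + 0.5
```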
@gleize just to confirm, do the sparse coordinates returned by the model (after adding 0.5) follow the convention of using (0, 0) as the top-left corner of the top-left pixel of the image?
Yes, that's correct.
@gleize Are the sparse descriptors returned by the model sampled using the coordinates after adding 0.5 (e.g. with grid_sample), or the coordinates before adding 0.5?
SiLK doesn't use grid_sample. The sparsification of descriptors can be found here. Incoming positions already have the 0.5 offset applied.
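For illustration, a rough sketch of sparsifying a dense descriptor map by plain indexing rather than interpolation (`sparsify_descriptors` and the tensor layout are assumptions for this example, not the actual SiLK code):

```python
import torch
import torch.nn.functional as F

def sparsify_descriptors(dense_desc: torch.Tensor, positions: torch.Tensor) -> torch.Tensor:
    """Pick descriptors at keypoint locations by direct indexing (no grid_sample).

    dense_desc: (C, H, W) dense descriptor map.
    positions:  (N, 2) keypoint coordinates (y, x) that already include the +0.5 offset.
    Hypothetical helper for illustration; not the actual SiLK code.
    """
    C, H, W = dense_desc.shape
    yx = (positions - 0.5).long()          # undo the centering to recover integer indices
    flat_idx = yx[:, 0] * W + yx[:, 1]     # indices into the flattened (H * W) map
    desc = dense_desc.flatten(1)[:, flat_idx].t()  # (N, C)
    return F.normalize(desc, dim=1)        # L2-normalize each descriptor
```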
Hi @gleize In my custom model, the keypoint heatmap has shape (H, W) and the descriptor has shape (H/8, W/8). Is it possible for me to use your detection loss idea? Thank you for your time and assistance.
Hi @Crazylov3,
We treat those cells as "large" pixels in the loss. The random homography gives us the mapping (and therefore the correspondences) between the cells from image 1 to those in image 2. Once we have the correspondences, we can apply the detection loss.
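To make that concrete, here is a toy sketch of building cell-level correspondences from a known homography; `cell_correspondences`, the 8-pixel cell size, and the matching rule (warp each cell center and check which cell it lands in) are illustrative assumptions, not the exact SiLK implementation:

```python
import torch

def cell_correspondences(homography: torch.Tensor, H: int, W: int, cell: int = 8):
    """Match 8x8 cells between two images given the homography (image 1 -> image 2).

    Hypothetical helper for illustration; not the actual SiLK code.
    """
    Hc, Wc = H // cell, W // cell

    # centers of every cell in image 1, in image pixel coordinates (x, y)
    ys, xs = torch.meshgrid(torch.arange(Hc), torch.arange(Wc), indexing="ij")
    centers = torch.stack([xs.flatten(), ys.flatten()], dim=1).float() * cell + cell / 2

    # warp the cell centers into image 2 with the homography
    ones = torch.ones(len(centers), 1)
    warped = (homography @ torch.cat([centers, ones], dim=1).t()).t()
    warped = warped[:, :2] / warped[:, 2:3]

    # which cell of image 2 does each warped center fall into?
    cell_xy = (warped / cell).floor().long()
    valid = (
        (cell_xy[:, 0] >= 0) & (cell_xy[:, 0] < Wc)
        & (cell_xy[:, 1] >= 0) & (cell_xy[:, 1] < Hc)
    )

    corr_0 = torch.arange(Hc * Wc)[valid]                # cell indices in image 1
    corr_1 = cell_xy[valid, 1] * Wc + cell_xy[valid, 0]  # matching cell indices in image 2
    return corr_0, corr_1
```

The detection loss can then be applied on the (H/8, W/8) grids exactly as it is on full-resolution pixels, with each cell playing the role of one "large" pixel.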
If the backbone down-samples the resolution (only […]), keypoint positions are first obtained at feature resolution. If you feed a (480, 480) image to that detector, you will first get keypoint positions in feature resolution (60, 60). Then the call to […] converts those positions back to image coordinates. If you want […]
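For illustration, a minimal sketch of mapping feature-resolution keypoint positions back to image resolution for an /8 backbone; the plain scale-by-stride mapping and the `feature_to_image_coords` name are assumptions made for this example, and the actual SiLK mapping may also account for borders or padding:

```python
import torch

def feature_to_image_coords(positions: torch.Tensor, stride: int = 8) -> torch.Tensor:
    """Scale pixel-centered (y, x) positions from feature to image resolution.

    Hypothetical helper for illustration; not the actual SiLK code.
    """
    return positions * stride
```

With this mapping, a keypoint at (30.5, 12.5) in the (60, 60) feature map would land at (244.0, 100.0) in the (480, 480) image.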
Thanks for your answer @gleize
Hi @Crazylov3, We provided the results of ResFPN in the paper (cf. Table 7). There is indeed a noticeable decrease in performance. There could be ways to improve the ResFPN architecture to make it work better, but we haven't explored that path.
Hi,
I am very interested in your beautiful work.
I want to reimplement the training code to match the requirements of my task. I have some questions about technical details in order to reproduce your results. I would be grateful if you could help me.
1. What are corr_0 and corr_1 in this case? Are they float (sub-pixel) or integer (pixel-level)? And how can I obtain them?
2. Because the heatmap (of the detection head) has shape (H - 2×9, W - 2×9), how can I match it to the original image (in both the training and inference processes)? My natural thought is to simply interpolate the heatmap to the original image size, but that does not seem to match your implementation.
3. The output of the model (in sparse mode) returns keypoint coordinates that are not integers (e.g. (20.5, 15.5)). What method did you use to extract keypoint coordinates in this format from the heatmap?