
Code is too complicated #7

Closed
Crazylov3 opened this issue Apr 21, 2023 · 15 comments

Comments

@Crazylov3

Thank you for sharing your beautiful work.
Do you have any plans to release a simplified version of the source code? I find it very hard to follow the repo.

@gleize
Contributor

gleize commented Apr 21, 2023

Hi @Crazylov3,

Thanks a lot for the feedback. Unfortunately, there are no immediate plans to rewrite a simplified version of the codebase. However, this could change if there is popular demand for it.

In the meantime, if you could provide a few examples of where you find the code complicated or difficult to follow, I could address those issues one by one and make sure the documentation and codebase are improved.

@CGCooke

CGCooke commented Apr 22, 2023

@gleize

Firstly, thank you to the team for the awesome work, and for releasing it under a permissive license.

I would love to create a minimalistic/simplified repo, with only the inference code.

It's great that you have made silk fully reproducible; however, I also think that a lot of people (myself included!) just want the "meat".

@CGCooke

CGCooke commented Apr 22, 2023

@gleize

I've created a stripped-down version of the repository here.

https://github.com/CGCooke/silk

@Crazylov3 , you might find this a little easier to get started with.

Thanks again for releasing this.

@gleize
Contributor

gleize commented Apr 23, 2023

@CGCooke

It's great that you have made silk fully reproducible; however, I also think that a lot of people (myself included!) just want the "meat".

Completely understandable.

I've created a stripped-down version of the repository here.
https://github.com/CGCooke/silk

Oh wow. Thanks a lot for this.

@Crazylov3
Author

@CGCooke Thanks a lot

@Crazylov3
Author

@gleize @CGCooke I have one more question about your decoder, specifically about extracting keypoint coordinates. Since you don't use padding, the detector heatmap has a different shape from the input. How do you extract the correct keypoint coordinates from the heatmap? Reading the code, I noticed you have a class "LinearCoordinateMapping" that maps keypoint coordinates from the heatmap using a scale factor and a bias. What are the scale factor and bias in this case?

@gleize
Contributor

gleize commented Apr 24, 2023

Hi @Crazylov3,

@gleize @CGCooke Since you don't use padding, the detector heatmap has a different shape from the input. How do you extract the correct keypoint coordinates from the heatmap?

We have a function called from_feature_coords_to_image_coords(...) that does this for you (cf. the inference example here). It will automatically map the feature coordinates back to the input image coordinates, using the LinearCoordinateMapping provided by the backbone.

Reading the code, I noticed you have a class "LinearCoordinateMapping" that maps keypoint coordinates from the heatmap using a scale factor and a bias. What are the scale factor and bias in this case?

LinearCoordinateMapping objects can be printed. If you add a print(coord_mapping) after this line (or inspect it with a debugger, which I strongly advise) and run scripts/examples/silk-inference.py, you will get this output:

x <- tensor([1., 1.]) x + tensor([-9., -9.])

This means that the scale factor is 1 and the bias is -9 for the default VGG-4 backbone, on both the x and y dimensions.
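
For intuition, here is a minimal sketch of what such an affine mapping does (an illustrative stand-in, not the actual LinearCoordinateMapping class): coordinates are transformed as x_out = scale * x + bias, and reversing the mapping inverts that transform.

import torch

class SimpleCoordinateMapping:
    # Illustrative stand-in (not silk's LinearCoordinateMapping): x_out = scale * x + bias.
    def __init__(self, scale, bias):
        self.scale = scale
        self.bias = bias

    def apply(self, coords):
        return coords * self.scale + self.bias

    def reverse(self, coords):
        # Invert the affine transform, e.g. to go from feature back to image coordinates.
        return (coords - self.bias) / self.scale

# Default VGG-4 values printed above: scale 1, bias -9 (input -> feature direction).
input_to_feature = SimpleCoordinateMapping(torch.tensor([1.0, 1.0]), torch.tensor([-9.0, -9.0]))
print(input_to_feature.reverse(torch.tensor([0.0, 0.0])))  # tensor([9., 9.]) in image coordinates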

@Crazylov3
Author

@gleize thanks for the quick response.
Can you explain why the factor is 1.0 and the bias is -9 by default? If I understand correctly, the factor is 1 because the VGG-4 backbone doesn't have any down-sampling block, and the bias of -9 is somehow related to the 9 conv layers (3x3, I believe) that don't use padding, so the output is H - 2x9, W - 2x9. However, I still don't understand why the bias is -9.

@gleize
Contributor

gleize commented Apr 24, 2023

Can you explain why the factor is 1.0 and the bias is -9 by default? If I understand correctly, the factor is 1 because the VGG-4 backbone doesn't have any down-sampling block, and the bias of -9 is somehow related to the 9 conv layers (3x3, I believe) that don't use padding, so the output is H - 2x9, W - 2x9. However, I still don't understand why the bias is -9.

Yes, that's correct. Is the confusion coming from the sign? (i.e. -9 instead of +9)
If so, it's because the linear mapping provided by the backbone goes from input to feature by default.
What we need is the inverse mapping (from feature to input), which we get by calling the .reverse(...) function here.

I should have run print(-coord_mapping) above instead of print(coord_mapping). The - inverts the mapping, displaying:

x <- tensor([1., 1.]) x + tensor([9., 9.])

Sorry for the confusion.
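
As a quick sanity check on that arithmetic (a standalone sketch using single-channel convolutions, not the actual backbone): nine unpadded 3x3 convolutions each trim one pixel from every side, so the spatial size shrinks by 18 and feature position i lines up with image position i + 9.

import torch

# Nine 3x3 convolutions without padding, mirroring the shape arithmetic discussed above.
convs = torch.nn.Sequential(
    *[torch.nn.Conv2d(1, 1, kernel_size=3, padding=0) for _ in range(9)]
)

x = torch.zeros(1, 1, 100, 100)
with torch.no_grad():
    y = convs(x)
print(y.shape)  # torch.Size([1, 1, 82, 82]): H - 18, W - 18

# Each unpadded 3x3 conv shifts the receptive-field center by 1 pixel, so after 9 of
# them feature coordinate i corresponds to image coordinate i + 9 (the +9 reverse bias).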

@Crazylov3
Author

Many thanks

@Crazylov3
Author

Crazylov3 commented Apr 24, 2023

@gleize
Sorry for asking too much @@
I would like to know whether it is possible to change the backbone a bit. I want to add down-sampling blocks; without down-sampling, the computation is too heavy for my tiny robot.

One more question about "x <- tensor([1., 1.]) x + tensor([9., 9.])". If we do that, it means we never get a keypoint within a margin of 9 pixels from the image border, doesn't it? I am just curious why you do that instead of interpolating the heatmap to the original image size. Are there any technical details in the training process that force us to do that?

@gleize
Contributor

gleize commented Apr 25, 2023

Hi @Crazylov3,

Sorry for asking too much @@

No worries.

I would like to know whether it is possible to change the backbone a bit. I want to add down-sampling blocks; without down-sampling, the computation is too heavy for my tiny robot.

Yes, you can. The ResFPN we've tried actually downsamples the input; you can check it out as a reference.
The configuration is here.
The model is here.
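
For intuition on how adding down-sampling changes the coordinate mapping discussed earlier (a generic sketch with a single stride-2 convolution, not the actual ResFPN code):

import torch

# One unpadded 3x3 convolution with stride 2: the feature map is roughly halved.
conv = torch.nn.Conv2d(1, 1, kernel_size=3, stride=2, padding=0)

x = torch.zeros(1, 1, 100, 100)
with torch.no_grad():
    y = conv(x)
print(y.shape)  # torch.Size([1, 1, 49, 49])

# Output index j covers input columns [2j, 2j + 2], centered at 2j + 1, so the
# feature-to-image mapping becomes roughly x_img = 2 * x_feat + 1: the scale factor
# of the coordinate mapping is no longer 1 once the backbone down-samples.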

One more question about "x <- tensor([1., 1.]) x + tensor([9., 9.])". If we do that, it means we never get a keypoint within a margin of 9 pixels from the image border, doesn't it?

Yes.

I am just curious why you do that instead of interpolating the heatmap to the original image size. Are there any technical details in the training process that force us to do that?

Interpolating the heatmap would displace the final keypoint positions, making them incorrect.
Keypoints are never found on the border because the convolutions are unpadded. We do not use padding because it decreases performance (cf. section B.4 of the supplementary material).
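
To make the displacement concrete (illustrative arithmetic only; the input height H = 164 is an arbitrary choice, not from the paper):

# With 9 unpadded 3x3 convs, an H-pixel-tall image yields an (H - 18)-pixel-tall heatmap.
# Correct mapping: feature row i sits at image row i + 9.
# Naively resizing the heatmap back to H would instead map row i to roughly i * H / (H - 18).
H = 164
for i in [0, 36, 73, 145]:
    correct = i + 9
    resized = i * H / (H - 18)
    print(i, correct, round(resized, 1))
# 0 9 0.0 / 36 45 40.4 / 73 82 82.0 / 145 154 162.9
# The two only agree near i = 73; everywhere else the keypoint would be displaced.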

I'm curious: why would that be a problem for your task?

@Crazylov3
Author

Crazylov3 commented Apr 26, 2023

The model backbone is the main problem for my task. I am working with a Rockchip NPU (which is poorly supported). I need real-time keypoint matching, but your backbone is too computationally heavy (at least for my task). I currently use a backbone almost the same as SuperPoint's (down-sampling to H/16 x W/16), with a descriptor map of shape (256, H/16, W/16); I convert the 256 float channels to 32 dimensions (uint8, for faster matching), and I use pixel shuffle to convert (256, H/16, W/16) -> (1, H, W) for the detector. In short, for my task I must avoid both the decoder (whose upsampling hurts speed) and a high-resolution descriptor map. I know that down-sampling without a decoder hurts performance a lot. Now I am trying to apply your idea to my task. Do you have any advice for me? Many thanks.
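
As a side note, the pixel-shuffle step described above can be sanity-checked in isolation (shapes only; the 160x160 input size is a hypothetical example):

import torch

# PixelShuffle with upscale factor 16 rearranges (B, 256, H/16, W/16) into (B, 1, H, W),
# since the 256 = 16 * 16 channels are redistributed into a 16x16 spatial block per cell.
shuffle = torch.nn.PixelShuffle(upscale_factor=16)

coarse_heatmap = torch.rand(1, 256, 10, 10)  # (B, 256, H/16, W/16) with H = W = 160
full_res = shuffle(coarse_heatmap)
print(full_res.shape)  # torch.Size([1, 1, 160, 160])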

@gleize
Contributor

gleize commented Apr 26, 2023

@Crazylov3

[...] Do you have any advice for me? Many thanks

I see. This seems to be a good use case for testing our smallest backbone, VGG $\mu$ (cf. the paper for performance statistics).
To enable that backbone, you need to change the code in three places (provided below).
This backbone is quite small, so as you can expect, the accuracy will be lower. However, it might be good enough with proper tuning.

L22

CHECKPOINT_PATH = os.path.join(os.path.dirname(__file__), "../../assets/models/silk/analysis/alpha/pvgg-micro.ckpt")

L37

SILK_BACKBONE = ParametricVGG(
    use_max_pooling=False,
    padding=0,
    normalization_fn=[torch.nn.BatchNorm2d(64)],
    channels=(64,),
)

L65

    model = SiLK(
        in_channels=1,
        backbone=deepcopy(SILK_BACKBONE),
        detection_threshold=SILK_THRESHOLD,
        detection_top_k=SILK_TOP_K,
        nms_dist=nms,
        border_dist=SILK_BORDER,
        default_outputs=default_outputs,
        descriptor_scale_factor=SILK_SCALE_FACTOR,
        padding=0,
        lat_channels=32,
        desc_channels=32,
        feat_channels=64,
    )

@gleize
Contributor

gleize commented Jun 19, 2023

Answers moved to FAQ. Closing now.
