
Code is too complicated #7

Closed
Crazylov3 opened this issue Apr 21, 2023 · 15 comments

Comments

@Crazylov3

Thank you for sharing your beautiful work.
Do you have any plans to release a simplified version of the source code? I find it very hard to follow the repo.

@gleize
Contributor

gleize commented Apr 21, 2023

Hi @Crazylov3,

Thanks a lot for the feedback. Unfortunately, there are no immediate plans to rewrite a simplified version of the codebase. However, this could change if there is popular demand for it.

In the meantime, if you could provide a few examples of where you find the code complicated or difficult to follow, I could address those issues one by one and make sure the documentation and codebase are improved.

@CGCooke

CGCooke commented Apr 22, 2023

@gleize

Firstly, thank you to the team for the awesome work, and for releasing it under a permissive license.

I would love to create a minimalistic/simplified repo, with only the inference code.

It's great that you have made silk fully reproducible; however, I also think that a lot of people (myself included!) just want the "meat".

@CGCooke

CGCooke commented Apr 22, 2023

@gleize

I've created a stripped-down version of the repository here.

https://github.com/CGCooke/silk

@Crazylov3 , you might find this a little easier to get started with.

Thanks again for releasing this.

@gleize
Contributor

gleize commented Apr 23, 2023

@CGCooke

It's great that you have made silk fully reproducible; however, I also think that a lot of people (myself included!) just want the "meat".

Completely understandable.

I've created a stripped-down version of the repository here.
https://github.com/CGCooke/silk

Oh wow. Thanks a lot for this.

@Crazylov3
Author

@CGCooke Thanks a lot

@Crazylov3
Author

@gleize @CGCooke I have one more question about your decoder, specifically about extracting keypoint coordinates. Since you don't use padding, the detector heatmap has a different shape from the input. How do you extract the correct keypoint coordinates from the heatmap? Reading the code, I noticed you have a class "LinearCoordinateMapping" that maps keypoint coordinates from the heatmap using a scale factor and a bias. What are the scale factor and bias in this case?

@gleize
Contributor

gleize commented Apr 24, 2023

Hi @Crazylov3,

@gleize @CGCooke Since you don't use padding, the detector heatmap has a different shape from the input. How do you extract the correct keypoint coordinates from the heatmap?

We have a function called from_feature_coords_to_image_coords(...) that does this for you (cf. the inference example here). It will automatically map the feature coordinates back to the input image coordinates, using the LinearCoordinateMapping provided by the backbone.

Reading the code, I noticed you have a class "LinearCoordinateMapping" that maps keypoint coordinates from the heatmap using a scale factor and a bias. What are the scale factor and bias in this case?

LinearCoordinateMapping objects can be printed. If you add a print(coord_mapping) after this line (or inspect it with a debugger, which I strongly advise) and run scripts/examples/silk-inference.py, you will get this output:

x <- tensor([1., 1.]) x + tensor([-9., -9.])

This means that the scale factor is 1 and the bias is -9 for the default VGG-4 backbone, on both the x and y dimensions.
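
For intuition, here is a minimal sketch of what such an affine mapping does (an illustrative stand-in, not the actual LinearCoordinateMapping class): coordinates are transformed as x_out = scale * x + bias, and reversing the mapping inverts that transform.

import torch

class SimpleCoordinateMapping:
    # Illustrative stand-in (not silk's LinearCoordinateMapping): x_out = scale * x + bias.
    def __init__(self, scale, bias):
        self.scale = scale
        self.bias = bias

    def apply(self, coords):
        return coords * self.scale + self.bias

    def reverse(self, coords):
        # Invert the affine transform, e.g. to go from feature back to image coordinates.
        return (coords - self.bias) / self.scale

# Default VGG-4 values printed above: scale 1, bias -9 (input -> feature direction).
input_to_feature = SimpleCoordinateMapping(torch.tensor([1.0, 1.0]), torch.tensor([-9.0, -9.0]))
print(input_to_feature.reverse(torch.tensor([0.0, 0.0])))  # tensor([9., 9.]) in image coordinates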

@Crazylov3
Author

@gleize thanks for the quick response.
Can you explain why the factor is 1.0 and the bias is -9 by default? If I understand correctly, the factor is 1 because the VGG-4 backbone doesn't have any down-sampling block, and the bias of -9 is somehow related to the 9 conv layers (3x3, I believe) that don't use padding, so the output is H - 2x9, W - 2x9. However, I still don't understand why the bias is -9.

@gleize
Contributor

gleize commented Apr 24, 2023

Can you explain why the factor is 1.0 and the bias is -9 by default? If I understand correctly, the factor is 1 because the VGG-4 backbone doesn't have any down-sampling block, and the bias of -9 is somehow related to the 9 conv layers (3x3, I believe) that don't use padding, so the output is H - 2x9, W - 2x9. However, I still don't understand why the bias is -9.

Yes, that's correct. Is the confusion coming from the sign? (i.e. -9 instead of +9)
If so, it's because the linear mapping provided by the backbone goes from input to feature by default.
What we need is the inverse mapping (from feature to input), which we get by calling the .reverse(...) function here.

I should have run print(-coord_mapping) above instead of print(coord_mapping). The - inverts the mapping, displaying:

x <- tensor([1., 1.]) x + tensor([9., 9.])

Sorry for the confusion.
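
As a quick sanity check on that arithmetic (a standalone sketch using single-channel convolutions, not the actual backbone): nine unpadded 3x3 convolutions each trim one pixel from every side, so the spatial size shrinks by 18 and feature position i lines up with image position i + 9.

import torch

# Nine 3x3 convolutions without padding, mirroring the shape arithmetic discussed above.
convs = torch.nn.Sequential(
    *[torch.nn.Conv2d(1, 1, kernel_size=3, padding=0) for _ in range(9)]
)

x = torch.zeros(1, 1, 100, 100)
with torch.no_grad():
    y = convs(x)
print(y.shape)  # torch.Size([1, 1, 82, 82]): H - 18, W - 18

# Each unpadded 3x3 conv shifts the receptive-field center by 1 pixel, so after 9 of
# them feature coordinate i corresponds to image coordinate i + 9 (the +9 reverse bias).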

@Crazylov3
Author

Many thanks

@Crazylov3
Author

Crazylov3 commented Apr 24, 2023

@gleize
Sorry for asking too much @@
I would like to know whether it is possible to change the backbone a bit. I want to add down-sampling blocks; without down-sampling, the computation is too heavy for my tiny robot.

One more question about "x <- tensor([1., 1.]) x + tensor([9., 9.])". If we do that, it means we never get a keypoint within a margin of 9 pixels from the image border, doesn't it? I am just curious why you do that instead of interpolating the heatmap to the original image size. Are there any technical details in the training process that force us to do that?

@gleize
Contributor

gleize commented Apr 25, 2023

Hi @Crazylov3,

Sorry for asking too much @@

No worries.

I would like to know whether it is possible to change the backbone a bit. I want to add down-sampling blocks; without down-sampling, the computation is too heavy for my tiny robot.

Yes, you can. The ResFPN we've tried actually downsamples the input; you can check it out as a reference.
The configuration is here.
The model is here.
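
For intuition on how adding down-sampling changes the coordinate mapping discussed earlier (a generic sketch with a single stride-2 convolution, not the actual ResFPN code):

import torch

# One unpadded 3x3 convolution with stride 2: the feature map is roughly halved.
conv = torch.nn.Conv2d(1, 1, kernel_size=3, stride=2, padding=0)

x = torch.zeros(1, 1, 100, 100)
with torch.no_grad():
    y = conv(x)
print(y.shape)  # torch.Size([1, 1, 49, 49])

# Output index j covers input columns [2j, 2j + 2], centered at 2j + 1, so the
# feature-to-image mapping becomes roughly x_img = 2 * x_feat + 1: the scale factor
# of the coordinate mapping is no longer 1 once the backbone down-samples.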

One more question about "x <- tensor([1., 1.]) x + tensor([9., 9.])". If we do that, it means we never get a keypoint within a margin of 9 pixels from the image border, doesn't it?

Yes.

I am just curious why you do that instead of interpolating the heatmap to the original image size. Are there any technical details in the training process that force us to do that?

Interpolating the heatmap would displace the final keypoint positions, making them incorrect.
Keypoints are never found on the border because the convolutions are unpadded. We do not use padding because it decreases performance (cf. section B.4 of the supplementary material).
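
To make the displacement concrete (illustrative arithmetic only; the input height H = 164 is an arbitrary choice, not from the paper):

# With 9 unpadded 3x3 convs, an H-pixel-tall image yields an (H - 18)-pixel-tall heatmap.
# Correct mapping: feature row i sits at image row i + 9.
# Naively resizing the heatmap back to H would instead map row i to roughly i * H / (H - 18).
H = 164
for i in [0, 36, 73, 145]:
    correct = i + 9
    resized = i * H / (H - 18)
    print(i, correct, round(resized, 1))
# 0 9 0.0 / 36 45 40.4 / 73 82 82.0 / 145 154 162.9
# The two only agree near i = 73; everywhere else the keypoint would be displaced.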

I'm curious: why would that be a problem for your task?

@Crazylov3
Author

Crazylov3 commented Apr 26, 2023

The model backbone is the main problem for my task. I am working with a Rockchip NPU (which is poorly supported). I need real-time keypoint matching, but your backbone is too computationally heavy (at least for my task). I currently use a backbone almost the same as SuperPoint's (down-sampling to H/16 x W/16), with a descriptor map of shape (256, H/16, W/16); I convert the 256 float channels to 32 dimensions (uint8, for faster matching), and I use pixel shuffle to convert (256, H/16, W/16) -> (1, H, W) for the detector. In short, for my task I must avoid both the decoder (whose upsampling hurts speed) and a high-resolution descriptor map. I know that down-sampling without a decoder hurts performance a lot. Now I am trying to apply your idea to my task. Do you have any advice for me? Many thanks.
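
As a side note, the pixel-shuffle step described above can be sanity-checked in isolation (shapes only; the 160x160 input size is a hypothetical example):

import torch

# PixelShuffle with upscale factor 16 rearranges (B, 256, H/16, W/16) into (B, 1, H, W),
# since the 256 = 16 * 16 channels are redistributed into a 16x16 spatial block per cell.
shuffle = torch.nn.PixelShuffle(upscale_factor=16)

coarse_heatmap = torch.rand(1, 256, 10, 10)  # (B, 256, H/16, W/16) with H = W = 160
full_res = shuffle(coarse_heatmap)
print(full_res.shape)  # torch.Size([1, 1, 160, 160])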

@gleize
Contributor

gleize commented Apr 26, 2023

@Crazylov3

[...] Do you have any advice for me? Many thanks

I see. This seems to be a good use case for testing our smallest backbone, VGG $\mu$ (cf. the paper for performance statistics).
To enable that backbone, you need to change the code in three places (provided below).
This backbone is quite small, so as you can expect, the accuracy will be lower. However, it might be good enough with proper tuning.

L22

CHECKPOINT_PATH = os.path.join(os.path.dirname(__file__), "../../assets/models/silk/analysis/alpha/pvgg-micro.ckpt")

L37

SILK_BACKBONE = ParametricVGG(
    use_max_pooling=False,
    padding=0,
    normalization_fn=[torch.nn.BatchNorm2d(64)],
    channels=(64,),
)

L65

    model = SiLK(
        in_channels=1,
        backbone=deepcopy(SILK_BACKBONE),
        detection_threshold=SILK_THRESHOLD,
        detection_top_k=SILK_TOP_K,
        nms_dist=nms,
        border_dist=SILK_BORDER,
        default_outputs=default_outputs,
        descriptor_scale_factor=SILK_SCALE_FACTOR,
        padding=0,
        lat_channels=32,
        desc_channels=32,
        feat_channels=64,
    )

@gleize
Contributor

gleize commented Jun 19, 2023

Answers moved to FAQ. Closing now.
