Implement Box blur fast filter that could approximate Gaussian filter #223

Merged
merged 11 commits into from
Feb 6, 2025

Conversation

light-le
Contributor

@light-le light-le commented Jan 16, 2025

Solves #168. The algorithm was derived from this blog post.
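For context, a common way to make repeated box blurs approximate a Gaussian is to pick n odd box widths whose variances, (w^2 - 1)/12 each, add up to sigma^2 (variances add under convolution). The sketch below assumes this variance-matching rule with a choice between two neighbouring odd widths; it is not necessarily the exact derivation in the linked blog post or in this PR, and `boxes_for_gauss` is a hypothetical name.

```rust
/// Returns `n` odd box widths whose successive application approximates a
/// Gaussian with standard deviation `sigma`. Assumption: each box of odd
/// width w has variance (w^2 - 1) / 12 and the variances of the n boxes add.
fn boxes_for_gauss(sigma: f64, n: usize) -> Vec<usize> {
    let n_f = n as f64;
    let var = sigma * sigma;
    // Ideal real-valued width if all n boxes had the same size.
    let w_ideal = (12.0 * var / n_f + 1.0).sqrt();
    // Largest odd integer not greater than w_ideal (always >= 1).
    let mut wl = w_ideal.floor() as i64;
    if wl % 2 == 0 {
        wl -= 1;
    }
    let wu = wl + 2; // next odd width
    let wl_f = wl as f64;
    // How many boxes use the smaller width wl so that the summed variance
    // m * (wl^2 - 1) + (n - m) * (wu^2 - 1) is as close as possible to 12 * var.
    let m_ideal = (12.0 * var - n_f * wl_f * wl_f - 4.0 * n_f * wl_f - 3.0 * n_f)
        / (-4.0 * wl_f - 4.0);
    let m = m_ideal.round() as usize;
    (0..n)
        .map(|i| if i < m { wl as usize } else { wu as usize })
        .collect()
}
```

For example, `boxes_for_gauss(2.0, 3)` gives widths `[3, 3, 5]`, i.e. radii 1, 1 and 2.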

Member

@johnnv1 johnnv1 left a comment


Can you add benchmarks for it as well?

@light-le
Contributor Author

You mean in crates/kornia-imgproc/benches/bench_filters.rs? Sure, OK.

@edgarriba
Member

@johnnv1, any idea why the Python tests are failing? (I believe it's unrelated to this PR.) Shouldn't we be using the new just commands in https://github.com/kornia/kornia-rs/blob/main/.github/workflows/python_test.yml#L40?

@johnnv1
Member

johnnv1 commented Jan 19, 2025


Yeah, it seems unrelated, but it should be working.

@johnnv1 johnnv1 closed this Jan 21, 2025
@johnnv1 johnnv1 reopened this Jan 21, 2025
Member

@edgarriba edgarriba left a comment


Can you expand this benchmark and report the numbers, so that we know whether this method really delivers what's expected?

https://github.com/kornia/kornia-rs/blob/main/crates/kornia-imgproc/benches/bench_filters.rs

Once you have the benchmark set up, I highly suggest that you play around with it and try micro-optimisations, such as reusing pre-computed variables as much as possible (as I suggested in the review), to see how they affect the benchmarks.

@edgarriba edgarriba linked an issue Jan 26, 2025 that may be closed by this pull request
@light-le
Contributor Author

light-le commented Feb 2, 2025

Regarding performance: the runtime of the box_blur_fast() filter is independent of the kernel size, but fast_horizontal_filter() has to be applied 6 times in one run. Therefore it is slower than the native Gaussian blur when the kernel size is small. Here are the timings with all of your suggested micro-optimizations applied:

  • 256x224 image: box_blur_fast() takes 2.98 ms. Native Gaussian blur:
    • Kernel size 3: 1.5 ms
    • Kernel size 5: 1.9 ms
    • Kernel size 7: 2.3 ms
    • Kernel size 9: 3 ms
    • Kernel size 11: 3.68 ms
    • Kernel size 17: 5.69 ms
  • 512x448 image: box_blur_fast() takes 12.25 ms. Native Gaussian blur:
    • Kernel size 3: 6 ms
    • Kernel size 5: 7.78 ms
    • Kernel size 7: 9.5 ms
    • Kernel size 9: 12.1 ms
    • Kernel size 11: 14.84 ms
    • Kernel size 17: 22.64 ms
  • 1024x896 image: box_blur_fast() takes 50.3 ms. Native Gaussian blur:
    • Kernel size 3: 24 ms
    • Kernel size 5: 31.4 ms
    • Kernel size 7: 38.54 ms
    • Kernel size 9: 48.3 ms
    • Kernel size 11: 59.3 ms
    • Kernel size 17: 91 ms
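The kernel-size independence described above comes from computing each box pass with a running sum: sliding the window one pixel costs one addition and one subtraction, whatever the radius. A minimal sketch follows, assuming a single-channel f32 row-major image, clamp-to-edge borders, and a vertical pass implemented by transposing and re-running the horizontal pass. That assumption would account for 6 horizontal passes per run (3 boxes times 2 directions), but the PR's actual structure may differ. `horizontal_box_pass`, `transpose` and `box_blur_3x` are hypothetical names, not kornia-rs APIs.

```rust
/// One horizontal box pass of radius `r` over a row-major `width x height`
/// image. Cost per pixel is O(1) thanks to the running sum.
fn horizontal_box_pass(src: &[f32], dst: &mut [f32], width: usize, height: usize, r: usize) {
    assert!(src.len() == width * height && dst.len() == width * height);
    if width == 0 {
        return;
    }
    let ri = r as isize;
    let last = width as isize - 1;
    // Precomputed once and reused for every pixel instead of dividing per pixel.
    let norm = 1.0 / (2 * r + 1) as f32;
    for y in 0..height {
        let row = &src[y * width..(y + 1) * width];
        let out = &mut dst[y * width..(y + 1) * width];
        // Clamp-to-edge sampling: indices outside the row reuse the edge pixel.
        let at = |i: isize| row[i.clamp(0, last) as usize];
        // Window sum for x = 0 covers samples -r..=r.
        let mut sum: f32 = (-ri..=ri).map(|i| at(i)).sum();
        for (x, o) in out.iter_mut().enumerate() {
            *o = sum * norm;
            let xi = x as isize;
            // Slide right: add the entering sample, drop the leaving one.
            sum += at(xi + ri + 1) - at(xi - ri);
        }
    }
}

/// Writes the `height x width` transpose of `src` into `dst`.
fn transpose(src: &[f32], dst: &mut [f32], width: usize, height: usize) {
    for y in 0..height {
        for x in 0..width {
            dst[x * height + y] = src[y * width + x];
        }
    }
}

/// Three 2-D box blurs, each as horizontal pass, transpose, horizontal pass,
/// transpose back: six horizontal passes in total.
fn box_blur_3x(img: &mut [f32], width: usize, height: usize, radii: [usize; 3]) {
    let mut a = vec![0.0f32; img.len()];
    let mut b = vec![0.0f32; img.len()];
    for &r in &radii {
        horizontal_box_pass(img, &mut a, width, height, r);
        transpose(&a, &mut b, width, height); // now height x width
        horizontal_box_pass(&b, &mut a, height, width, r);
        transpose(&a, img, height, width); // back to width x height
    }
}
```

One trade-off of the f32 running sum is that rounding error accumulates along long rows; accumulating in f64, or recomputing the sum periodically, are common mitigations.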

Maybe there are some other optimizations I can do? I'm a bit wary of introducing unsafe Rust into my code, but that's something I could try.

@light-le light-le requested a review from edgarriba February 4, 2025 09:52
@edgarriba
Member

@light-le, I believe the benchmarks look good. Maybe just do a final check to exactly match the blog post results by using an 800x200 image (r = [5, 10]), even though I think the numbers you report will be pretty accurate.

Given the two implementations we now have and the reported timings, I'm not sure we should name the one in this PR "fast", since the other flavour can match its performance. I also think we should document the limitations as thoroughly as possible, as you suggested with the kernel sizes.

In terms of improvements, the only thing I can think of right now is to see whether we can precompute something to avoid the branching for c==0 and the matches below, maybe by adding some padding beforehand? We can do that in another PR if that makes sense.
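The padding idea can be illustrated as follows: copy each row into a scratch buffer extended with replicated edge pixels, so the sliding-window loop never has to test for borders. This is a minimal sketch assuming a single-channel f32 row and replicate-edge borders; `box_row_padded` is a hypothetical helper, and it does not reproduce the PR's c==0 branch or its match arms, which are not shown here.

```rust
/// Box-filters one row with radius `r`, moving all edge handling out of the
/// inner loop: the row is copied into `padded` (a reusable scratch buffer)
/// with replicated edge pixels, so the sliding window needs no branches.
fn box_row_padded(row: &[f32], out: &mut [f32], r: usize, padded: &mut Vec<f32>) {
    let w = row.len();
    assert!(w > 0 && out.len() == w);
    padded.clear();
    // Left padding: r copies of the first pixel.
    padded.extend(std::iter::repeat(row[0]).take(r));
    padded.extend_from_slice(row);
    // Right padding: r + 1 copies of the last pixel. The extra copy keeps the
    // final window update in bounds.
    padded.extend(std::iter::repeat(row[w - 1]).take(r + 1));

    let win = 2 * r + 1;
    let norm = 1.0 / win as f32;
    // padded[x..x + win] is the window centred on row[x].
    let mut sum: f32 = padded[..win].iter().sum();
    for x in 0..w {
        out[x] = sum * norm;
        // Branch-free slide: enter padded[x + win], leave padded[x].
        sum += padded[x + win] - padded[x];
    }
}
```

Passing the same `padded` buffer for every row avoids a per-row allocation, in the spirit of the earlier suggestion to reuse pre-computed state.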

Other possibilities could be to investigate SIMD via wide, or, more interestingly, I have a pending task to prototype an API around CubeCL to support GPU implementations: https://github.com/tracel-ai/cubecl

@edgarriba
Member

I also forgot to mention an obvious attempt: row parallelization via rayon.
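Row parallelization fits this filter because a horizontal pass treats every row independently. With rayon, `par_chunks_mut` on the output slice would be the idiomatic route; the sketch below instead uses only the standard library (`std::thread::scope`, Rust 1.63+) so it runs without dependencies. `par_rows` is a hypothetical helper, not a kornia-rs API, and the row function is assumed to write exactly one output row per input row.

```rust
use std::thread;

/// Applies `row_fn` to every row of a row-major image `width` pixels wide,
/// splitting the rows into contiguous bands processed on separate threads.
fn par_rows<F>(src: &[f32], dst: &mut [f32], width: usize, n_threads: usize, row_fn: F)
where
    F: Fn(&[f32], &mut [f32]) + Sync,
{
    assert!(width > 0 && src.len() == dst.len() && src.len() % width == 0);
    let height = src.len() / width;
    let n = n_threads.max(1);
    // Ceiling division so every row lands in some band; at least one row per band.
    let rows_per_band = ((height + n - 1) / n).max(1);
    let band_len = rows_per_band * width;
    thread::scope(|s| {
        for (src_band, dst_band) in src.chunks(band_len).zip(dst.chunks_mut(band_len)) {
            let row_fn = &row_fn; // shared by reference, hence the Sync bound
            s.spawn(move || {
                for (src_row, dst_row) in src_band.chunks(width).zip(dst_band.chunks_mut(width)) {
                    row_fn(src_row, dst_row);
                }
            });
        }
    }); // all spawned threads are joined here
}
```

A running-sum row filter can be passed as `row_fn`; if the vertical pass is done by transposing, the same helper covers both directions.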

@light-le
Contributor Author

light-le commented Feb 5, 2025

When I benchmark with an 800x200 image, as in the blog post, box_blur_fast() takes 8.3 ms, while gaussian_blur_native takes 5.7 ms for r = 5 and 9.4 ms for r = 10.

I would leave optimizations that involve GPU or parallelization to another PR, since these weren't implemented in gaussian_blur_native() either. But let me try to optimize the algorithm further.

@light-le
Contributor Author

light-le commented Feb 5, 2025

I tried adding a bunch of unsafe Rust code; it gave a speedup of less than 10%.

@edgarriba
Member

Got it, then let's merge as is and make the optimizations later.

@edgarriba edgarriba merged commit 0a357b5 into kornia:main Feb 6, 2025
11 checks passed
@light-le light-le deleted the box-blur-fast branch February 8, 2025 02:24
Development

Successfully merging this pull request may close these issues.

Implement fast-box-blur
3 participants