Speed Up Jpeg Encoder Color Conversion #1476
DirectXMath is MIT-licensed and already provides SIMD-accelerated algorithms (for x86/x64 and ARM64) for many standard color conversions:
https://docs.microsoft.com/en-us/windows/win32/api/directxmath/nf-directxmath-xmcolorrgbtoyuv
https://github.com/microsoft/DirectXMath/blob/master/Inc/DirectXMathMisc.inl#L1728-L1738
The formula it uses is documented on that page. There is likewise XMColorRGBToYUV_HD (https://docs.microsoft.com/en-us/windows/win32/api/directxmath/nf-directxmath-xmcolorrgbtoyuv_hd), which documents its own formula.
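Conversions such as XMColorRGBToYUV amount to a fixed 3x3 matrix (plus an offset) applied to every pixel, which is why they map well onto SIMD: each output channel is just three multiply-adds. A minimal Python sketch of that shape, assuming the full-range JFIF (BT.601) coefficients a JPEG encoder needs; the coefficients are not taken from DirectXMath, and JFIF_MATRIX, JFIF_OFFSET and transform are hypothetical names.

```python
# Assumption: full-range JFIF (BT.601) RGB -> YCbCr coefficients, as used by
# baseline JPEG. These are NOT copied from DirectXMath's YUV routines.
JFIF_MATRIX = (
    (0.299, 0.587, 0.114),          # Y row
    (-0.168736, -0.331264, 0.5),    # Cb row
    (0.5, -0.418688, -0.081312),    # Cr row
)
JFIF_OFFSET = (0.0, 128.0, 128.0)   # chroma is centred on 128 for 8-bit samples


def transform(pixel, matrix=JFIF_MATRIX, offset=JFIF_OFFSET):
    """Apply a 3x3 color matrix plus offset to one (r, g, b) pixel."""
    r, g, b = pixel
    # Each output channel is three multiply-adds: the part SIMD does in bulk.
    return tuple(m[0] * r + m[1] * g + m[2] * b + o for m, o in zip(matrix, offset))
```

For example, white (255, 255, 255) maps to roughly (255, 128, 128) and black to (0, 128, 128). Swapping in a different coefficient matrix is all it takes to target another standard, which is what makes a single vectorized matrix kernel reusable across color spaces.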
Thanks @tannergooding, I'll see how well the code there fits with our existing architecture.
Note that the referenced NetVips benchmark is quite atypical for Resize: users usually downscale much further than to 90% of the original size, so I wouldn't worry that much about the encoder being a bottleneck. We should profile this though; I wonder how exactly the bottlenecks are distributed.
@Sergio0694 was doing some work with 4K images the other day, and the benchmarks he showed me indicated that the encoder was a major bottleneck. Not typical, but also not that uncommon. I’ll try and dig out the screenshot.
Ah yeah, saving a 4K JPEG must be very slow indeed (hopefully noticeably better with #1508). What I mean is that we got away with a slow encoder for this long because typical thumbnail-maker code saves a very small output image, so the encoder is probably not that hot a path for web content management. (That doesn't mean it's not painful in other use cases.)
Yeah, I was very surprised to see just how much slower ImageSharp was at JPEG encoding/decoding 😥 I was expecting it to be roughly on par, but the encoding part in particular is a lot slower. In my case, just saving the image takes longer than copying it to the GPU, processing it and copying it back: about 4x as long as all of those steps combined, and I haven't even optimized them that much. I was tempted to switch my samples to System.Drawing, though in the end I didn't because, well, I love you guys, and also the API surface of System.Drawing is ugly 😄 Point is, any speed improvements in this area would be very welcome, especially if you're concerned about people running comparative benchmarks between ImageSharp and other common image processing libraries. On that note, I'll test a few improvements I've been meaning to add to the resize kernel using FMA instructions too 🚀
I've attached a speedscope dump from PerfView, as asked for in #1517 (comment). The trace is from a BenchmarkDotNet benchmark of a 4K JPEG export after the optimization in #1517. Unzip it, open it in https://www.speedscope.app/, select "Left heavy" (top menu), and scroll all the way down.
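Speedscope's "Left heavy" view merges identical call stacks and places the heaviest child of each frame on the left, so following the left edge downward leads to where most samples land. A rough Python sketch of that aggregation, assuming each sample is a root-to-leaf tuple of frame names; build_tree and heaviest_path are hypothetical helpers, not speedscope's actual code.

```python
def build_tree(samples):
    """Merge sampled stacks into a call tree whose weights are inclusive sample counts."""
    root = {"name": "<root>", "weight": 0, "children": {}}
    for stack in samples:
        root["weight"] += 1
        node = root
        for frame in stack:
            # Identical prefixes share nodes, regardless of when they were sampled.
            node = node["children"].setdefault(
                frame, {"name": frame, "weight": 0, "children": {}})
            node["weight"] += 1
    return root


def heaviest_path(node):
    """Follow the heaviest child at each level, like the left edge of a left-heavy view."""
    path = []
    while node["children"]:
        node = max(node["children"].values(), key=lambda n: n["weight"])
        path.append((node["name"], node["weight"]))
    return path
```

With two samples ending in ("Main", "Encode", "Emit"), one in ("Main", "Encode", "RowOctet") and one in ("Main", "Load"), heaviest_path returns [("Main", 4), ("Encode", 3), ("Emit", 2)]; these frame names are made up for illustration.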
I'm not a pro with this tool, but if I'm reading it right, the RowOctet constructor and Emit are the new bottlenecks. @tkp1n, can you confirm? Here only the
ImageSharp/src/ImageSharp/Formats/Jpeg/JpegEncoderCore.cs, lines 1011 to 1012 in 0e0dc2a
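The trace doesn't show what Emit does internally. Assuming it is the encoder's Huffman bit writer, the usual job of such a step is to pack variable-length codes into bytes and insert a 0x00 after every 0xFF byte (JPEG byte stuffing, so entropy-coded data is never mistaken for a marker). A minimal Python sketch of that technique; BitWriter, emit and flush are hypothetical names, not ImageSharp's API.

```python
class BitWriter:
    """Sketch of a JPEG entropy-coded-segment bit writer (not ImageSharp's Emit)."""

    def __init__(self):
        self.out = bytearray()
        self.acc = 0      # pending bits, right-aligned
        self.nbits = 0    # number of pending bits in acc

    def emit(self, code, length):
        """Append the low `length` bits of `code`, most significant bit first."""
        self.acc = (self.acc << length) | (code & ((1 << length) - 1))
        self.nbits += length
        while self.nbits >= 8:
            self.nbits -= 8
            byte = (self.acc >> self.nbits) & 0xFF
            self.out.append(byte)
            if byte == 0xFF:
                self.out.append(0x00)  # byte stuffing
        self.acc &= (1 << self.nbits) - 1  # keep only the bits not yet written

    def flush(self):
        """Pad the last partial byte with 1-bits and return the encoded bytes."""
        if self.nbits:
            pad = 8 - self.nbits
            self.emit((1 << pad) - 1, pad)
        return bytes(self.out)
```

Because a writer like this runs once per Huffman code, with a stuffing check on every output byte, it is a plausible hot spot when encoding a 4K image.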
Exactly, yes.
@JimBobSquarePants, I think we can close this in favor of a general JpegEncoder perf tracking issue.
Ok. Let’s migrate all the relevant info.
Fixed via #2120.
Analysis of the encoder's performance, based on a breakdown of this benchmark, indicates that encoding a large JPEG takes 80% of the entire processing time.
https://github.com/kleisauke/net-vips/tree/master/tests/NetVips.Benchmarks
This is due to the lack of hardware acceleration in our color conversion approach.
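The 80% figure also puts a ceiling on what encoder work alone can buy. A quick Amdahl's-law check in Python, assuming the 80/20 split from that breakdown holds for the whole run (overall_speedup is a hypothetical helper name):

```python
def overall_speedup(fraction, stage_speedup):
    """Amdahl's law: whole-run speedup when only `fraction` of the time gets faster."""
    return 1.0 / ((1.0 - fraction) + fraction / stage_speedup)


for k in (2, 4, float("inf")):
    # fraction = 0.8: the encoder's share of total time in the benchmark breakdown
    print(k, round(overall_speedup(0.8, k), 2))
```

Under that assumption, a 2x faster encoder makes the whole run about 1.67x faster, 4x gives 2.5x, and even an infinitely fast encoder caps the overall gain at 5x.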
The current JPEG encoder uses predefined tables to convert a span of Rgb24 pixels into separate Y, Cb and Cr Block8x8F planes.
ImageSharp/src/ImageSharp/Formats/Jpeg/Components/Encoder/YCbCrForwardConverter{TPixel}.cs, lines 58 to 83 in f1a0fb6
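The thread doesn't reproduce the converter's code, so as an illustration of the general technique only: a minimal Python sketch of table-driven RGB to YCbCr conversion in the style of libjpeg's 16.16 fixed-point tables, assuming full-range JFIF coefficients. The table names and rgb_to_ycbcr_table are hypothetical and not ImageSharp's.

```python
SCALE_BITS = 16
ONE_HALF = 1 << (SCALE_BITS - 1)


def fix(x):
    """Convert a float coefficient to 16.16 fixed point."""
    return int(round(x * (1 << SCALE_BITS)))


# One 256-entry table per (coefficient, channel) term, built once up front.
# Rounding and the +128 chroma offset are folded into one table per output.
Y_R = [fix(0.299) * v for v in range(256)]
Y_G = [fix(0.587) * v for v in range(256)]
Y_B = [fix(0.114) * v + ONE_HALF for v in range(256)]
CB_R = [-fix(0.168736) * v for v in range(256)]
CB_G = [-fix(0.331264) * v for v in range(256)]
CB_B = [fix(0.5) * v + (128 << SCALE_BITS) + ONE_HALF - 1 for v in range(256)]
CR_R = CB_B  # 0.5 * R for Cr uses the same table as 0.5 * B for Cb
CR_G = [-fix(0.418688) * v for v in range(256)]
CR_B = [-fix(0.081312) * v for v in range(256)]


def rgb_to_ycbcr_table(r, g, b):
    """Convert one 8-bit pixel using only table lookups, additions and shifts."""
    y = (Y_R[r] + Y_G[g] + Y_B[b]) >> SCALE_BITS
    cb = (CB_R[r] + CB_G[g] + CB_B[b]) >> SCALE_BITS
    cr = (CR_R[r] + CR_G[g] + CR_B[b]) >> SCALE_BITS
    return y, cb, cr
```

This avoids per-pixel floating-point multiplies, but every lookup is effectively a gather indexed by pixel data, which vectorizes poorly; the plain arithmetic form maps directly onto vector multiply-adds instead.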
While this is faster than a naïve per-pixel floating-point calculation, it can be heavily optimized.
Short Term Goal
Add (AVX2?) acceleration directly to the converter to optimize conversion on .NET Core 3.1+. This should be a few hours' work for someone with SIMD knowledge.
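Much of the short-term goal comes down to data layout: deinterleave the Rgb24 span into separate R, G and B planes so that 8 pixels can be converted per vector operation, then handle the leftover pixels with scalar code. Python cannot issue AVX2 instructions, so this is only a scalar emulation of that loop structure, assuming float32 lanes and the JFIF coefficients; LANES, deinterleave, madd and convert_planes are hypothetical names.

```python
LANES = 8  # one 256-bit AVX2 register holds 8 float32 values


def deinterleave(rgb24):
    """Split packed R, G, B bytes into three float planes (structure of arrays)."""
    return ([float(v) for v in rgb24[0::3]],
            [float(v) for v in rgb24[1::3]],
            [float(v) for v in rgb24[2::3]])


def madd(acc, coeff, plane):
    """Lane-wise acc + coeff * plane; stands in for one vector fused multiply-add.

    Python rounds the multiply and the add separately, so results can differ
    from a true FMA in the last bit.
    """
    return [a + coeff * p for a, p in zip(acc, plane)]


def convert_planes(r, g, b):
    """Convert R, G, B planes to Y, Cb, Cr planes, LANES pixels per step plus a scalar tail."""
    n = len(r)
    y, cb, cr = [0.0] * n, [0.0] * n, [0.0] * n
    main = n - n % LANES  # pixels that fill whole vectors
    for i in range(0, main, LANES):
        rv, gv, bv = r[i:i + LANES], g[i:i + LANES], b[i:i + LANES]
        y[i:i + LANES] = madd(madd(madd([0.0] * LANES, 0.299, rv), 0.587, gv), 0.114, bv)
        cb[i:i + LANES] = madd(madd(madd([128.0] * LANES, -0.168736, rv), -0.331264, gv), 0.5, bv)
        cr[i:i + LANES] = madd(madd(madd([128.0] * LANES, 0.5, rv), -0.418688, gv), -0.081312, bv)
    for i in range(main, n):  # leftover pixels that do not fill a vector
        y[i] = 0.299 * r[i] + 0.587 * g[i] + 0.114 * b[i]
        cb[i] = 128.0 - 0.168736 * r[i] - 0.331264 * g[i] + 0.5 * b[i]
        cr[i] = 128.0 + 0.5 * r[i] - 0.418688 * g[i] - 0.081312 * b[i]
    return y, cb, cr
```

In C#, the body of the lane loop would correspond to Vector256<float> values combined with the Fma.MultiplyAdd intrinsic from System.Runtime.Intrinsics.X86, guarded by runtime checks such as Avx2.IsSupported and Fma.IsSupported.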
Long Term Goal
Establish an architecture similar to the JPEG decoder's ColorConverters, allowing incremental addition of accelerated converters for all platforms and color spaces.
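The decoder's ColorConverters code isn't shown here, so this is only a generic sketch of the idea: each converter reports whether it can run on the current hardware, and the encoder picks the most specialized supported one. All class and function names are hypothetical, and the accelerated class is a placeholder.

```python
from abc import ABC, abstractmethod


class ForwardConverter(ABC):
    """Hypothetical base type: one subclass per (color space, instruction set) pair."""

    @staticmethod
    @abstractmethod
    def is_supported():
        """Return True if this implementation can run on the current machine."""

    @abstractmethod
    def convert(self, r, g, b):
        """Convert planar R, G, B values to (Y, Cb, Cr) planes."""


class ScalarYCbCrConverter(ForwardConverter):
    """Portable fallback that works everywhere."""

    @staticmethod
    def is_supported():
        return True

    def convert(self, r, g, b):
        px = list(zip(r, g, b))
        y = [0.299 * rv + 0.587 * gv + 0.114 * bv for rv, gv, bv in px]
        cb = [128.0 - 0.168736 * rv - 0.331264 * gv + 0.5 * bv for rv, gv, bv in px]
        cr = [128.0 + 0.5 * rv - 0.418688 * gv - 0.081312 * bv for rv, gv, bv in px]
        return y, cb, cr


class Avx2YCbCrConverter(ScalarYCbCrConverter):
    """Placeholder for an accelerated variant; a real one would override convert()."""

    @staticmethod
    def is_supported():
        # Stand-in for a runtime CPU check (in .NET, e.g. Avx2.IsSupported).
        return False


def select_converter(candidates):
    """Instantiate the first supported converter; list the most specialized first."""
    for cls in candidates:
        if cls.is_supported():
            return cls()
    raise RuntimeError("no supported converter")
```

With this shape, supporting a new instruction set or color space means adding one class and one entry in the candidate list, without touching the encoder loop itself.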