Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preserve Gif color palettes and deduplicate frame pixels. #2455

Merged
merged 33 commits into from
Sep 18, 2023

Conversation

JimBobSquarePants
Copy link
Member

@JimBobSquarePants JimBobSquarePants commented May 15, 2023

Prerequisites

  • I have written a descriptive pull-request title
  • I have verified that there are no overlapping pull-requests open
  • I have verified that I am following the existing coding patterns and practice as demonstrated in the repository. These follow strict Stylecop rules 👮.
  • I have provided test coverage for my change (where applicable)

Description

This PR does a few things.

  1. Reverts an issue introduced in Fix various Gif Decoder/Encoder behaviors.  #2289 where a naive attempt to reduce encoded output led to a loss of image data when gifs have been previously optimized.
  2. Introduces a new SIMD accelerated (Avx2, Sse2, Arm64) algorithm that deduplifies the frame index buffers following quantization to allow much better encoder compression.
  3. Introduces a necessary breaking change to the gif metadata and encoder (Next release will be v3.1) that allows taking advantage of these optimizations. Without these changes some gifs could encode at nearly 7x the input size using the default settings.

Fixes #635
Fixes #2450
Fixes #2198
Touches #2404

The new deduplifier does a comparison of the quantized pixel indexes using the previous and current frame. If duplicates are detected in the current frame, then the indices are replaced with the indez of the background or transparent color. Given the increased repeats this enables better compression.

This functionality is limited to gifs encoded using global color palettes as we would require extra work to detect duplicates in local color tables and produce a map of index offsets which could be cumbersome (not to say this might come in a following PR in the future) .

Screenshot 2023-05-20 000820

@JimBobSquarePants JimBobSquarePants added bug formats:gif breaking Signifies a binary breaking change. arch:x64 labels May 15, 2023
@antonfirsov
Copy link
Member

An API diff would be nice (for both the PR and the release notes) to make it obvious what specific API-s are we breaking.

@JimBobSquarePants
Copy link
Member Author

JimBobSquarePants commented May 16, 2023

An API diff would be nice (for both the PR and the release notes) to make it obvious what specific API-s are we breaking.

Yep agreed. I'll get on that.

Update. Here's the API diff.

I've removed the XXColorTableLength properties from each metadata type as they're not trackable with new property values..
the IQuantizer property of QuantizingImageEncoder is now nullable. All encoders have been patched to ensure a default quantizer.

/// <summary>
/// Provides Gif specific metadata information for the image.
/// </summary>
public class GifMetadata : IDeepCloneable
{
-   /// <summary>
-   /// Gets or sets the length of the global color table if present.
-   /// </summary>
-   public int GlobalColorTableLength { get; set; }

+  /// <summary>
+  /// Gets or sets the global color table, if any.
+  /// </summary>
+  public ReadOnlyMemory<Color>? GlobalColorTable { get; set; }

+  /// <summary>
+  /// Gets or sets the index at the <see cref="GlobalColorTable"/> for the background color.
+  /// The background color is the color used for those pixels on the screen that are not covered by an image.
+  /// </summary>
+    public byte BackgroundColor { get; set; }
}
/// <summary>
/// Provides Gif specific metadata information for the image frame.
/// </summary>
public class GifFrameMetadata : IDeepCloneable
{
-    /// <summary>
-    /// Gets or sets the length of the color table.
-    /// If not 0, then this field indicates the maximum number of colors to use when quantizing the
-    /// image frame.
-    /// </summary>
-    public int ColorTableLength { get; set; }

+   /// <summary>
+   /// Gets or sets the local color table, if any.
+   /// </summary>
+   public ReadOnlyMemory<Color>? LocalColorTable { get; set; }

+   /// <summary>
+   /// Gets or sets a value indicating whether the frame has transparency
+   /// </summary>
+   public bool HasTransparency { get; set; }

+   /// <summary>
+   /// Gets or sets the transparency index.
+   /// When <see cref="HasTransparency"/> is set to <see langword="true"/> this value indicates the index within
+   /// the color palette at which the transparent color is located.
+   /// </summary>
+   public byte TransparencyIndex { get; set; }
}
/// <summary>
/// Acts as a base class for all image encoders that allow color palette generation via quantization.
/// </summary>
public abstract class QuantizingImageEncoder : ImageEncoder
{
-   /// <summary>
-   /// Gets the quantizer used to generate the color palette.
-   /// </summary>
-   public IQuantizer Quantizer { get; init; } = KnownQuantizers.Octree;

+   /// <summary>
+   /// Gets the quantizer used to generate the color palette.
+   /// </summary>
+   public IQuantizer? Quantizer { get; init; }
}

@JimBobSquarePants JimBobSquarePants marked this pull request as ready for review May 19, 2023 14:05
@JimBobSquarePants JimBobSquarePants changed the title WIP. Preserve color palettes and deduplicate frame pixels. Preserve color palettes and deduplicate frame pixels. May 19, 2023
@JimBobSquarePants
Copy link
Member Author

@SixLabors/core Would it be possible to get some eyes on this please?

@gfoidl
Copy link
Contributor

gfoidl commented Jul 17, 2023

@JimBobSquarePants FYI regarding

I've put that on my todo list, and hope that next week I can give this a stab.

Got a business work-item which I put above this one in my todo 😉
Hoping that I can look into this at the end of the week.

@JimBobSquarePants
Copy link
Member Author

@JimBobSquarePants FYI regarding

I've put that on my todo list, and hope that next week I can give this a stab.

Go a business work-item which I put above this one in my todo 😉 Hoping that I can look into this at the end of the week.

Thanks.. I think I've simplified it enough to maybe make it possible.

@JimBobSquarePants
Copy link
Member Author

253 of the changed files here are reference images. If I could get someone to have a look at the actual code, I'd really appreciate it.

@gfoidl
Copy link
Contributor

gfoidl commented Jul 27, 2023

My take on vectorizing the TrimTransparentPixels is in #2500 (to be merged into this PR / branch).

gfoidl and others added 4 commits July 27, 2023 22:26
Hm, I remembered that movemask isn't the fastest, and ptest (TestZ in .NET-terms) is faster but current benchmarks didn't prove this, also Intel's instruction table didn't show any benefit in terms of latency or throughput.
Thus simplified that check.
Vectorize TrimTransparentPixels in GifEncoderCore
@JimBobSquarePants
Copy link
Member Author

@SixLabors/core Can I please get some eyes on this? I'd like to keep moving as I have a huge backlog of work to do.

Copy link
Member

@antonfirsov antonfirsov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of recommendations and a few general comments.

@@ -50,7 +50,7 @@ public static void Decode(Span<byte> scanline, Span<byte> previousScanline, int
}

[MethodImpl(MethodImplOptions.AggressiveInlining)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to the PR, but I wonder if our heavy usage of AggressiveInlining has any benefits, especially now that we target modern runtimes only.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we need to start reviewing this. Need good tooling to inspect the codegen though.

Comment on lines +655 to +656
Vector128<short> signedMask = AdvSimd.ShiftRightArithmetic(mask.AsInt16(), 7);
return AdvSimd.BitwiseSelect(signedMask, right.AsInt16(), left.AsInt16()).AsByte();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, nothing to do with the PR but it's a bit messy now that it's unclear for a caller which instruction set is supported in various methods on SimdUtils and Numerics. Originally public methods of SimdUtils were supposed to work regardless of the instructions and SimdUtils.HwIntrinsics used to contain private implementation details for the System.Runtime.Intrinsics branch. Would be nice to figure out new general rules later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has become a bit of a dumping ground over the years. My plan for v4 is a mass simplification of intrinsics utilizing things like Vector128<T> to allow us to implement better cross chipset approaches. We can do a tidy up at the same time.

/// </summary>
public int GlobalColorTableLength { get; set; }
public byte BackgroundColor { get; set; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be BackgroundColorIndex a better name?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, much better, will change.

{
this.WriteGraphicalControlExtension(metadata, transparencyIndex, stream);

Buffer2DRegion<byte> region = ((IPixelSource)quantized).PixelBuffer.GetRegion();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: do we need the region here? I don't see us slicing down the full buffer.

Copy link
Member Author

@JimBobSquarePants JimBobSquarePants Aug 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WriteImageData requires a region as we slice down the buffer in subsequent frames. I'll rewrite it though, so it accepts a buffer and a rectangle.

@@ -332,7 +344,7 @@ private void ReadApplicationExtension(BufferedReadStream stream)
if (subBlockSize == GifConstants.NetscapeLoopingSubBlockSize)
{
stream.Read(this.buffer.Span, 0, GifConstants.NetscapeLoopingSubBlockSize);
this.gifMetadata!.RepeatCount = GifNetscapeLoopingApplicationExtension.Parse(this.buffer.Span.Slice(1)).RepeatCount;
this.gifMetadata!.RepeatCount = GifNetscapeLoopingApplicationExtension.Parse(this.buffer.Span[1..]).RepeatCount;
Copy link
Member

@antonfirsov antonfirsov Aug 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is a critical path to care about if Slice is faster. Simpler looking code is better.

Comment on lines 713 to 716
ref Rgb24 localBase = ref MemoryMarshal.GetReference(MemoryMarshal.Cast<byte, Rgb24>(this.currentLocalColorTable!.GetSpan()[..this.currentLocalColorTableSize]));
for (int i = 0; i < colorTable.Length; i++)
{
colorTable[i] = new Color(Unsafe.Add(ref localBase, (uint)i));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is not a critical path, I wouldn't go unsafe here and just use span indexers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just wanted to avoid the secondary indexer on the raw Rgb24 span.

Copy link
Member

@antonfirsov antonfirsov Aug 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But is there a measurable perf gain from avoiding the indexer here? This method is only called once per frame on a relatively small (<256) set of data. On the other hand, each usage of Unsafe reduces readability and may introduce a serious bug in case we accidentally create a buffer overflow as we refactor the code in the future. IMO each usage of Unsafe should be always justified (=hottest hot path) instead of being the default way of indexing things.

/// Gets or sets the length of the global color table if present.
/// Gets or sets the global color table, if any.
/// </summary>
public ReadOnlyMemory<Color>? GlobalColorTable { get; set; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is color resolution different from 24 bits ever expected according to the gif standard? If not, wouldn't it be better to go with ReadOnlyMemory<Rgb24> and avoiding the conversions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't and I did consider using Rgb24 originally but thought that Color gave us greater portability on the API surface. For example, someone could now easily create a pallet quantizer from this color table.

Copy link
Member

@antonfirsov antonfirsov Aug 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we think such use-cases will be common it might be ok, but then I would extend the xml doc describing that GlobalColorTable is 24bit.

For me it still feels better to expose the raw gif data as-is (communicating the limitations on the API surface), and provide helper API-s to make work with PaletteQuantizer easier.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a comment is probably best. When I started to convert, I realized while it meant the decoder was simpler, encoding then required an interim allocation of a Color[] to supply to the palette quantizer which got messy.

Comment on lines 121 to 139
if (this.transparentIndex >= 0 && rgba == default)
{
// We have explicit instructions. No need to search.
index = this.transparentIndex;
this.cache.Add(rgba, (byte)index);

if (index >= 0 && index < this.Palette.Length)
{
match = Unsafe.Add(ref paletteRef, (uint)index);
}
else
{
Unsafe.SkipInit(out TPixel pixel);
pixel.FromScaledVector4(Vector4.Zero);
match = pixel;
}

return index;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now penalizing all quantizers and decoders while only gif is using transparentIndex. Can't we keep the transparency detection logic in GifDecoderCore?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't figure out a way to do it. Given that the slow code is actually the loop through the palette I thought this addition check outside the loop would be ok.

I actually think Bmp will end up using it eventually when we finally expose a decoded palette there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Slow" actually means we run it every time a new value is added to ColorDistanceCache which should be still quite common for an image with a wide original palette. I also don't see an easy way to optimize this, we should probably introduce multiple variants of EuclideanPixelMap<TPixel> and PalettQuantizer<TPixel> in the future.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been thinking about this and compared to the actual iteration the >=0 check should be barely detectable. The comparison operator won't even kick in.

Copy link
Member

@antonfirsov antonfirsov Aug 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comparison is indeed cheap, the problem is not with that. We have a lot of code added to the method which may lead to more frequent L1 instruction cache misses. Branches can also hurt CPU pipelining.

Since GetQuantizedColor is being called per-pixel, the method should be as slim as possible, ideally containing no branching except the pixelmap lookup. We should be able to achieve this (and greatly reduce the memory footprint for pixel types without an alpha!) by separating different OctreeQuantizer<T> setups to separate classes.

If we could look into this together with optimization opportunities in ErrorDitherer, we could achieve significant perf improvements in the quantization space, and consequentially in lossless encoders. Not a small amount of work though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah true. I'll not touch this until you have a chance to make changes. We can look at greater optimization opportunities for v4

@JimBobSquarePants JimBobSquarePants mentioned this pull request Aug 22, 2023
4 tasks
index = this.transparentIndex;
this.cache.Add(rgba, (byte)index);

if (index >= 0 && index < this.Palette.Length)
Copy link
Member

@antonfirsov antonfirsov Aug 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that in this branch index == this.transparentIndex >= 0, the index >= 0 condition seems to be redundant.

Regarding the index < this.Palette.Length condition: shouldn't we make this an invariant rule of PaletteQuantizer & EuclideanPixelMap instead, and Guard it in the constructor and SetTransparentIndex?

This would simplify this code a lot, making it also faster.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow. If you have time, could you push to the branch?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do it later this week.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah, that's much better! Not sure what I was thinking when I wrote mine! 🤣

@JimBobSquarePants
Copy link
Member Author

@antonfirsov I think I've covered everything now.

Copy link
Member

@antonfirsov antonfirsov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the patience!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arch:arm64 breaking Signifies a binary breaking change. bug formats:gif
Projects
None yet
3 participants