Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up and optimize byte<->float and Rgba32 <-> Vector4 conversion #742

Merged
merged 24 commits into from
Oct 21, 2018

Conversation

antonfirsov
Copy link
Member

@antonfirsov antonfirsov commented Oct 20, 2018

Prerequisites

  • I have written a descriptive pull-request title
  • I have verified that there are no overlapping pull-requests open
  • I have verified that I am following matches the existing coding patterns and practice as demonstrated in the repository. These follow strict Stylecop rules 👮.
  • I have provided test coverage for my change (where applicable)

Description

  • Uniformized Span<byte> -> Span<float> (and opposite) conversion methods in SimdUtils which would be useful for Epic: ResizeProcessor performance improvements (Memory & CPU) #733.
  • Changed Rgba32.PixelOperations to consume these uniformized converters in bulk conversions to/from Vector4
  • New implementations for the conversions based on dotnet/corefx#15957. These are only accelerated by the newest RyuJIT deployed with .NET Core 2.1 and .NET Framework 4.7.2. Currently the latter is not being targeted directly by ImageSharp, so the new implementations are only used on .NET Core 2.1
    • Span<byte> -> Span<float> (thus Span<Rgba32> -> Span<Vector4>) is 3x faster than the current implementation on main
    • The opposite direction is ~2x faster

Benchmark results

Detailed benchmark results can be found in comments in benchmark code (ToVector4_Rgba32, PackFromVector4_Rgba32).

Here are the interesting bits:

  1. ToVector4_Rgba32
      FallbackIntrinsics128 |    Core |  2048 | 5,017.89 ns | 4,021.533 ns | 227.2241 ns |   1.24 |     0.05 |      - |       0 B |
         BasicIntrinsics256 |    Core |  2048 | 4,046.51 ns | 1,150.390 ns |  64.9992 ns |   1.00 |     0.00 |      - |       0 B |
         ExtendedIntrinsics |    Core |  2048 | 1,130.59 ns |   832.588 ns |  47.0427 ns |!! 0.28 |     0.01 |      - |       0 B |
       PixelOperations_Base |    Core |  2048 | 6,752.68 ns |   272.820 ns |  15.4148 ns |   1.67 |     0.02 |      - |      24 B |
PixelOperations_Specialized |    Core |  2048 | 1,126.13 ns |    79.192 ns |   4.4745 ns |!! 0.28 |     0.00 |      - |       0 B |
  1. PackFromVector4_Rgba32
      FallbackIntrinsics128 |    Core |  2048 |  6,644.65 ns | 2,677.090 ns | 151.2605 ns |   1.69 |     0.05 |      - |       0 B |
         BasicIntrinsics256 |    Core |  2048 |  3,923.70 ns | 1,971.760 ns | 111.4081 ns |   1.00 |     0.00 |      - |       0 B |
          ExtendedIntrinsic |    Core |  2048 |  2,092.32 ns |   375.657 ns |  21.2253 ns |!! 0.53 |     0.01 |      - |       0 B |
       PixelOperations_Base |    Core |  2048 | 16,875.73 ns | 1,271.957 ns |  71.8679 ns |   4.30 |     0.10 |      - |      24 B |
PixelOperations_Specialized |    Core |  2048 |  2,129.92 ns |   262.888 ns |  14.8537 ns |!! 0.54 |     0.01 |      - |       0 B |

@codecov
Copy link

codecov bot commented Oct 21, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@7e76506). Click here to learn what that means.
The diff coverage is 95.67%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master     #742   +/-   ##
=========================================
  Coverage          ?   89.33%           
=========================================
  Files             ?      972           
  Lines             ?    42891           
  Branches          ?     3038           
=========================================
  Hits              ?    38318           
  Misses            ?     3889           
  Partials          ?      684
Impacted Files Coverage Δ
...olorConverters/JpegColorConverter.FromYCbCrSimd.cs 84.9% <ø> (ø)
...ents/Decoder/ColorConverters/JpegColorConverter.cs 89.85% <ø> (ø)
...Converters/JpegColorConverter.FromYCbCrSimdAvx2.cs 91.66% <ø> (ø)
src/ImageSharp/Common/Tuples/Vector4Pair.cs 62.5% <ø> (ø)
....Tests/TestUtilities/Tests/TestEnvironmentTests.cs 63.63% <ø> (ø)
...mageSharp.Tests/TestUtilities/TestDataGenerator.cs 100% <100%> (ø)
.../ImageSharp/PixelFormats/Rgba32.PixelOperations.cs 100% <100%> (ø)
...ImageSharp/PixelFormats/PixelOperations{TPixel}.cs 100% <100%> (ø)
src/ImageSharp/Common/Tuples/Octet.cs 90% <90%> (ø)
...arp/Common/Helpers/SimdUtils.BasicIntrinsics256.cs 92.2% <92.2%> (ø)
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7e76506...bf7c933. Read the comment docs.

@codecov
Copy link

codecov bot commented Oct 21, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@7e76506). Click here to learn what that means.
The diff coverage is 93.32%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master     #742   +/-   ##
=========================================
  Coverage          ?   89.31%           
=========================================
  Files             ?      973           
  Lines             ?    42979           
  Branches          ?     3047           
=========================================
  Hits              ?    38386           
  Misses            ?     3911           
  Partials          ?      682
Impacted Files Coverage Δ
...ents/Decoder/ColorConverters/JpegColorConverter.cs 89.85% <ø> (ø)
...olorConverters/JpegColorConverter.FromYCbCrSimd.cs 84.9% <ø> (ø)
...Converters/JpegColorConverter.FromYCbCrSimdAvx2.cs 91.66% <ø> (ø)
src/ImageSharp/Common/Tuples/Vector4Pair.cs 62.5% <ø> (ø)
....Tests/TestUtilities/Tests/TestEnvironmentTests.cs 63.63% <ø> (ø)
tests/ImageSharp.Tests/Helpers/ImageMathsTests.cs 100% <100%> (ø)
.../Common/Helpers/SimdUtils.FallbackIntrinsics128.cs 100% <100%> (ø)
src/ImageSharp/Common/Helpers/ImageMaths.cs 86.66% <100%> (ø)
...mageSharp.Tests/TestUtilities/TestDataGenerator.cs 100% <100%> (ø)
.../ImageSharp/PixelFormats/Rgba32.PixelOperations.cs 100% <100%> (ø)
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7e76506...90c7153. Read the comment docs.

Copy link
Member

@JimBobSquarePants JimBobSquarePants left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll freely admit. There's some stuff here I don't fully understand but what I do seems sensible. Just a few questions.

}

[MethodImpl(InliningOptions.ShortMethod)]
public static float Clamp(float x, float min, float max) => Math.Min(max, Math.Max(min, x));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might wanna benchmark this. The IComparableExtensions version should be a good bit faster.

I do, however want to ditch the extension method for more clear MathUtils.Clamp*** methods.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, will do compare!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the duplication?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The one in ComparableExtensions is faster, I'm removing this!

/// http://lolengine.net/blog/2011/3/20/understanding-fast-float-integer-conversions
/// http://stackoverflow.com/a/536278
/// </summary>
internal static void BulkConvertByteToNormalizedFloat(ReadOnlySpan<byte> source, Span<float> dest)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this perhaps be private so it cannot be called without sanitation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These methods are all unit tested separately (so coverage is independent from current HW configuration), so we need them as internal.

I can improve the input checking a bit however.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right... Makes sense then. Carry on!

{
public static bool IsAvailable { get; } =
#if NETCOREAPP2_1
// TODO: Also available in .NET 4.7.2, we need to add a build target!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add one if you need one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm planning to do this in a separate PR, or open an up-for-grabs issue. @iamcarbon is the champion of this stuff! 😄

s *= maxBytes;
s += half;

// I'm not sure if Vector4.Clamp() is properly implemented with intrinsics.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do recall reading somewhere it wasn't.


public override string ToString()
{
return $"[{this.V0},{this.V1},{this.V2},{this.V3},{this.V4},{this.V5},{this.V6},{this.V7}]";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm favoring the TypeName(field, field, field) format now for ToString constituency.

@@ -6,7 +6,7 @@
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

using SixLabors.ImageSharp.Common.Tuples;
using SixLabors.ImageSharp.Tuples;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would these come under our Primitives namespace?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely a better place!

@@ -29,10 +29,12 @@ public partial class PixelOperations<TPixel>
/// <param name="count">The number of pixels to convert.</param>
internal virtual void PackFromVector4(ReadOnlySpan<Vector4> sourceVectors, Span<TPixel> destinationColors, int count)
{
GuardSpans(sourceVectors, nameof(sourceVectors), destinationColors, nameof(destinationColors), count);
ReadOnlySpan<Vector4> sourceVectors1 = sourceVectors;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How come these are reassigned?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Result of undoing a refactor with R#. Will fix it.

return ImageMaths.ModuloP2(this.value, this.m);
}

// RESULTS:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to make sure I add the results like this when benchmarking. Really good idea.

}

[MethodImpl(InliningOptions.ShortMethod)]
public static float Clamp(float x, float min, float max) => Math.Min(max, Math.Max(min, x));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the duplication?

/// <inheritdoc />
internal override void ToVector4(ReadOnlySpan<Rgba32> sourceColors, Span<Vector4> destinationVectors, int count)
{
Guard.MustBeSizedAtLeast(sourceColors, count, nameof(sourceColors));
Guard.MustBeSizedAtLeast(destinationVectors, count, nameof(destinationVectors));

if (count < 256 || !Vector.IsHardwareAccelerated)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we no longer need the count check?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was an optimization for small buffers, but the new logic made it obsolete.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay 👍 Just wanted to make sure that you not missed this.

@@ -40,7 +44,7 @@ public void Cleanup()
this.source.Dispose();
}

[Benchmark(Baseline = true)]
//[Benchmark]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this still be commented?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The benchmark code serves as an information, but the execution is unnecessary, unless someone wants to evaluate that specific method in future investigations.

I wish there was some better way to Skip benchmarks without completely dropping their code.

@@ -38,33 +50,171 @@ public void Cleanup()
this.destination.Dispose();
}

[Benchmark(Baseline = true)]
//[Benchmark]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this still be commented?

@antonfirsov
Copy link
Member Author

@JimBobSquarePants @dlemstra all findings were addressed. Gonna merge this as soon as the compilation is finished so we can go on with #729.

@antonfirsov antonfirsov merged commit 6f5ebbb into master Oct 21, 2018
@JimBobSquarePants
Copy link
Member

Awesome!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants