Fix non-seekable stream reading. #1316

JimBobSquarePants · 2020-08-16T21:10:40Z

Prerequisites

I have written a descriptive pull-request title
I have verified that there are no overlapping pull-requests open
I have verified that I am following matches the existing coding patterns and practice as demonstrated in the repository. These follow strict Stylecop rules 👮.
I have provided test coverage for my change (where applicable)

Description

Fixes our non-seekable stream loading which was attempting to use the non-supported Length property.

ImageSharp/src/ImageSharp/Image.FromStream.cs

Line 734 in 8e2792e

    
           using MemoryStream memoryStream = configuration.MemoryAllocator.AllocateFixedCapacityMemoryStream(stream.Length);

// If CanSeek is false, Position, Seek, Length, and SetLength should throw.

https://source.dot.net/#System.Private.CoreLib/Stream.cs,51

The fix utilizes an internal ChunkedMemoryStream implementation to allow pooling of byte buffers via the use of non-contiguous chunks. This let's us allocate chunks from the buffer pool on demand upon read.

codecov · 2020-08-16T22:20:24Z

Codecov Report

Merging #1316 into master will decrease coverage by 0.01%.
The diff coverage is 80.86%.

@@            Coverage Diff             @@
##           master    #1316      +/-   ##
==========================================
- Coverage   82.76%   82.74%   -0.02%     
==========================================
  Files         689      689              
  Lines       30721    30933     +212     
  Branches     3473     3508      +35     
==========================================
+ Hits        25427    25597     +170     
- Misses       4587     4617      +30     
- Partials      707      719      +12

Flag	Coverage Δ
#unittests	`82.74% <80.86%> (-0.02%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
src/ImageSharp/Memory/MemoryAllocatorExtensions.cs	`100.00% <ø> (ø)`
src/ImageSharp/Memory/MemoryOwnerExtensions.cs	`85.71% <ø> (ø)`
src/ImageSharp/IO/ChunkedMemoryStream.cs	`80.70% <80.70%> (ø)`
src/ImageSharp/Image.FromStream.cs	`83.94% <100.00%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8e2792e...582000d. Read the comment docs.

antonfirsov

Wow, this is fairly complex stuff, good job! I'll need to have another round of review after understanding the stuff I'm currently missing.

Even thought it's an urgent and critical bugfix, I believe it's worth to invest a bit more time to get higher confidence this actually works, even if the cost is a delay of few days. For this, I'm suggesting to do an aggressive round of semi-manual integration testing.

Also: do we have test cases that verify reading/writing correctness, when the operation touches/crosses the boundaries of chunks?

Etc ...

antonfirsov · 2020-08-19T00:53:47Z

src/ImageSharp/IO/ChunkedMemoryStream.cs

+        }
+
+        /// <inheritdoc/>
+        public override int Read(byte[] buffer, int offset, int count)


I would rather implement the Span<byte> overload, and call it from Read(byte[] buffer, int offset, int count). This would avoid an unnecessary copy done by the Stream base class to/from rented buffers.

Same for Write.

Note: Stream does the other way around with a copy because of compatibility reasons (the Span<byte> overload came later to the base class).

We actually never call the Span<byte> overload since everything is wrapped by the BufferedReadStream implementation.

To me it seems we do use it actually:

ImageSharp/src/ImageSharp/IO/BufferedReadStream.cs

Line 349 in 8e2792e

i = baseStream.Read(buffer.Slice(n, count - n));

Yeah, I realized that almost immediately after commenting. Was away from my computer so couldn't correct myself.

antonfirsov · 2020-08-19T00:58:44Z

src/ImageSharp/IO/ChunkedMemoryStream.cs

+        /// <summary>
+        /// The default length in bytes of each buffer chunk.
+        /// </summary>
+        public const int DefaultBufferLength = 4096;


This could probably go way higher, maybe even over 100K. Memory will be very fragmented with 4K. (1000 chunks for a 4MB image!)

Done. Went with 81920

antonfirsov · 2020-08-19T01:03:01Z

src/ImageSharp/IO/ChunkedMemoryStream.cs

+        private MemoryChunk writeChunk;
+
+        // Offset into chunk to write to
+        private int writeOffset;
+
+        // Current chunk to read from
+        private MemoryChunk readChunk;
+
+        // Offset into chunk to read from
+        private int readOffset;


I'm probably missing something, but why are we writing/reading to/from different chunks?
Shouldn't Stream.Position indicate where we stand in both cases?

For my sanity mostly. This is hard.

Ah ok, so it's a special stream implementation where writes go always to the end of the stream, right? Would add a comment about this.

Edit: not seen on GH for some reason, but it was a reply to a previous discussion around this line.

antonfirsov · 2020-08-19T01:16:10Z

tests/ImageSharp.Tests/Image/ImageTests.Load_FromStream_PassLocalConfiguration.cs

@@ -39,14 +39,25 @@ public void Configuration_Stream_Agnostic()
            [Fact]
            public void NonSeekableStream()
            {
-                var stream = new NoneSeekableStream(this.DataStream);
+                var stream = new NonSeekableStream(this.DataStream);


These tests are insufficient to validate ChunkedMemoryStream integration. They are only here to prove that different overloads of extension method calls propagate correctly to decoders. (this.DataStream contains only 16 bytes of fake data for the fake decoder!)

I would do two things to validate ChunkedMemoryStream:

Add a couple of integration tests feeding a Jpegs and PNG-s through NonSeekableStream, make sure to cover different sizes.

Create a disabled Xunit test for semi-manual local testing, that does the same test as 1. but iterates through all valid images under tests/Images/Input.

I've copied the existing Image.Identify and Image.IdentifyAsync to demonstrate accuracy. The unit tests also cover reading and writing across chunk boundaries.

antonfirsov · 2020-08-19T01:47:39Z

src/ImageSharp/IO/ChunkedMemoryStream.cs

+                while (chunk != null)
+                {
+                    MemoryChunk next = chunk.Next;
+                    if (next != null)
+                    {
+                        length += chunk.Length;
+                    }


If we cache the number of "fully written" chunks, the loop could be replaced by a multiplication.

I had a look but the benchmarks showed no need for the optimization so chose not to complicate things.

antonfirsov · 2020-08-19T01:48:45Z

src/ImageSharp/IO/ChunkedMemoryStream.cs

+
+                int pos = 0;
+                MemoryChunk chunk = this.memoryChunk;
+                while (chunk != this.readChunk)


Consider a trick similar I'm suggesting for Length. These properties are heavily used in our decoders as far as I can see.

JimBobSquarePants · 2020-08-19T14:37:56Z

@antonfirsov Thanks for the thorough review.

I've made a few changes based upon the feedback.

Added optimized read/write span methods and passed all reading/writing of buffers through those methods.
Increased the chunk size to 81920 bytes
Updated unit tests to demonstrate read/write over chunk boundaries
Added integration tests matching existing seekable tests.
Added benchmarks to demonstrate no lack of performance during decode.

antonfirsov

I pushed the heavyweight integration test I suggested, and it fails for 65 out of 308 images.
We need to Skip the new theory, but add some of it's cases to our regular test suite. The lightweight Identify test is using the (simple) BMP decoder and doesn't really stress the new complex code we are about to add.

Regarding benchmarks:
I don't think they deliver good indicators in this particular case, since they are only showing a numbers about primitive operations being executed on small input streams.

We need to know: How many times do our decoders touch Length and Position for a typical huge image (I've seen couple of usages in PNG). If the number is high enough (Eg. bc. relevant methods are called in a loop), we need to proove the perf is not affected when decoding a large image.

JimBobSquarePants · 2020-08-19T15:53:24Z

Odd that’s failing, will investigate. Re length and position, the buffered stream only calls that in the underlying stream when the buffer is exhausted so I wouldn’t consider that a hot path.

JimBobSquarePants · 2020-08-19T16:19:17Z

I've just disabled the test on 32bit. They don't take much time so I think we should just keep them for now.

antonfirsov

Just a few wishes, otherwise looks good now!

antonfirsov · 2020-08-20T12:13:44Z

src/ImageSharp/IO/ChunkedMemoryStream.cs

+        private MemoryChunk writeChunk;
+
+        // Offset into chunk to write to
+        private int writeOffset;
+
+        // Current chunk to read from
+        private MemoryChunk readChunk;
+
+        // Offset into chunk to read from
+        private int readOffset;


Ah ok, so it's a special stream implementation where writes go always to the end of the stream, right? Would add a comment about this.

Edit: not seen on GH for some reason, but it was a reply to a previous discussion around this line.

antonfirsov · 2020-08-20T12:18:39Z

tests/ImageSharp.Tests/IO/ChunkedMemoryStreamTests.cs

+            Image<Rgba32> expected;
+            try
+            {
+                expected = Image.Load<Rgba32>(testFileFullPath);


I know it's my code, but if we leave all tests, on this line I would really like to replace Image.Load with TestFile stuff now to make the tests at least a bit faster by utilizing it's caching mechanism. Would do it, but I'm on a train now, may try later, if you also lack time.

I've updated it to use the provider.

Fix non-seekable stream reading.

9207bab

JimBobSquarePants added the bug label Aug 16, 2020

JimBobSquarePants requested a review from a team August 16, 2020 21:10

Update ChunkedMemoryStream.cs

a971304

JimBobSquarePants added 2 commits August 17, 2020 10:25

More tests and remove old stream

822637b

Add WriteByteTests

9097a0a

antonfirsov reviewed Aug 19, 2020

View reviewed changes

JimBobSquarePants added 4 commits August 19, 2020 13:40

Add optimized Read(Span) API

399cfc2

Optimize Write(Span)

cd32a60

Increase chunk size, add benchmarks and write span tests

24780c5

Add lightweight integration tests.

5aa9158

JimBobSquarePants requested review from antonfirsov and a team August 19, 2020 14:42

heavyweight integration test

6f40e43

antonfirsov requested changes Aug 19, 2020

View reviewed changes

Fix read count.

a4f63d7

JimBobSquarePants requested a review from antonfirsov August 19, 2020 16:38

JimBobSquarePants mentioned this pull request Aug 19, 2020

Image.WrapMemory<TPixel> APIs wrapping Memory<byte> #1314

Merged

4 tasks

antonfirsov approved these changes Aug 20, 2020

View reviewed changes

JimBobSquarePants linked an issue Aug 20, 2020 that may be closed by this pull request

Unable to load image from DeflateStream. #1323

Closed

4 tasks

JimBobSquarePants mentioned this pull request Aug 20, 2020

Unable to load image from DeflateStream. #1323

Closed

4 tasks

Use TestProvider to load images

582000d

JimBobSquarePants merged commit d95a996 into master Aug 21, 2020

JimBobSquarePants deleted the js/fix-non-seekable-load branch August 21, 2020 14:38

antonfirsov mentioned this pull request Nov 24, 2022

Improve Decoder/Encoder symmetry #2276

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix non-seekable stream reading. #1316

Fix non-seekable stream reading. #1316

JimBobSquarePants commented Aug 16, 2020 •

edited

Loading

codecov bot commented Aug 16, 2020 •

edited

Loading

antonfirsov left a comment •

edited

Loading

antonfirsov Aug 19, 2020 •

edited

Loading

JimBobSquarePants Aug 19, 2020

antonfirsov Aug 19, 2020

JimBobSquarePants Aug 19, 2020

antonfirsov Aug 19, 2020 •

edited

Loading

JimBobSquarePants Aug 19, 2020

antonfirsov Aug 19, 2020 •

edited

Loading

JimBobSquarePants Aug 19, 2020

antonfirsov Aug 20, 2020 •

edited

Loading

antonfirsov Aug 19, 2020 •

edited

Loading

JimBobSquarePants Aug 19, 2020

antonfirsov Aug 19, 2020 •

edited

Loading

JimBobSquarePants Aug 19, 2020

antonfirsov Aug 19, 2020

JimBobSquarePants Aug 19, 2020

JimBobSquarePants commented Aug 19, 2020

antonfirsov left a comment •

edited

Loading

JimBobSquarePants commented Aug 19, 2020

JimBobSquarePants commented Aug 19, 2020 •

edited

Loading

antonfirsov left a comment

antonfirsov Aug 20, 2020 •

edited

Loading

antonfirsov Aug 20, 2020 •

edited

Loading

JimBobSquarePants Aug 21, 2020

Fix non-seekable stream reading. #1316

Fix non-seekable stream reading. #1316

Conversation

JimBobSquarePants commented Aug 16, 2020 • edited Loading

Prerequisites

Description

codecov bot commented Aug 16, 2020 • edited Loading

Codecov Report

antonfirsov left a comment • edited Loading

Choose a reason for hiding this comment

antonfirsov Aug 19, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

antonfirsov Aug 19, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

antonfirsov Aug 19, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

antonfirsov Aug 20, 2020 • edited Loading

Choose a reason for hiding this comment

antonfirsov Aug 19, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

antonfirsov Aug 19, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

JimBobSquarePants commented Aug 19, 2020

antonfirsov left a comment • edited Loading

Choose a reason for hiding this comment

JimBobSquarePants commented Aug 19, 2020

JimBobSquarePants commented Aug 19, 2020 • edited Loading

antonfirsov left a comment

Choose a reason for hiding this comment

antonfirsov Aug 20, 2020 • edited Loading

Choose a reason for hiding this comment

antonfirsov Aug 20, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

JimBobSquarePants commented Aug 16, 2020 •

edited

Loading

codecov bot commented Aug 16, 2020 •

edited

Loading

antonfirsov left a comment •

edited

Loading

antonfirsov Aug 19, 2020 •

edited

Loading

antonfirsov Aug 19, 2020 •

edited

Loading

antonfirsov Aug 19, 2020 •

edited

Loading

antonfirsov Aug 20, 2020 •

edited

Loading

antonfirsov Aug 19, 2020 •

edited

Loading

antonfirsov Aug 19, 2020 •

edited

Loading

antonfirsov left a comment •

edited

Loading

JimBobSquarePants commented Aug 19, 2020 •

edited

Loading

antonfirsov Aug 20, 2020 •

edited

Loading

antonfirsov Aug 20, 2020 •

edited

Loading