Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix non-seekable stream reading. #1316

Merged
merged 11 commits into from
Aug 21, 2020
Merged

Conversation

JimBobSquarePants
Copy link
Member

@JimBobSquarePants JimBobSquarePants commented Aug 16, 2020

Prerequisites

  • I have written a descriptive pull-request title
  • I have verified that there are no overlapping pull-requests open
  • I have verified that I am following matches the existing coding patterns and practice as demonstrated in the repository. These follow strict Stylecop rules 👮.
  • I have provided test coverage for my change (where applicable)

Description

Fixes our non-seekable stream loading which was attempting to use the non-supported Length property.

using MemoryStream memoryStream = configuration.MemoryAllocator.AllocateFixedCapacityMemoryStream(stream.Length);

// If CanSeek is false, Position, Seek, Length, and SetLength should throw.

https://source.dot.net/#System.Private.CoreLib/Stream.cs,51

The fix utilizes an internal ChunkedMemoryStream implementation to allow pooling of byte buffers via the use of non-contiguous chunks. This let's us allocate chunks from the buffer pool on demand upon read.

@JimBobSquarePants JimBobSquarePants requested a review from a team August 16, 2020 21:10
@codecov
Copy link

codecov bot commented Aug 16, 2020

Codecov Report

Merging #1316 into master will decrease coverage by 0.01%.
The diff coverage is 80.86%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1316      +/-   ##
==========================================
- Coverage   82.76%   82.74%   -0.02%     
==========================================
  Files         689      689              
  Lines       30721    30933     +212     
  Branches     3473     3508      +35     
==========================================
+ Hits        25427    25597     +170     
- Misses       4587     4617      +30     
- Partials      707      719      +12     
Flag Coverage Δ
#unittests 82.74% <80.86%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/ImageSharp/Memory/MemoryAllocatorExtensions.cs 100.00% <ø> (ø)
src/ImageSharp/Memory/MemoryOwnerExtensions.cs 85.71% <ø> (ø)
src/ImageSharp/IO/ChunkedMemoryStream.cs 80.70% <80.70%> (ø)
src/ImageSharp/Image.FromStream.cs 83.94% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8e2792e...582000d. Read the comment docs.

Copy link
Member

@antonfirsov antonfirsov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, this is fairly complex stuff, good job! I'll need to have another round of review after understanding the stuff I'm currently missing.

Even thought it's an urgent and critical bugfix, I believe it's worth to invest a bit more time to get higher confidence this actually works, even if the cost is a delay of few days. For this, I'm suggesting to do an aggressive round of semi-manual integration testing.

Also: do we have test cases that verify reading/writing correctness, when the operation touches/crosses the boundaries of chunks?

image

Etc ...

}

/// <inheritdoc/>
public override int Read(byte[] buffer, int offset, int count)
Copy link
Member

@antonfirsov antonfirsov Aug 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather implement the Span<byte> overload, and call it from Read(byte[] buffer, int offset, int count). This would avoid an unnecessary copy done by the Stream base class to/from rented buffers.

Same for Write.

Note: Stream does the other way around with a copy because of compatibility reasons (the Span<byte> overload came later to the base class).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually never call the Span<byte> overload since everything is wrapped by the BufferedReadStream implementation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me it seems we do use it actually:

i = baseStream.Read(buffer.Slice(n, count - n));

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I realized that almost immediately after commenting. Was away from my computer so couldn't correct myself.

/// <summary>
/// The default length in bytes of each buffer chunk.
/// </summary>
public const int DefaultBufferLength = 4096;
Copy link
Member

@antonfirsov antonfirsov Aug 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could probably go way higher, maybe even over 100K. Memory will be very fragmented with 4K. (1000 chunks for a 4MB image!)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Went with 81920

Comment on lines +36 to +45
private MemoryChunk writeChunk;

// Offset into chunk to write to
private int writeOffset;

// Current chunk to read from
private MemoryChunk readChunk;

// Offset into chunk to read from
private int readOffset;
Copy link
Member

@antonfirsov antonfirsov Aug 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm probably missing something, but why are we writing/reading to/from different chunks?
Shouldn't Stream.Position indicate where we stand in both cases?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my sanity mostly. This is hard.

Copy link
Member

@antonfirsov antonfirsov Aug 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, so it's a special stream implementation where writes go always to the end of the stream, right? Would add a comment about this.

Edit: not seen on GH for some reason, but it was a reply to a previous discussion around this line.

@@ -39,14 +39,25 @@ public void Configuration_Stream_Agnostic()
[Fact]
public void NonSeekableStream()
{
var stream = new NoneSeekableStream(this.DataStream);
var stream = new NonSeekableStream(this.DataStream);
Copy link
Member

@antonfirsov antonfirsov Aug 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests are insufficient to validate ChunkedMemoryStream integration. They are only here to prove that different overloads of extension method calls propagate correctly to decoders. (this.DataStream contains only 16 bytes of fake data for the fake decoder!)

I would do two things to validate ChunkedMemoryStream:

  1. Add a couple of integration tests feeding a Jpegs and PNG-s through NonSeekableStream, make sure to cover different sizes.
  2. Create a disabled Xunit test for semi-manual local testing, that does the same test as 1. but iterates through all valid images under tests/Images/Input.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've copied the existing Image.Identify and Image.IdentifyAsync to demonstrate accuracy. The unit tests also cover reading and writing across chunk boundaries.

Comment on lines +87 to +93
while (chunk != null)
{
MemoryChunk next = chunk.Next;
if (next != null)
{
length += chunk.Length;
}
Copy link
Member

@antonfirsov antonfirsov Aug 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we cache the number of "fully written" chunks, the loop could be replaced by a multiplication.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a look but the benchmarks showed no need for the optimization so chose not to complicate things.


int pos = 0;
MemoryChunk chunk = this.memoryChunk;
while (chunk != this.readChunk)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider a trick similar I'm suggesting for Length. These properties are heavily used in our decoders as far as I can see.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above

@JimBobSquarePants
Copy link
Member Author

@antonfirsov Thanks for the thorough review.

I've made a few changes based upon the feedback.

  • Added optimized read/write span methods and passed all reading/writing of buffers through those methods.
  • Increased the chunk size to 81920 bytes
  • Updated unit tests to demonstrate read/write over chunk boundaries
  • Added integration tests matching existing seekable tests.
  • Added benchmarks to demonstrate no lack of performance during decode.

@JimBobSquarePants JimBobSquarePants requested review from antonfirsov and a team August 19, 2020 14:42
Copy link
Member

@antonfirsov antonfirsov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed the heavyweight integration test I suggested, and it fails for 65 out of 308 images.
We need to Skip the new theory, but add some of it's cases to our regular test suite. The lightweight Identify test is using the (simple) BMP decoder and doesn't really stress the new complex code we are about to add.

Regarding benchmarks:
I don't think they deliver good indicators in this particular case, since they are only showing a numbers about primitive operations being executed on small input streams.

We need to know: How many times do our decoders touch Length and Position for a typical huge image (I've seen couple of usages in PNG). If the number is high enough (Eg. bc. relevant methods are called in a loop), we need to proove the perf is not affected when decoding a large image.

@JimBobSquarePants
Copy link
Member Author

Odd that’s failing, will investigate. Re length and position, the buffered stream only calls that in the underlying stream when the buffer is exhausted so I wouldn’t consider that a hot path.

@JimBobSquarePants
Copy link
Member Author

JimBobSquarePants commented Aug 19, 2020

I've just disabled the test on 32bit. They don't take much time so I think we should just keep them for now.

Copy link
Member

@antonfirsov antonfirsov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few wishes, otherwise looks good now!

Comment on lines +36 to +45
private MemoryChunk writeChunk;

// Offset into chunk to write to
private int writeOffset;

// Current chunk to read from
private MemoryChunk readChunk;

// Offset into chunk to read from
private int readOffset;
Copy link
Member

@antonfirsov antonfirsov Aug 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, so it's a special stream implementation where writes go always to the end of the stream, right? Would add a comment about this.

Edit: not seen on GH for some reason, but it was a reply to a previous discussion around this line.

Image<Rgba32> expected;
try
{
expected = Image.Load<Rgba32>(testFileFullPath);
Copy link
Member

@antonfirsov antonfirsov Aug 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it's my code, but if we leave all tests, on this line I would really like to replace Image.Load with TestFile stuff now to make the tests at least a bit faster by utilizing it's caching mechanism. Would do it, but I'm on a train now, may try later, if you also lack time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated it to use the provider.

@JimBobSquarePants JimBobSquarePants linked an issue Aug 20, 2020 that may be closed by this pull request
4 tasks
@JimBobSquarePants JimBobSquarePants merged commit d95a996 into master Aug 21, 2020
@JimBobSquarePants JimBobSquarePants deleted the js/fix-non-seekable-load branch August 21, 2020 14:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unable to load image from DeflateStream.
2 participants