Fix non-seekable stream reading. #1316
Conversation
Codecov Report

@@            Coverage Diff             @@
##           master    #1316      +/-   ##
==========================================
- Coverage   82.76%   82.74%   -0.02%
==========================================
  Files         689      689
  Lines       30721    30933     +212
  Branches     3473     3508      +35
==========================================
+ Hits        25427    25597     +170
- Misses       4587     4617      +30
- Partials      707      719      +12
Wow, this is fairly complex stuff, good job! I'll need to have another round of review after understanding the parts I'm currently missing.
Even though it's an urgent and critical bugfix, I believe it's worth investing a bit more time to get higher confidence that this actually works, even if the cost is a delay of a few days. For this, I'm suggesting an aggressive round of semi-manual integration testing.
Also: do we have test cases that verify reading/writing correctness when the operation touches/crosses the boundaries of chunks?
Etc.
}

/// <inheritdoc/>
public override int Read(byte[] buffer, int offset, int count)
I would rather implement the Span<byte> overload and call it from Read(byte[] buffer, int offset, int count). This would avoid an unnecessary copy done by the Stream base class to/from rented buffers. Same for Write.
Note: Stream itself does it the other way around, with a copy, for compatibility reasons (the Span<byte> overload came later to the base class).
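For illustration, here is a minimal sketch of the suggested pattern, assuming a target framework where Stream exposes the Span<byte> overload (.NET Core 2.1+ / .NET Standard 2.1). The class and its fields are made up for the example; this is not the PR's code.

```csharp
using System;
using System.IO;

// Sketch only: the core read logic lives in the Span<byte> overload, and the
// array-based overload forwards to it, so the Stream base class never has to
// copy to/from a rented buffer.
public sealed class SpanFirstStream : Stream
{
    private readonly byte[] data;
    private int position;

    public SpanFirstStream(byte[] data) => this.data = data;

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => this.data.Length;

    public override long Position
    {
        get => this.position;
        set => throw new NotSupportedException();
    }

    // The array overload simply forwards; no intermediate buffer is needed.
    public override int Read(byte[] buffer, int offset, int count)
        => this.Read(buffer.AsSpan(offset, count));

    // The core implementation works directly against the caller's span.
    public override int Read(Span<byte> buffer)
    {
        int n = Math.Min(buffer.Length, this.data.Length - this.position);
        this.data.AsSpan(this.position, n).CopyTo(buffer);
        this.position += n;
        return n;
    }

    public override void Flush()
    {
    }

    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();

    public override void SetLength(long value) => throw new NotSupportedException();

    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```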
We actually never call the Span<byte> overload since everything is wrapped by the BufferedReadStream implementation.
To me it seems we do use it actually:
i = baseStream.Read(buffer.Slice(n, count - n));
Yeah, I realized that almost immediately after commenting. Was away from my computer so couldn't correct myself.
/// <summary>
/// The default length in bytes of each buffer chunk.
/// </summary>
public const int DefaultBufferLength = 4096;
This could probably go way higher, maybe even over 100K. Memory will be very fragmented with 4K. (1000 chunks for a 4MB image!)
Done. Went with 81920
private MemoryChunk writeChunk;

// Offset into chunk to write to
private int writeOffset;

// Current chunk to read from
private MemoryChunk readChunk;

// Offset into chunk to read from
private int readOffset;
I'm probably missing something, but why are we writing/reading to/from different chunks? Shouldn't Stream.Position indicate where we stand in both cases?
For my sanity mostly. This is hard.
Ah ok, so it's a special stream implementation where writes always go to the end of the stream, right? Would add a comment about this.
Edit: not seen on GH for some reason, but it was a reply to a previous discussion around this line.
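If that reading is correct, a tiny standalone illustration of the model (writes only ever append, reads advance an independent cursor) might look like the following. The type and its members are invented for the example and are not ImageSharp's ChunkedMemoryStream.

```csharp
using System.Collections.Generic;

// Illustration only: write position and read position are tracked separately.
// Writing never rewinds or overwrites; it always adds data at the end.
class AppendOnlyBuffer
{
    private readonly List<byte[]> chunks = new List<byte[]>();
    private int readChunk;   // index of the chunk the read cursor is currently in
    private int readOffset;  // offset of the read cursor inside that chunk

    // The "write cursor" is implicit: it is always the end of the last chunk.
    public void Write(byte[] data) => this.chunks.Add((byte[])data.Clone());

    public int ReadByte()
    {
        // Skip to the next chunk once the current one is fully consumed.
        while (this.readChunk < this.chunks.Count
            && this.readOffset == this.chunks[this.readChunk].Length)
        {
            this.readChunk++;
            this.readOffset = 0;
        }

        return this.readChunk < this.chunks.Count
            ? this.chunks[this.readChunk][this.readOffset++]
            : -1;
    }
}
```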
@@ -39,14 +39,25 @@ public void Configuration_Stream_Agnostic()
[Fact]
public void NonSeekableStream()
{
-    var stream = new NoneSeekableStream(this.DataStream);
+    var stream = new NonSeekableStream(this.DataStream);
These tests are insufficient to validate ChunkedMemoryStream integration. They are only here to prove that different overloads of extension method calls propagate correctly to decoders. (this.DataStream contains only 16 bytes of fake data for the fake decoder!)
I would do two things to validate ChunkedMemoryStream:
1. Add a couple of integration tests feeding JPEGs and PNGs through NonSeekableStream; make sure to cover different sizes.
2. Create a disabled xUnit test for semi-manual local testing that does the same as 1. but iterates through all valid images under tests/Images/Input (a rough sketch of this follows below).
I've copied the existing Image.Identify and Image.IdentifyAsync tests to demonstrate accuracy. The unit tests also cover reading and writing across chunk boundaries.
while (chunk != null)
{
    MemoryChunk next = chunk.Next;
    if (next != null)
    {
        length += chunk.Length;
    }
If we cache the number of "fully written" chunks, the loop could be replaced by a multiplication.
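A sketch of the bookkeeping that suggestion implies (names and structure are illustrative, not the PR's code): count the chunks that are already completely filled, and Length becomes a multiplication plus the offset into the chunk currently being written.

```csharp
// Sketch only: fullChunks is updated as chunks fill up, so Length never has
// to walk the chunk list.
sealed class ChunkLengthBookkeeping
{
    private readonly int chunkLength;
    private int fullChunks;    // chunks that have been completely filled
    private int writeOffset;   // bytes written into the current (last) chunk

    public ChunkLengthBookkeeping(int chunkLength) => this.chunkLength = chunkLength;

    public long Length => ((long)this.fullChunks * this.chunkLength) + this.writeOffset;

    public void OnBytesWritten(int count)
    {
        this.writeOffset += count;

        // Each time the current chunk fills, a new chunk is started.
        while (this.writeOffset >= this.chunkLength)
        {
            this.fullChunks++;
            this.writeOffset -= this.chunkLength;
        }
    }
}
```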
I had a look, but the benchmarks showed no need for the optimization, so I chose not to complicate things.
int pos = 0;
MemoryChunk chunk = this.memoryChunk;
while (chunk != this.readChunk)
Consider a trick similar to the one I'm suggesting for Length. These properties are heavily used in our decoders as far as I can see.
See above
@antonfirsov Thanks for the thorough review. I've made a few changes based upon the feedback.
I pushed the heavyweight integration test I suggested, and it fails for 65 out of 308 images.
We need to Skip the new theory, but add some of its cases to our regular test suite. The lightweight Identify test uses the (simple) BMP decoder and doesn't really stress the new complex code we are about to add.
Regarding benchmarks: I don't think they deliver good indicators in this particular case, since they only show numbers for primitive operations executed on small input streams.
We need to know: how many times do our decoders touch Length and Position for a typical huge image? (I've seen a couple of usages in PNG.) If the number is high enough (e.g. because the relevant methods are called in a loop), we need to prove that perf is not affected when decoding a large image.
Odd that it's failing, will investigate. Re Length and Position: the buffered stream only calls those on the underlying stream when the buffer is exhausted, so I wouldn't consider that a hot path.
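As a generic illustration of that point (this is not ImageSharp's BufferedReadStream): the wrapped stream is only touched when the local buffer runs dry, so whatever the wrapped stream does per call stays off the per-byte path.

```csharp
using System.IO;

static class BufferedReadSketch
{
    // Returns the next byte, refilling 'buffer' from 'underlying' only when
    // every buffered byte has been consumed.
    public static int ReadByte(Stream underlying, byte[] buffer, ref int pos, ref int len)
    {
        if (pos == len)
        {
            // Refill: the only point at which the underlying stream is consulted.
            len = underlying.Read(buffer, 0, buffer.Length);
            pos = 0;

            if (len == 0)
            {
                return -1; // end of stream
            }
        }

        return buffer[pos++];
    }
}
```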
I've just disabled the test on 32-bit. They don't take much time, so I think we should just keep them for now.
Just a few wishes, otherwise looks good now!
Image<Rgba32> expected;
try
{
    expected = Image.Load<Rgba32>(testFileFullPath);
I know it's my code, but if we leave all the tests in, on this line I would really like to replace Image.Load with the TestFile stuff now, to make the tests at least a bit faster by utilizing its caching mechanism. I would do it, but I'm on a train now; may try later if you also lack time.
I've updated it to use the provider.
Description

Fixes our non-seekable stream loading, which was attempting to use the unsupported Length property.

ImageSharp/src/ImageSharp/Image.FromStream.cs, line 734 in 8e2792e
https://source.dot.net/#System.Private.CoreLib/Stream.cs,51

The fix utilizes an internal ChunkedMemoryStream implementation to allow pooling of byte buffers via the use of non-contiguous chunks. This lets us allocate chunks from the buffer pool on demand upon read.
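For readers unfamiliar with the approach, here is a conceptual sketch of the write path it describes: chunks are rented from the array pool only when needed and linked together, so the stream never requires one large contiguous buffer. This mirrors the description above, not the actual ChunkedMemoryStream; the names, chunk size, and simplified surface are assumptions.

```csharp
using System;
using System.Buffers;

// Conceptual sketch only: pooled, non-contiguous chunks allocated on demand.
public sealed class PooledChunkBufferSketch : IDisposable
{
    private const int ChunkLength = 81920;

    private sealed class Chunk
    {
        public byte[] Buffer;   // rented from ArrayPool<byte>.Shared
        public int Length;      // bytes actually written into Buffer
        public Chunk Next;      // chunks form a singly linked list
    }

    private Chunk head;
    private Chunk writeChunk;  // writes always target the last chunk
    private int writeOffset;

    public void Write(ReadOnlySpan<byte> source)
    {
        while (!source.IsEmpty)
        {
            if (this.writeChunk is null || this.writeOffset == ChunkLength)
            {
                // Rent the next chunk from the pool only when the previous
                // one is full (or on the very first write).
                var next = new Chunk { Buffer = ArrayPool<byte>.Shared.Rent(ChunkLength) };
                if (this.writeChunk is null)
                {
                    this.head = next;
                }
                else
                {
                    this.writeChunk.Next = next;
                }

                this.writeChunk = next;
                this.writeOffset = 0;
            }

            // Copy as much as fits into the current chunk, then loop.
            int n = Math.Min(source.Length, ChunkLength - this.writeOffset);
            source.Slice(0, n).CopyTo(this.writeChunk.Buffer.AsSpan(this.writeOffset, n));
            this.writeOffset += n;
            this.writeChunk.Length = this.writeOffset;
            source = source.Slice(n);
        }
    }

    public void Dispose()
    {
        // Return every rented buffer to the pool.
        for (Chunk c = this.head; c != null; c = c.Next)
        {
            ArrayPool<byte>.Shared.Return(c.Buffer);
        }

        this.head = this.writeChunk = null;
    }
}
```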