-
Notifications
You must be signed in to change notification settings - Fork 176
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce and use Indexed BufferReader methods #402
Conversation
if (values.Length < 2 + 2) | ||
{ | ||
return null; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The length is validated upfront, so we don't have to worry about an IOException below.
if (bytes.Length < 4 + 8) | ||
return null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The length is validated upfront, to prevent an IOException later.
MetadataExtractor/IO/BufferReader.cs
Outdated
return new StringValue(bytes, encoding); | ||
} | ||
|
||
public byte[] GetNullTerminatedBytes(int maxLengthBytes, bool moveToMaxLength = false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may be able to return a ReadOnlySpan here, instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly, with the consequence that StringValue will have to change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd still need to use ToArray when an array is actually needed, but there's a few cases where we would be able to utilize a span directly.
For instance, here:
var bytes = GetNullTerminatedBytes(index, maxLengthBytes);
return (encoding ?? Encoding.UTF8).GetString(bytes);
I did some exploration on whether to change StringValue to accept ReadOnlyMemory, which could eliminate some allocations at the risk of rooting larger byte[] segments. I'm not sure if it's worth the trade-off.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Longer term I can imagine a lot more use of ReadOnlyMemory<byte>
and/or ReadOnlySequence<byte>
throughout the library.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. Just one minor regression to fix up and a few comments and this can go in.
MetadataExtractor/IO/BufferReader.cs
Outdated
return new StringValue(bytes, encoding); | ||
} | ||
|
||
public byte[] GetNullTerminatedBytes(int maxLengthBytes, bool moveToMaxLength = false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Longer term I can imagine a lot more use of ReadOnlyMemory<byte>
and/or ReadOnlySequence<byte>
throughout the library.
var reader = new SequentialByteArrayReader(bytes); | ||
var reader = new BufferReader(bytes, isBigEndian: true); | ||
var keywordStringValue = reader.GetNullTerminatedStringValue(maxLengthBytes: 79); | ||
var keyword = keywordStringValue.ToString(_utf8Encoding); | ||
var keyword = keywordStringValue.ToString(Encoding.UTF8); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's another example where GetNullTerminated...
could return a span.
(Not advocating one way or another, just gathering data.)
MetadataExtractor/IO/BufferReader.cs
Outdated
// The number of non-null bytes | ||
int length; | ||
|
||
byte[] buffer; | ||
|
||
if (moveToMaxLength) | ||
{ | ||
buffer = GetBytes(maxLengthBytes); | ||
length = Array.IndexOf(buffer, (byte)'\0') switch | ||
{ | ||
-1 => maxLengthBytes, | ||
int i => i | ||
}; | ||
} | ||
else | ||
{ | ||
buffer = new byte[maxLengthBytes]; | ||
length = 0; | ||
|
||
while (length < buffer.Length && (buffer[length] = GetByte()) != 0) | ||
length++; | ||
} | ||
|
||
if (length == 0) | ||
return []; | ||
if (length == maxLengthBytes) | ||
return buffer; | ||
var bytes = new byte[length]; | ||
if (length > 0) | ||
Array.Copy(buffer, bytes, length); | ||
return bytes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This now lives in two places. I wonder if we can avoid that duplication somehow. Nothing jumps out at me though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a few allocations here that can be killed as well. On my list to be refactored in a another PR (along with returning a ReadOnlySpan). I'll see if we can share any of that logic with the other implementation when I do.
Nice! I looked at that on the weekend and from memory it was blocked on the TGA checker that needed indexed access. Great that we can spanify it too. I've been looking at trying to consolidate the THREE different approaches we have for reading ISO BMFF data. We have quite similar code for MP4, QuickTime and HEIF data formats. They're all different enough that it's not so straightforward, but I'm hopeful the end result will be good. |
I ran a trace over the regression suite, using .NET 8 in release mode, with only the .NET runner, which reads all the input files and writes out metadata text files. The run used 9.92 seconds of CPU. We spent 135 ms in the GC, which is < 2%. Pretty good! Interestingly there are 22 gen 2 collections. Some of these are the LOH which would be good to investigate as ideally we wouldn't be allocating anything on the LOH. I suspect they're mostly byte arrays, but will investigate: I also see some finalizers running for reflection emit, which I find surprising. I'll try and track those down. |
This PR introduces indexed accessors on the Buffer reader, and eliminates another batch of allocations.