RangeError: premature EOF for Unicode character U+FEFF on start #342
Comments
The text encoding/decoding is entirely up to your JS runtime. You can run this in your browser or Node and see the same result.
const input = '\ufefftest\ufeff';
const encoder = new TextEncoder();
const decoder = new TextDecoder('utf-8', { fatal: true });
const output = decoder.decode(encoder.encode(input)); // "test\ufeff"
I don't know for certain if this is expected for UTF-8/protobuf or a bug.
Oh, duh. It's the BOM.
Using
const input = '\ufefftest\ufeff';
const encoder = new TextEncoder();
const decoder = new TextDecoder('utf-8', { fatal: true, ignoreBOM: true });
const output = decoder.decode(encoder.encode(input)); // "\ufefftest\ufeff"
Thanks for the heads-up! Do you think setting this option to true could be merged into main? If I have analyzed the code correctly, the TextDecoder is instantiated on the fly within the library.
Or is there another solution you are pointing to that I'm not seeing, such as a way to set the TextDecoder used in the library?
Thought I can do this:
But doing this everywhere
I can't seem to find any information about whether the BOM should be ignored in protobuf strings. They're definitely ignored when I think the correct thing would be to update protobuf-ts to use the
// shared-binary-read-options.ts
import { BinaryReader, BinaryReadOptions } from '@protobuf-ts/runtime';
// One shared decoder that keeps a leading U+FEFF instead of stripping it
const textDecoder = new TextDecoder('utf-8', { fatal: true, ignoreBOM: true });
export const binaryReadOptions: Partial<BinaryReadOptions> = {
  readerFactory: (bytes) => new BinaryReader(bytes, textDecoder)
};
// some-other-file.ts
import { binaryReadOptions } from './shared-binary-read-options';
// Later
Test.fromBinary(data, binaryReadOptions);
I wouldn't recommend the following approach, but you can monkey-patch the
// monkey-patch-protobuf-ts-binary-reader.ts
import { BinaryReader } from '@protobuf-ts/runtime';
const textDecoder = new TextDecoder('utf-8', { fatal: true, ignoreBOM: true });
// Typing `this` lets the body call this.bytes() without an implicit-any error
function monkeyPatchedString(this: BinaryReader): string {
  return textDecoder.decode(this.bytes());
}
// @ts-ignore
BinaryReader.prototype.string = monkeyPatchedString;
Thanks to both of you for looking into this!
Keeping the byte order mark in the decoded string seems reasonable to me, if only for reproducible encoding roundtrips. If this passes the conformance suite, it should be fine. However, I'm curious about the error being thrown. @kivancguckiran, can you provide some details? If I run your example, I see that the BOM is stripped, but I don't see an error:
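To illustrate the round-trip point, here is a minimal sketch that uses only the runtime's TextEncoder and TextDecoder, with no protobuf-ts involved. The helper name roundTripsExactly is hypothetical, and the byte array is a hand-written stand-in for a string field's payload, not real wire data from the library.

```typescript
// Hypothetical helper (not part of protobuf-ts): decode UTF-8 bytes, re-encode
// the resulting string, and report whether the bytes come back unchanged.
function roundTripsExactly(bytes: Uint8Array, ignoreBOM: boolean): boolean {
  const decoder = new TextDecoder("utf-8", { fatal: true, ignoreBOM });
  const reencoded = new TextEncoder().encode(decoder.decode(bytes));
  return (
    reencoded.length === bytes.length &&
    reencoded.every((byte, i) => byte === bytes[i])
  );
}

// UTF-8 bytes for "\ufeffhi": U+FEFF encodes as EF BB BF.
const wire = new Uint8Array([0xef, 0xbb, 0xbf, 0x68, 0x69]);

console.log(roundTripsExactly(wire, false)); // false: the leading BOM is stripped on decode
console.log(roundTripsExactly(wire, true)); // true: the BOM is kept, so the bytes match
```

With the default decoder the re-encoded payload is three bytes shorter, which is the kind of non-reproducible round trip mentioned above.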
Hello @timostamm! You are absolutely right. This does not raise an error. I was trying to debug it inside node_modules at that time with So no errors at all. But yes, the first BOM is omitted, as you noticed. We are currently wrapping readerFactory like this to work around the issue:
As suggested by @jcready. It would be nice to see this in Thanks.
We pass the conformance tests with
Released in v2.8.0.
Hello,
We've noticed that whenever the Unicode ZWNBSP character (U+FEFF) is received at the start of a string in a message, it throws a silent error and omits the character in the decoded part. It seems that this particular call to TextDecoder.decode throws the mentioned error:
protobuf-ts/packages/runtime/src/binary-reader.ts
Line 245 in aaa63c7
I've created a repository to reproduce:
https://github.com/kivancguckiran/premature-eof-protobuf-ts
It outputs the char codes from the result of the create operation and from the result after the fromBinary operation. If the U+FEFF character is at the start, it is omitted from the decoded part. Since it is a zero-width no-break space, the GitHub preview hides the mentioned Unicode character.
This line is:
https://github.com/kivancguckiran/premature-eof-protobuf-ts/blob/main/index.ts#L4
Actually like this:
![resim](https://user-images.githubusercontent.com/9072047/176771266-9c6d3153-e3ab-44d8-b602-cb99c214dbd1.png)
Thanks in advance.
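For readers without the repository at hand, the char-code comparison can be approximated with the runtime's TextEncoder and TextDecoder alone. This is only a sketch: charCodes and decodeUtf8 are hypothetical helpers that stand in for the string decoding step, and they do not call create or fromBinary from protobuf-ts.

```typescript
// Hypothetical helper: list the UTF-16 code units of a string so that the
// invisible U+FEFF (65279) becomes visible in the output.
function charCodes(text: string): number[] {
  const codes: number[] = [];
  for (let i = 0; i < text.length; i++) {
    codes.push(text.charCodeAt(i));
  }
  return codes;
}

// Hypothetical helper: encode to UTF-8 and decode again, with or without ignoreBOM.
function decodeUtf8(text: string, ignoreBOM: boolean): string {
  const bytes = new TextEncoder().encode(text);
  return new TextDecoder("utf-8", { fatal: true, ignoreBOM }).decode(bytes);
}

const input = "\ufeffab\ufeff";

console.log(charCodes(input)); // [65279, 97, 98, 65279]
console.log(charCodes(decodeUtf8(input, false))); // [97, 98, 65279]
console.log(charCodes(decodeUtf8(input, true))); // [65279, 97, 98, 65279]
```

Only the leading U+FEFF is dropped by the default decoder; the trailing one survives in both cases.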