if (typeof TextEncoder !== 'undefined' && typeof TextDecoder !== 'undefined') {
    utf8Encoder = new TextEncoder();
    utf8Decoder = new TextDecoder();
} else {
Just a review note: have added utf8 tests to the headless browser suite and manually verified code passes through this block. (The browser tests don't get captured in the coverage report).
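For readers following the diff, the feature-detection pattern in the hunk above can be sketched roughly as follows. The hunk is cut off at `else`, so the fallback path here is an assumption made for illustration, and `encodeUtf8` / `decodeUtf8` are hypothetical helper names rather than anything taken from the PR.

```javascript
// Sketch only: prefer the native UTF-8 coders when the runtime provides them.
let utf8Encoder = null;
let utf8Decoder = null;

if (typeof TextEncoder !== 'undefined' && typeof TextDecoder !== 'undefined') {
    utf8Encoder = new TextEncoder();
    utf8Decoder = new TextDecoder();
}

// Hypothetical helper: the legacy branch is an assumed fallback, not the PR's code.
function encodeUtf8(str) {
    if (utf8Encoder) {
        return utf8Encoder.encode(str); // Uint8Array of UTF-8 bytes
    }
    // Legacy path: percent-encode as UTF-8, then read each byte back out.
    const binary = unescape(encodeURIComponent(str));
    return Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
}

// Hypothetical helper: mirrors encodeUtf8 for the decoding direction.
function decodeUtf8(bytes) {
    if (utf8Decoder) {
        return utf8Decoder.decode(bytes); // default mode substitutes U+FFFD on errors
    }
    const binary = Array.from(bytes, (b) => String.fromCharCode(b)).join('');
    return decodeURIComponent(escape(binary)); // throws URIError on malformed UTF-8
}
```

Note that in this sketch the two branches do not fail the same way on malformed input: the native decoder substitutes replacement characters by default, while the legacy path throws. Any real fallback would need to pick one behaviour deliberately.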
Thanks @cag!
In #3441 you're hoping to stop getPastEvents from crashing ... is there a way to verify this PR achieves that?
Log decoding is being done here via the Ethers library and reading through the thread in #1610 (esp. this comment) makes me wonder if there isn't something which has to be addressed at that layer.
Do you have a view about this?
(Apologies in advance if I haven't understood this problem correctly.)
@cgewecke You're right! It does have to be addressed in
My two cents: I would prefer to use the ethers UTF-8 coder. I already spoke about it with @ricmoo, and he has done a great job with it. #usingOfSynergies
Heya! Just saw this. There are several replacement strategies included in the ethers UTF8 library. If allowing non-strict parsing, you probably want some way for the user to specify this, since critical security issues (as well as lost funds, asset destruction, etc.) can be introduced by fuzzing a string. Some quick examples:
My 2 cents: it does indeed suck that strings are so finicky in this environment, but there are good reasons they are. It is similar to why we have to use big numbers: if you don’t, incorrect results occur. The same is true for strings that aren’t UTF-8; if you don’t enforce it, incorrect results occur. It just feels like strings aren’t as important, but they are. You wouldn’t want numbers to just “ignore” errors that occurred. You wouldn’t want a Swedish user typing in "1,337" ether to send 1337 ether; you’d want it to fail (barring localized parsing). One option, once proxies are more widely accepted, would be to make that property throw on a read, which would allow people to trap it if they care about the value; otherwise it can just silently and safely pass through the cracks. Maybe in the meantime there should be a wrapping object that indicates the code units are invalid? I can add an error type to the UTF8 decoder for the final wrapping to make this easier. Ideas?
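One way to picture the "throw on a read" idea without proxies is a getter that defers the error until the field is actually used. This is only a sketch: `decodeEventString` and the shape of the result object are invented for illustration and are not an ethers or web3 API.

```javascript
// Hypothetical sketch: defer the UTF-8 error until the field is read.
function decodeEventString(result, key, bytes) {
    try {
        // fatal: true makes TextDecoder throw instead of substituting U+FFFD.
        const value = new TextDecoder('utf-8', { fatal: true }).decode(bytes);
        Object.defineProperty(result, key, { value, enumerable: true });
    } catch (cause) {
        Object.defineProperty(result, key, {
            enumerable: true,
            get() {
                const error = new Error(`invalid UTF-8 in "${key}"`);
                error.bytes = bytes; // raw bytes stay available to callers that trap the error
                throw error;
            },
        });
    }
    return result;
}

const event = {};
decodeEventString(event, 'name', new Uint8Array([0x68, 0x69])); // valid: "hi"
decodeEventString(event, 'memo', new Uint8Array([0xff, 0xfe])); // invalid UTF-8
```

Code that never touches `event.memo` keeps working, while code that reads it gets an error it can catch. A caveat of this particular sketch is that anything enumerating the object's values (for example `JSON.stringify`) will also trigger the getter.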
@nivida My only horse in the race is that there is a way for getPastEvents to not crash in these scenarios, or to give a way to recover from errors and continue processing. If that means using ethers.js, that's fine by me. There's a spec for the codec, so that's what I'll refer to for the rest of this comment, assuming (maybe wrongly?) that this spec is faithfully and securely implemented by the various JS engines.
I should hope people would find
By default, the decoder will attempt to decode the bytes as UTF-8. There's no format inference in the spec. Also, by default, the encoder will encode code points as UTF-8.
The decoder behavior under error conditions is as follows:
Anyway, it's up to the browser to implement the spec correctly.
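The two decoder error modes can be observed directly with a small experiment. This is a sketch assuming a runtime whose `TextDecoder` follows the Encoding Standard; the byte sequence is an arbitrary example.

```javascript
// "a", a stray 0xFF byte (never valid in UTF-8), then "b".
const invalid = new Uint8Array([0x61, 0xff, 0x62]);

// Default mode: malformed input is replaced with U+FFFD and decoding continues.
const lenient = new TextDecoder('utf-8').decode(invalid);

// Fatal mode: the same input makes decode() throw a TypeError.
let strictError = null;
try {
    new TextDecoder('utf-8', { fatal: true }).decode(invalid);
} catch (err) {
    strictError = err;
}

console.log(JSON.stringify(lenient), strictError instanceof TypeError);
```

The default mode never fails, which is what keeps a caller from crashing, but it also discards the information that the input was malformed. The fatal mode preserves that information at the cost of requiring the caller to handle the exception.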
I agree that this is something that can affect dapp security.
Referring to Add an option, acknowledging that parsing event data might fail due to string processing, maybe called In particular,
I'm closing this, as it is clear to me that these changes won't fix #1610. Probably the design change proposal in the comment above should go in the issue thread instead of here.
I agree, but I think the consumers of events are often scripts, which may not be so discerning. ;) But yes, the IGNORE, REPLACE and ERROR strategies are all available in ethers. I'm also not against the TextCoders; I just worry that invalid UTF-8 characters can be a significant security exploit, so there should be some sort of alarm raised.
Basically this, 100%. :)
I'm curious what it does in the face of overlong representations... I'll have to look it up and/or experiment. They greatly impact dapp security, but in most day-to-day lives, shouldn't matter much, so I'm curious what the spec suggests...
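A quick experiment along these lines might look like the following. It is a sketch that assumes the runtime's `TextDecoder` implements the Encoding Standard's UTF-8 decoder faithfully; the variable names are illustrative.

```javascript
// 0xC0 0x80 is an overlong (two-byte) encoding of U+0000.
const overlong = new Uint8Array([0xc0, 0x80]);

// Default mode: check whether the overlong form is accepted as NUL or replaced.
const lenientResult = new TextDecoder('utf-8').decode(overlong);
const containsNul = lenientResult.includes('\u0000');
const containsReplacement = lenientResult.includes('\uFFFD');

// Fatal mode: check whether the overlong form is rejected outright.
let rejectedWhenFatal = false;
try {
    new TextDecoder('utf-8', { fatal: true }).decode(overlong);
} catch (err) {
    rejectedWhenFatal = true;
}

console.log({ containsNul, containsReplacement, rejectedWhenFatal });
```

Under the Encoding Standard, 0xC0 is not a valid lead byte, so a conforming decoder should treat the sequence as an error rather than decode it to U+0000. The script is a way to confirm that on a given engine rather than take it on trust.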
Description
Use TextEncoder to encode UTF-8 sequences.
Related: #1610 (Edit: Thought this would fix until @cgewecke noted that getPastEvents uses a different UTF-8 codec)
Related: #3441
Type of change
Checklist:
npm run dtslint with success and extended the tests and types if necessary.
npm run test:unit with success.
npm run test:cov and my test cases do cover all lines and branches of the added code.
npm run build-all and tested the resulting file/'s from dist folder in a browser.
CHANGELOG.md file in the root folder.