
Method for URL-safe project serialization #102

Open
aomarks opened this issue Feb 5, 2021 · 6 comments

@aomarks
Member

aomarks commented Feb 5, 2021

It's not as easy as you'd think to serialize the state of a project for, e.g., embedding it in a URL, because btoa throws on any character outside Latin-1, so special handling is required. Also, base64url is a better scheme than standard base64 (it uses - and _ instead of + and /), because it's less likely to get mis-encoded along the way (especially the +, which becomes a space when a query string is form-decoded).

Let's add something like serialize() and deserialize() to project and ide, to make this use case simpler.
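A minimal sketch of the base64url half, assuming the project state has already been turned into a standard base64 string. It swaps `+`/`/` for `-`/`_` and drops the `=` padding, which RFC 4648 §5 allows. The helper names are hypothetical and are not an existing playground-elements API.

```typescript
// Hypothetical helpers (not part of playground-elements): convert between
// standard base64 (RFC 4648 §4) and base64url (RFC 4648 §5).

/** Standard base64 -> base64url: '+' -> '-', '/' -> '_', strip '=' padding. */
function base64ToBase64Url(b64: string): string {
  return b64.replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

/** base64url -> standard base64, restoring the '=' padding. */
function base64UrlToBase64(b64url: string): string {
  const b64 = b64url.replace(/-/g, "+").replace(/_/g, "/");
  // Padded base64 length is always a multiple of 4. atob() tolerates missing
  // padding, but stricter decoders may not, so pad back up to it.
  const padding = (4 - (b64.length % 4)) % 4;
  return b64 + "=".repeat(padding);
}

// Usage: the bytes 0xFB 0xFF hit both problem characters in standard base64.
const standard = btoa("\xfb\xff"); // "+/8="
const urlSafe = base64ToBase64Url(standard); // "-_8"
console.log(standard, urlSafe, base64UrlToBase64(urlSafe) === standard);
```

Since this only touches the alphabet and padding, it composes with whatever produces the base64 string in the first place.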

@aomarks aomarks self-assigned this Feb 5, 2021
@aomarks aomarks moved this to Todo in Lit Project Board Jan 24, 2022
@aomarks aomarks moved this from 🔥 Front Burner to 🧊 Icebox in Lit Project Board Mar 15, 2022
@aomarks aomarks removed their assignment Jun 16, 2022
@aomarks aomarks moved this from 🧊 Icebox to 📋 Triaged in Lit Project Board Jun 16, 2022
@maelp

maelp commented Feb 1, 2024

How come btoa throws on non-Latin characters? Isn't it supposed to handle arbitrary bytes, and therefore encode anything, whatever the encoding?

@aomarks
Member Author

aomarks commented Feb 1, 2024

How come btoa throws on non-Latin characters? Isn't it supposed to handle arbitrary bytes, and therefore encode anything, whatever the encoding?

btoa does handle arbitrary bytes, but it expects a "binary string" in which every character stands for one byte. JS strings are sequences of UTF-16 code units, so when btoa reaches a character whose code unit is above 0xFF (anything outside Latin-1, e.g. "€" or "日"), it sees a number that is too large to fit in a byte and throws an InvalidCharacterError. So the easiest solution is to convert the string to UTF-8 bytes first.

https://developer.mozilla.org/en-US/docs/Glossary/Base64#the_unicode_problem
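Here is a small illustrative snippet that reproduces the failure. The `firstNonLatin1` helper is made up for this demo. The trigger is any code unit above 0xFF. A single BMP character like "€" (U+20AC) is enough, and an emoji such as "😀" is two UTF-16 code units that are both out of range.

```typescript
// Illustrative only: btoa() requires every UTF-16 code unit to be <= 0xFF.
// Returns the index of the first offending code unit, or -1 if none.
function firstNonLatin1(s: string): number {
  for (let i = 0; i < s.length; i++) {
    if (s.charCodeAt(i) > 0xff) return i;
  }
  return -1;
}

const samples = ["café", "€5", "日本", "😀"];
for (const s of samples) {
  let result: string;
  try {
    result = btoa(s);
  } catch (e) {
    // Browsers (and Node 16+) throw a DOMException named "InvalidCharacterError".
    result = `threw ${(e as Error).name}`;
  }
  console.log(
    JSON.stringify(s),
    "code units:", s.length,
    "first > 0xFF at:", firstNonLatin1(s),
    "->", result,
  );
}
```

One subtlety: "café" doesn't throw. However, btoa encodes "é" as the single Latin-1 byte 0xE9 rather than its UTF-8 bytes, so the result won't decode correctly with a UTF-8 decoder either.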

@maelp

maelp commented Feb 1, 2024

Thanks for the details! I feel there should be a warning about this somewhere in the API, since this comes up so often in browser code. I'd have expected any string given to btoa to be converted to a Uint8Array before the base64 conversion, to avoid any encoding issues...

@maelp

maelp commented Feb 1, 2024

BTW, I see many "solutions" on the web that use a TextEncoder to convert everything to UTF-8 before doing the base64 conversion. Why isn't that the approach chosen for https://github.com/lit/lit.dev/blob/fd4c34e71b47267f3672a2debe52807042f22cc2/packages/lit-dev-content/src/pages/playground.ts#L31 ?

@aomarks
Member Author

aomarks commented Feb 1, 2024

BTW, I see many "solutions" on the web that use a TextEncoder to convert everything to UTF-8 before doing the base64 conversion. Why isn't that the approach chosen for https://github.com/lit/lit.dev/blob/fd4c34e71b47267f3672a2debe52807042f22cc2/packages/lit-dev-content/src/pages/playground.ts#L31 ?

I don't know, we should probably use it! I thought maybe TextEncoder wasn't available in all browsers at the time it was written, but that doesn't seem to be true. 🤷
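For reference, here is a sketch of the TextEncoder approach, which is essentially what the MDN "Unicode problem" section describes. It assumes TextEncoder/TextDecoder and btoa/atob are available, as they are in current browsers and Node 16+. The function names are hypothetical, not the lit.dev code.

```typescript
// Sketch of UTF-8-safe base64 via TextEncoder. Names are hypothetical.

/** Encode any JS string as base64 of its UTF-8 bytes. */
function utf8ToBase64(text: string): string {
  const bytes = new TextEncoder().encode(text); // Uint8Array of UTF-8 bytes
  // Build a "binary string" with one char per byte (code units 0-255),
  // which is exactly the input btoa() accepts.
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

/** Inverse of utf8ToBase64: base64 -> bytes -> UTF-8 decode. */
function base64ToUtf8(b64: string): string {
  const binary = atob(b64);
  // Every char of a binary string is a single code unit <= 0xFF.
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

const original = "héllo 日本 😀";
const encoded = utf8ToBase64(original);
console.log(encoded, base64ToUtf8(encoded) === original);
```

The byte loop is used instead of `String.fromCharCode(...bytes)` because spreading a very large array into a single call can exceed engine argument limits for big projects. For use in a URL, the output would still need the + and / to base64url substitution from the original issue.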

2 participants