Unintentional escape characters in odoc output (html) #620

Closed
jordwalke opened this issue Mar 3, 2021 · 6 comments · Fixed by #638

Comments

@jordwalke

In this odoc output for Unix, look at the function val link.
Here's what it looks like:

val link : ?⁠follow:bool -> string -> string -> unit

But copy it to your clipboard, assign it to a string in JavaScript, and look at the character position where the f in "follow" should be.

var text = "val link : ?⁠follow:bool -> string -> string -> unit"
alert(text.charCodeAt(12));

It shows the char code 8288. This means there are non-printable characters in the output, which make copy/pasting items from the page into utop or a file frustratingly fail to compile (you get errors about weird characters, but you can't see where they are until you inspect the text byte by byte).
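For anyone who wants to locate such characters without checking positions by hand, here is a minimal sketch. The helper name `findSuspectChars` is made up for illustration and is not part of odoc; it assumes that anything outside printable ASCII (plus tab and newlines) is worth flagging, which is too strict for text that legitimately contains non-ASCII characters.

```javascript
// Hypothetical helper (not part of odoc): list every UTF-16 code unit that is
// outside printable ASCII, with its index and its value in U+XXXX notation.
function findSuspectChars(text) {
  const hits = [];
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    // Printable ASCII is 0x20..0x7E; tab, LF and CR are treated as harmless.
    const harmless =
      (code >= 0x20 && code <= 0x7e) ||
      code === 0x09 || code === 0x0a || code === 0x0d;
    if (!harmless) {
      const hex = code.toString(16).toUpperCase().padStart(4, "0");
      hits.push({ index: i, codePoint: "U+" + hex });
    }
  }
  return hits;
}

// The \u2060 escape makes the invisible character explicit in the source.
const sig = "val link : ?\u2060follow:bool -> string -> string -> unit";
console.log(findSuspectChars(sig)); // one hit: index 12, codePoint "U+2060"
```

Because it walks UTF-16 code units, a character outside the Basic Multilingual Plane would be reported as two separate hits.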

I'm assuming there was some reason why these were in there? Does anyone know the back story there?

@dbuenzli
Contributor

dbuenzli commented Mar 3, 2021

> I'm assuming there was some reason why these were in there?

That's U+2060 (WORD JOINER); it was a failed attempt at controlling line-breaking behaviour. It seems they were never removed in the end. They should be.

@jordwalke
Author

Cool. I also found some other ones in there. (160 - non breaking space)

@dbuenzli
Contributor

dbuenzli commented Mar 3, 2021

> (160 - non breaking space)

But are these problematic for copy and paste? I tested the other day and they were not on my side.

I was actually planning to add more, since inline-blocking spans, which did a very good job, unfortunately have usability issues.

@jordwalke
Author

Yes, in my case they were (but I was parsing the OCaml signature programmatically; perhaps there's something in the way I'm using the OCaml parser?). Do you expect OCaml to accept non-breaking spaces in OCaml files (non-comment regions at least)?

@dbuenzli
Contributor

dbuenzli commented Mar 3, 2021

The challenge is to have reasonably readable and responsive signatures while not breaking cut and paste.

I don't know exactly why you are parsing OCaml out of these HTML files, which are aimed at presentation. But if you are doing this, it shouldn't be too hard to filter out the nbsp characters outside of character literals before feeding the text to the OCaml parser.

> Do you expect OCaml to accept non-breaking spaces in OCaml files (non-comment regions at least)?

No.

@jordwalke
Author

I agree it's reasonable to require anyone using the HTML output as a kind of input to another system to strip the characters (which is what I did, and it's not hard), but copy/pasting is the more important use case.
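The stripping step mentioned above could look roughly like the following. This is only a sketch under stated assumptions, not the code actually used in this thread: `cleanSignature` is a hypothetical name, and it handles only the two characters discussed here (U+2060 and U+00A0). It also does not try to leave character or string literals alone, so it is only suitable for text that contains no such literals.

```javascript
// Hypothetical helper: normalise text copied from the rendered HTML before
// handing it to an OCaml parser. Assumes the input has no literals in which
// U+2060 or U+00A0 would be meaningful.
function cleanSignature(text) {
  return text
    .replace(/\u2060/g, "")   // drop word joiners entirely
    .replace(/\u00a0/g, " "); // turn non-breaking spaces into plain spaces
}

const copied = "val link : ?\u2060follow:bool\u00a0-> string -> string -> unit";
console.log(cleanSignature(copied));
// val link : ?follow:bool -> string -> string -> unit
```

Replacing the non-breaking space with an ordinary space, rather than deleting it, keeps tokens separated where the renderer used it as a separator.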

dbuenzli added a commit to dbuenzli/odoc that referenced this issue Mar 12, 2021
It breaks cut and paste in the HTML renderer and other
renderers have to ignore it.

Closes ocaml#620.
jonludlam pushed a commit that referenced this issue Mar 12, 2021
It breaks cut and paste in the HTML renderer and other
renderers have to ignore it.

Closes #620.