UTF8 encoded extra characters #35

Closed
XavRsl opened this issue Apr 20, 2017 · 6 comments

@XavRsl

XavRsl commented Apr 20, 2017

Hi!

I'm using this library on user-provided content that I save to a database (utf8mb4-encoded field). The user enters the content in a textarea as Markdown, and it is then converted to HTML with the markdown-it library.
I've been struggling to find out why the images included in my HTML didn't show up in my browser even though their URLs seemed to be right. Here's what I found out:

  • using JoliTypo, the content in the database, converted to ISO 8859-1, looks like this:
<p><a href="http://pubpeer.dev/stor­age/image-1492678775687.jpg" target="_self"><img src="http://pubpeer.dev/stor­age/image-1492678775687.jpg" alt="file"></a></p>
  • when I don't use JoliTypo:
<p><a href="http://pubpeer.dev/stor­age/image-1492678775687.jpg" target="_self"><img src="http://pubpeer.dev/stor­age/image-1492678775687.jpg" alt="file"></a></p>

So it seems that JoliTypo is adding UTF... characters to my content. Am I doing something wrong, or is there a bug somewhere?

Thanks,

Xavier

@damienalexandre
Member

Hi!

Your issue is strange but when I copy & paste your second example to my terminal, I see a SOFT HYPHEN in the word "storage":

[image: terminal screenshot showing the SOFT HYPHEN in the pasted string]

  • Did you paste the real string, or did you apply the correction from the first one?
  • Do you have "Hyphen" in the JoliTypo configuration? It should NOT apply hyphenation to attribute values, but it seems there is already a hyphen in one...
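
For readers without a tool like the one in the screenshot, a rough equivalent of that check can be sketched in Python with the standard `unicodedata` module. The helper name `describe_non_ascii` is made up for this illustration, and the sample URL is the one from the first comment with the soft hyphen written as an explicit escape, on the assumption that this is the character sitting in "storage".

```python
import unicodedata


def describe_non_ascii(text):
    """Return (index, code point, Unicode name) for every non-ASCII character."""
    found = []
    for index, char in enumerate(text):
        if ord(char) > 127:
            # Fall back to a placeholder for characters that have no name.
            name = unicodedata.name(char, "<unnamed>")
            found.append((index, f"U+{ord(char):04X}", name))
    return found


# The soft hyphen is written as \u00ad so it stays visible in the source.
sample = "http://pubpeer.dev/stor\u00adage/image-1492678775687.jpg"

for entry in describe_non_ascii(sample):
    print(entry)
```

An empty result means the string is plain ASCII, so any entry reported for a URL is worth a second look.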

@XavRsl
Author

XavRsl commented Apr 22, 2017

Hi,

In fact, to see the UTF8 characters I pasted the string into Sublime Text and changed the encoding. I didn't know about the uniscribe command (looks very useful!). So I probably pasted the second string from the UTF8 version, with the soft hyphen still in it. I do have Hyphen in my configuration. I'll try to turn it off. I don't know why, but it does apply to attributes...

Thanks,

Xavier
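
What changing the encoding in an editor does can be sketched in a few lines of Python. This assumes the extra character is U+00AD SOFT HYPHEN stored as UTF-8, and the helper name `view_utf8_as_latin1` is invented for the example. The soft hyphen takes two bytes in UTF-8, and reading those bytes as ISO 8859-1 turns them into two separate characters, which would explain why extra characters seem to appear.

```python
SOFT_HYPHEN = "\u00ad"


def view_utf8_as_latin1(text):
    """Encode text as UTF-8, then decode the same bytes as ISO 8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


hyphenated = "stor" + SOFT_HYPHEN + "age"
misread = view_utf8_as_latin1(hyphenated)

# The two bytes that UTF-8 uses for the single soft hyphen character.
print(SOFT_HYPHEN.encode("utf-8"))
# ascii() escapes the non-ASCII characters so they can be seen.
print(ascii(misread))
# The misread string is one character longer than the original.
print(len(hyphenated), len(misread))
```

In other words, the database content is probably fine as UTF-8. The real problem is the soft hyphen itself ending up inside the URL.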

@damienalexandre
Member

If removing the Hyphen fixer from your configuration fixes the issue, could you provide the full string you are passing to JoliTypo so I can look into it?

JoliTypo does not modify attributes, but a bug in the parsing / DomDocument can still happen.

Thanks!
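
When trying to narrow this down, it may help to check the processed HTML for attribute values that contain a soft hyphen. The following is only a sketch in Python using the standard `html.parser` module. It does not call JoliTypo or PHP's DOM classes, and the class and function names are made up for the illustration.

```python
from html.parser import HTMLParser

SOFT_HYPHEN = "\u00ad"


class SoftHyphenAttributeFinder(HTMLParser):
    """Collect (tag, attribute) pairs whose value contains a soft hyphen."""

    def __init__(self):
        super().__init__()
        self.hits = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Attributes without a value are reported as None.
            if value is not None and SOFT_HYPHEN in value:
                self.hits.append((tag, name))


def attributes_with_soft_hyphen(html):
    finder = SoftHyphenAttributeFinder()
    finder.feed(html)
    finder.close()
    return finder.hits


template = (
    '<p><a href="http://pubpeer.dev/{word}/image-1492678775687.jpg" '
    'target="_self"><img src="http://pubpeer.dev/{word}/image-1492678775687.jpg" '
    'alt="file"></a></p>'
)
clean = template.format(word="storage")
hyphenated = template.format(word="stor" + SOFT_HYPHEN + "age")

print(attributes_with_soft_hyphen(clean))
print(attributes_with_soft_hyphen(hyphenated))
```

Running the same check on the input and on the output of the fixer would show whether the soft hyphen was already present before processing.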

@XavRsl
Author

XavRsl commented Apr 24, 2017

Hi,
Yes, getting rid of the Hyphen fixer did fix the bug. Not sure it's going to help you much, but here is the string that was sent to the fixer:

<p><a href="http://pubpeer.dev/storage/image-1493026187479.gif" target="_self"><img src="http://pubpeer.dev/storage/image-1493026187479.gif" alt="file"></a></p>

As a side note, uniscribe looks very powerful, but I didn't find a way to install it on my Mac. Any hints?

Thanks,

Xavier

@damienalexandre
Member

Thanks for the snippet, I will look into that and try to reproduce the issue.

As for Uniscribe, it's a Ruby Gem, so you need to install Ruby first. See https://github.com/janlelis/uniscribe

👋

@damienalexandre
Member

OK, so I tried to reproduce this and I don't get any unexpected hyphen with your sample string. I'm closing this for now; feel free to open a new issue if this problem still affects you.

Cheers
