Some entities get wrongly encoded, when using entities_processNumerical=true #4941

Closed
agorum opened this issue Oct 27, 2021 · 5 comments · Fixed by #5306
Labels: plugin:entities, status:confirmed, type:bug

@agorum

agorum commented Oct 27, 2021

Type of report

Bug

Provide detailed reproduction steps (if any)

Open JSFiddle to see the problem: https://jsfiddle.net/veyd1gjn/10/

The editor content contains an emoji (👍), and config.entities_processNumerical is set to true.

When switching to the editor's source view and back, the emoji is destroyed and displayed as ��

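In source view it is encoded as: "&amp;#55357;&amp;#56397;" (that is, the two numeric entities &#38;#55357; and &#38;#56397;, one per UTF-16 surrogate)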

The correct value would be the single numeric entity for code point 128077 (0x1F44D), which renders as "👍"

Maybe it is because the number is larger than 0xFFFF?

Expected result

The entity should be encoded and decoded correctly.

Actual result

The entity is destroyed. From 👍 to ��

Other details

  • Browser: Chrome
  • OS: Windows 10
  • CKEditor version: 4.5.0+
  • Installed CKEditor plugins: entities
agorum added the type:bug label Oct 27, 2021
@Comandeer
Member

This seems to be caused by the entities plugin using String#charCodeAt() instead of String#codePointAt(), together with a regex without the u flag. As a result, the emoji is encoded as the two halves of its surrogate pair instead of as a single code point.
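The difference between the two methods can be checked in any modern browser console or in Node.js. This is a minimal sketch for illustration only; the variable names are made up and are not part of the entities plugin.

```javascript
// Illustrative only: these names do not exist in CKEditor.
var thumbsUp = '\uD83D\uDC4D'; // U+1F44D, the thumbs-up emoji

// JavaScript strings are UTF-16, so a character above 0xFFFF
// is stored as a surrogate pair (two code units).
var unitCount = thumbsUp.length;

// charCodeAt() reads one code unit at a time.
var highSurrogate = thumbsUp.charCodeAt( 0 ); // 0xD83D
var lowSurrogate = thumbsUp.charCodeAt( 1 );  // 0xDC4D

// codePointAt() (ES2015+, not available in Internet Explorer)
// combines the pair into a single code point.
var codePoint = thumbsUp.codePointAt( 0 );    // 0x1F44D

console.log( unitCount, highSurrogate, lowSurrogate, codePoint );
// 2 55357 56397 128077
```

Encoding each code unit separately therefore produces two numeric entities (55357 and 56397), neither of which is a valid character on its own.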

The quick workaround involves two changes inside the entities plugin:

  1. return config.entities_processNumerical == 'force' || !entitiesTable[ character ] ? '&#' + character.charCodeAt( 0 ) + ';'
    – charCodeAt should be replaced by codePointAt,
  2. entitiesRegex = new RegExp( entitiesRegex, 'g' );
    – the g flag should be replaced by the gu flags.

However, this fix works only if Internet Explorer does not need to be supported. As CKEditor 4 supports IE, we need a more robust fix that also works there.
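The effect of the two changes can be sketched outside the plugin. The regex and the function name below are hypothetical stand-ins (the plugin builds its own entitiesRegex and also consults entitiesTable); the sketch only shows how the u flag and codePointAt() work together, and it assumes an ES2015+ engine.

```javascript
// Hypothetical stand-in for the plugin's regex: match any non-ASCII character.
// With the "u" flag the character class matches whole code points, so an
// emoji reaches the callback as one two-unit string, not as two matches.
var nonAsciiRegex = new RegExp( '[^\\u0000-\\u007F]', 'gu' );

// Hypothetical helper, not a CKEditor API.
function toNumericEntities( html ) {
	return html.replace( nonAsciiRegex, function( character ) {
		// codePointAt( 0 ) reads the full code point of the match.
		return '&#' + character.codePointAt( 0 ) + ';';
	} );
}

console.log( toNumericEntities( 'ok \uD83D\uDC4D' ) ); // ok &#128077;
```

Both changes are needed together: without the u flag the regex would still hand each surrogate to the callback separately, and codePointAt() on a lone surrogate just returns that surrogate's value.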

@agorum, I've updated your original report to contain info about the entities plugin.

Comandeer added the size:?, status:confirmed, and plugin:entities labels Oct 28, 2021
@agorum
Author

agorum commented Nov 1, 2021

Thank you very much, that did the trick!

Internet Explorer is not necessary for us anymore.

As a default fix, I would suggest implementing a config option to turn on this behavior (with a warning that IE is not supported when the option is enabled). What do you think?

@Comandeer
Member

It sounds reasonable. However, I'm not sure if a simpler approach wouldn't be even better: check whether the codePointAt() method is available and use it, otherwise fall back to the older charCodeAt() one; the same for the u flag.
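A minimal sketch of that feature-detection idea follows. All names are hypothetical, the non-ASCII regex is a stand-in for the plugin's own, and this is not necessarily how the eventual fix was implemented. One assumption worth stating: the sketch enables the new path only when both features are present, because a u-flag match read with charCodeAt( 0 ) would silently drop the low surrogate.

```javascript
// Hypothetical helpers; none of these names exist in CKEditor.
var hasCodePointAt = typeof String.prototype.codePointAt === 'function';

// An unknown regex flag throws a SyntaxError, so probe it in try/catch.
var hasUnicodeFlag = ( function() {
	try {
		new RegExp( '.', 'u' );
		return true;
	} catch ( e ) {
		return false;
	}
} )();

// Use the new path only when both features exist (see the note above).
var useCodePoints = hasCodePointAt && hasUnicodeFlag;

function createRegex( source ) {
	return new RegExp( source, useCodePoints ? 'gu' : 'g' );
}

function getCharacterCode( character ) {
	return useCodePoints ? character.codePointAt( 0 ) : character.charCodeAt( 0 );
}

function toNumericEntities( html ) {
	// Stand-in pattern: any non-ASCII character.
	return html.replace( createRegex( '[^\\u0000-\\u007F]' ), function( character ) {
		return '&#' + getCharacterCode( character ) + ';';
	} );
}
```

In an engine with both features, toNumericEntities( 'ok \uD83D\uDC4D' ) yields one entity for code point 128077. In an engine without them (such as IE), the output stays as it was before: one entity per surrogate, so this approach avoids breaking IE but does not fix the encoding there.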

@agorum
Author

agorum commented Nov 3, 2021

Sounds good!

@CKEditorBot
Collaborator

Closed in #5306

CKEditorBot added this to the 4.19.2 milestone Aug 31, 2022
jacekbogdanski modified the milestones: 4.19.2, 4.20.0 Sep 8, 2022