Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C++ Syntax Misunderstands wchar_t Character Literals #1412

Closed
wyattallen opened this issue Jan 11, 2017 · 8 comments
Closed

C++ Syntax Misunderstands wchar_t Character Literals #1412

wyattallen opened this issue Jan 11, 2017 · 8 comments
Labels

Comments

@wyattallen
Copy link

When defining a wchar_t character literal (for example L'#') HLJS 9.5.0 gets stuck incorrectly highlighting the following lines as part of a string. Observe the following C++ program.

#include<stdio.h>
void main() {
    char x = 'x';
    printf(L"#");
    printf(L'#');
    printf("Hello, world");
}

The second ' in L'#' starts a string that consumes the remainder of the file. HLJS does not have this problem with the wchar_t string form (for example L"#").

@joshgoebel
Copy link
Member

If you have a PR to resolve this that'd be great.

@joshgoebel
Copy link
Member

joshgoebel commented Oct 4, 2019

@egor-rogov I could probably make a PR for this one, but I'm confused at the single quote syntax... which is shared with dts and elm.js:

Ref:
https://github.com/highlightjs/highlight.js/blob/master/src/languages/cpp.js#L25

  var STRINGS = {
    className: 'string',
    variants: [
      {
        begin: '(u8?|U|L)?"', end: '"',
        illegal: '\\n',
        contains: [hljs.BACKSLASH_ESCAPE]
      },
      { begin: /(?:u8?|U|L)?R"([^()\\ ]{0,16})\((?:.|\n)*?\)\1"/ },
      {
        begin: '\'\\\\?.', end: '\'',
        illegal: '.'
      }

I don't understand what illegal: '.' is doing or what its goal is. Isn't "." going to match anything? Why do we need an illegal match here at all? So without understanding it I'm afraid to make a PR. I did follow the Blame history back as far as I could hoping for a clue but didn't find anything promising.

And do you have any idea why the single quoted one has a possible \ match and then must match one character into the string, while the double quoted version doesn't follow that pattern at all?

IE, why wouldn't this work:

begin: '(u8?|U|L)?\'', end: '\'',

@egor-rogov
Copy link
Collaborator

@yyyc514 Honestly I have no idea. I guess it reflects the fact that in C/C++ character literal can usually be 'a' (character itself) or '\n' (escape sequence) or '\0' (ascii code), but looks it can be done other way.
I wouldn't be afraid, just add enough tests for single-quote literals.

@joshgoebel
Copy link
Member

guess it reflects the fact that in C/C++ character literal can usually be 'a' (character itself) or '\n' (escape sequence) or '\0' (ascii code),

Right, but that would be dealt with by BACKSLASH_ESCAPE (I presume)... this only matches a single \ at the very beginning of the string. I'll play around with it and see if doing it a more obvious ways breaks anything.

I wouldn't be afraid, just add enough tests for single-quote literals.

In there a place in the test system for more singular tests about specific features like this? I've mostly just played with the auto-detection tests so far.

@joshgoebel
Copy link
Member

@wyattallen Do single quotes support all the same fancy things that double quotes do? Do you have a quick link or information on the differences so i can make sure I do this as well as possible if I'm going to fix?

See above for our current "rules" for this in C++... evidently literal new lines are invalid inside double quoted strings... is that also true for single quoted?

@egor-rogov
Copy link
Collaborator

In there a place in the test system for more singular tests about specific features like this? I've mostly just played with the auto-detection tests so far.

Sure. test/markup/cpp/string-literals.txt has some tests already.

@joshgoebel
Copy link
Member

joshgoebel commented Oct 6, 2019

My pass at this one:

#2156

Just replaces that insanity with a matching matcher for single quoted strings. It does pass all the tests and should be improved in that previously I don't see how it would have properly detected the end of the string. You could have tricked it with a \'.

@joshgoebel
Copy link
Member

This is merged now!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants