(cpp) Support __ keywords #2173
Comments
// _Complex, __always_inline, etc.
var UNDERSCORE_KEYWORDS_RE = ["_[A-Z][a-z\\d_]+", "__[a-z\\d_]+"]
var CPP_KEYWORDS = {
keyword: 'int float while private char catch import module export virtual operator sizeof ' +
...
}
hljs.addKeywords(CPP_KEYWORDS["keyword"], UNDERSCORE_KEYWORDS_RE)
// or
hljs.addKeyword(CPP_KEYWORDS["keyword"], UNDERSCORE_KEYWORD_RE, { relevance: 1 })
hljs.addKeyword(CPP_KEYWORDS["keyword"], DOUBLE_UNDERSCORE_KEYWORD_RE, { relevance: 2 })
I think I'd prefer something like the above (a mini API that lets someone add keywords to the actual keywords). That would solve the problem of "not to depend on how keyword flattener works", since we'd be free to change the keyword string format in the future (but how likely is that really?) as long as we still supported this API.
The problem with saying "just use another matcher" is (other than having to change the file perfectly in like 10 places, and then worry about TESTING that those 10 changes all work properly) that matchers actually behave very differently from keywords: keywords are always the last to match, they match the "gaps", etc. I see lots of advantages to just piggybacking off the existing keyword system. Thoughts? |
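To make the "mini API" idea concrete, here is a minimal sketch of what such a helper could look like. It is only an illustration: `addKeywords` as written here is hypothetical and not part of highlight.js, and the returned `isKeyword` function is an assumed shape chosen so the behavior can be checked in isolation.

```javascript
// Hypothetical helper, NOT a highlight.js API. It bundles a static keyword
// string with regex sources so a grammar never depends on the string format.
function addKeywords(keywordString, patternSources) {
  const staticWords = new Set(keywordString.split(/\s+/).filter(Boolean));
  // Anchor each pattern so it has to match the whole identifier.
  const anchored = patternSources.map((source) => new RegExp(`^(?:${source})$`));
  return {
    isKeyword(word) {
      return staticWords.has(word) || anchored.some((re) => re.test(word));
    },
  };
}

const UNDERSCORE_KEYWORDS_RE = ["_[A-Z][a-z\\d_]+", "__[a-z\\d_]+"];
const cppKeywords = addKeywords("int float while sizeof", UNDERSCORE_KEYWORDS_RE);

console.log(cppKeywords.isKeyword("sizeof"));          // true
console.log(cppKeywords.isKeyword("_Complex"));        // true
console.log(cppKeywords.isKeyword("__always_inline")); // true
console.log(cppKeywords.isKeyword("my_variable"));     // false
```

The point of the indirection is that callers only see `addKeywords`, so the internal keyword representation could change later without touching grammars.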
Something like that would be OK, but I think it's a bit more complicated than it looks.
So what about spaces in regex? And how about setting some relevance for every keyword introduced by |
I addressed that:
It could even blow up, which would simply fail to pass tests... etc...
Pretty easy to do that with a for loop? These special cases should be one offs, this shouldn't be something needed everyday or very often. Though I'd be fine with the API taking an array as well...
Except the compiling happens much, much later in the process... this is just the raw data file... |
What if we just allow it to take a string or array?
var UNDERSCORE_KEYWORDS_RE = [/_[A-Z][a-z\d_]+/, /__[a-z\d_]+/]
var CPP_KEYWORDS = {
keyword: ['int float while private char catch import module export virtual operator sizeof ' +
'', ...UNDERSCORE_KEYWORDS_RE]
...
}
That's newer syntax which I'd have to walk back, but you get the idea... so:
|
I'm worried we may be overthinking this. Let me think and dig into this a bit more. :) Think there may be a few things about how we do keywords I need to bone up on anyways. |
So it seems keywords don't really support regex anyway, so that one grammar is mistaken unless I'm reading something wrong. I think right now the only way to do this is another constant and then repeat it over and over inside
Tagging this beginner friendly, as this should just be a matter of getting it done and writing some tests. |
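A rough sketch of the "another constant, repeated over and over" approach might look like the following. `className`, `begin`, `end`, `keywords` and `contains` are ordinary highlight.js mode attributes, but the grammar skeleton around them (`grammarSketch`, the `params` mode) is made up purely for illustration and is not taken from the real C++ grammar.

```javascript
// A mode whose only job is to match keyword-like identifiers by regex.
const UNDERSCORE_KEYWORD_MODE = {
  className: "keyword",
  begin: /\b(?:_[A-Z][a-z\d_]+|__[a-z\d_]+)\b/,
};

// Illustrative skeleton only; the real grammar has many more nested modes.
const grammarSketch = {
  keywords: { keyword: "int float while sizeof" },
  contains: [
    UNDERSCORE_KEYWORD_MODE,
    {
      className: "params",
      begin: /\(/,
      end: /\)/,
      // The constant has to be repeated in every nested mode that should see it.
      contains: [UNDERSCORE_KEYWORD_MODE],
    },
  ],
};

console.log(UNDERSCORE_KEYWORD_MODE.begin.test("__always_inline")); // true
console.log(UNDERSCORE_KEYWORD_MODE.begin.test("my_variable"));     // false
```

This is exactly the repetition the thread is trying to avoid: each nested mode needs its own reference, and each reference needs its own test.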
cpp
in one fell swoop
Now that we have some precedent what about:
As a way of attaching/grouping modes that exist for no other purpose than to match keywords, just when the keyword is a dynamic match (regex) instead of a static list? $modes would simply be copied to the end of whatever mode's
Just popped into my head and I thought I'd jot it down. |
Or perhaps something like
keywords: {
$dynamic_patterns: [
[/_{1,2}[a-z\d_]+/, "keyword"]
]
Or a new namespace/suffix:
keywords: {
keyword$all: /_{1,2}[a-z\d_]+/, // className = "keyword"
builtin$all: /__(.*)__/, // className = "builtin"
keyword: "if when else done", |
+1 for the last option. Probably even
keywords: {
$keyword: /_{1,2}[a-z\d_]+/,
keyword: "if when else done", |
Well, actually the intent when we added $pattern was to reserve the whole $ namespace for keyword configuration, not random values. So in the future you could imagine:
So $keyword MAYBE could work, but not if you want to be able to name the resulting matches with className... I'm actually coming around to the idea of a _re suffix:
keywords: {
keyword_re: /_{1,2}[a-z\d_]+/, // className = "keyword"
builtin_re: /__(.*)__/, // className = "builtin"
keyword: "if when else done",
My problem is that I don't think _re screams "wildcard" or "match all"... we could use * but then we'd have to stringify the keys:
keywords: {
'*keyword': /_{1,2}[a-z\d_]+/, // className = "keyword"
'*builtin': /__(.*)__/, // className = "builtin"
keyword: "if when else done", |
Maybe just seeing the regex next to it makes it clear enough?
I don't think we're there yet, but I like this avenue of exploration. @allejo Any thoughts? |
Yes, it's clear enough for me. I'm fine with |
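For illustration, the `_re` suffix idea could be handled by a small preprocessing step along these lines. This is a sketch of the proposal only: `splitKeywords` is a hypothetical name, highlight.js does not process keys this way, and the `{ className, begin }` output shape is an assumption.

```javascript
// Hypothetical preprocessing for the proposed `_re` suffix.
// Keys ending in `_re` hold a regex; every other key holds a static word list.
function splitKeywords(keywords) {
  const staticLists = {};
  const dynamicPatterns = [];
  for (const [key, value] of Object.entries(keywords)) {
    if (key.endsWith("_re") && value instanceof RegExp) {
      // "keyword_re" becomes className "keyword".
      dynamicPatterns.push({ className: key.slice(0, -3), begin: value });
    } else {
      staticLists[key] = value;
    }
  }
  return { staticLists, dynamicPatterns };
}

const split = splitKeywords({
  keyword_re: /_{1,2}[a-z\d_]+/,
  builtin_re: /__(.*)__/,
  keyword: "if when else done",
});

console.log(split.dynamicPatterns.map((p) => p.className)); // [ 'keyword', 'builtin' ]
console.log(split.staticLists);                             // { keyword: 'if when else done' }
```

Because the class name is derived from the key, this variant keeps the ability to name matches, which was the objection to a bare `$keyword` key.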
I'm going to go back and review some of the grammars that make HEAVY use of modes for keywords and see if there is anything else we should keep in mind while working on this. My one thought so far is we don't have a great way to specify relevance. We could fall back on some helper functions:
import { kw } from "keyword_helpers";
keywords: {
keyword_re: kw.matchAll(/_{1,2}[a-z\d_]+/, { relevance: 2 })
Although then I become tempted to just make the input processing smarter:
import { kw } from "keyword_helpers";
keywords: {
// __boo_hoo will default to relevance 1, bob 10, suzy 20, other _ identifiers 2
keyword: [
"bob|10 suzy|20",
/__boo.*__/,
kw.matchAll(/_{1,2}[a-z\d_]+/, { relevance: 2 })] |
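A sketch of what "smarter input processing" might mean for the mixed array form above. Everything here is hypothetical: `matchAll` stands in for the proposed `kw.matchAll` helper (which does not exist), and `normalizeKeywordEntries` is an invented name. The `word|relevance` token format is the one used in the snippet above.

```javascript
// Stand-in for the proposed `kw.matchAll`; not a real highlight.js helper.
const matchAll = (pattern, { relevance = 1 } = {}) => ({ pattern, relevance });

// Hypothetical normalizer: strings become word -> relevance entries,
// bare regexes get the default relevance, matchAll() results pass through.
function normalizeKeywordEntries(entries, defaultRelevance = 1) {
  const words = {};
  const patterns = [];
  for (const entry of entries) {
    if (typeof entry === "string") {
      for (const token of entry.split(/\s+/).filter(Boolean)) {
        const [word, relevance] = token.split("|");
        words[word] = relevance === undefined ? defaultRelevance : Number(relevance);
      }
    } else if (entry instanceof RegExp) {
      patterns.push({ pattern: entry, relevance: defaultRelevance });
    } else {
      patterns.push(entry);
    }
  }
  return { words, patterns };
}

const normalized = normalizeKeywordEntries([
  "bob|10 suzy|20 plain",
  /__boo.*__/,
  matchAll(/_{1,2}[a-z\d_]+/, { relevance: 2 }),
]);

console.log(normalized.words);                            // { bob: 10, suzy: 20, plain: 1 }
console.log(normalized.patterns.map((p) => p.relevance)); // [ 1, 2 ]
```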
For what it’s worth, double-underscore names are “reserved identifiers” in C++ but they are not keywords, and don’t generally act as such (though some implementations use such identifiers as non-standard language extensions). When reading C++ code I certainly wouldn’t expect them to be highlighted in the same way as keywords, nor even necessarily builtins (though this is certainly debatable). For one thing, the reason they are reserved is so that standard library implementors can use these names and, consequently, such implementations are peppered with them. When discussing standard library implementation code it would be extremely counter-productive to have them all displayed as keywords. For all intents and purposes they’re regular identifiers, they’re just illegal to use for “normal” users. Special-casing them is also arbitrary, because this is only part of the actual rule regarding reserved names, which was reproduced incorrectly in the related PR. The actual rule is:
Rule 3 can’t be implemented with lexical analysis alone. My recommendation is to leave all these names alone. |
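To illustrate why a purely lexical check falls short, here is a small sketch. It assumes the rules referred to are the reserved-identifier rules from the C++ standard as commonly summarized: names containing a double underscore, names starting with an underscore followed by an uppercase letter, and names starting with an underscore in the global namespace. `classifyIdentifier` is a made-up function for this example.

```javascript
// Lexical approximation of the reserved-identifier rules (assumed summary):
//   - contains "__" anywhere                    -> reserved everywhere
//   - starts with "_" + uppercase letter        -> reserved everywhere
//   - starts with "_" otherwise                 -> reserved only in the global
//     namespace, which a tokenizer has no way of knowing
function classifyIdentifier(name) {
  if (name.includes("__") || /^_[A-Z]/.test(name)) return "reserved";
  if (name.startsWith("_")) return "depends-on-scope";
  return "ordinary";
}

console.log(classifyIdentifier("__first")); // reserved
console.log(classifyIdentifier("_Tp"));     // reserved
console.log(classifyIdentifier("my__var")); // reserved
console.log(classifyIdentifier("_count"));  // depends-on-scope
console.log(classifyIdentifier("count"));   // ordinary
```

The `depends-on-scope` result is the part a highlighter cannot resolve without understanding namespaces.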
Interesting.
void static delayShort(uint16_t ms) __attribute__ ((noinline));
What is
Scope: |
I'm totally open to just closing this, but also our goal has never been perfection. If we decided this was worthy of highlighting as "meta" something it wouldn't bother me that we got a few edge cases wrong... that's the cost of being a highlighter that doesn't fully understand language context. |
As far as standard C++ is concerned, nothing. Leaving the ivory tower for a bit, it’s a widely used compiler extension. I do think highlighting this is useful (and VS Code’s classification seems OK, if not technically correct: it’s not a macro). The issue with these constructs is that they differ between compilers (GCC and clang know
Unfortunately I’m not aware of a comprehensive list of such identifiers. The closest may be this for MSVC and this for GCC (the clang list probably has lots of overlap here), and it’s not exactly helpful. Maybe we can crib them from other highlighters. |
The original filed issue was trying to add:
Are those three above common? Would you mind taking a stab at what you might consider the "common" list? If it's a short list I'd be happy to just add those manually and mark this done for now. :-) Do the common ones make sense to rise to "keyword" status, or would you still consider them built-ins?
I'm not sure how that's super relevant? I understand your original objection if they are not technically keywords... but say we decided to highlight them otherwise (say with |
I’m happy to collect these lists and submit a PR but I probably won’t have time before the weekend. As for the wildcard:
The problem with that is that it would highlight standard library implementation code very confusingly. For example, consider (part of) the libc++ implementation of
__copy(_Tp* __first, _Tp* __last, _Up* __result)
{
const size_t __n = static_cast<size_t>(__last - __first);
if (__n > 0)
_VSTD::memmove(__result, __first, __n * sizeof(_Up));
return __result + __n;
}
Highlighting all these regular identifiers as builtins or keywords is just confusing and counter-productive (and no other highlighter that I tried does it, including VS Code and Vim). Admittedly this only affects very specific projects since these identifiers are generally illegal, but discussions of standard library code are a non-negligible part of websites that could potentially use highlight.js, e.g. developer blogs and Stack Overflow. By contrast, as a C++ developer I don’t really have any expectation to see non-standard extensions treated specially, so if a
This is supported by the lack of highlighting of these names on GitHub. Even VS Code only highlights
int f()
__attribute__ ((noinline));
int f()
__attribute__((noinline)); |
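The difference between a wildcard and an explicit list can be checked directly on the libc++ excerpt quoted above. This is a sketch only: the two regexes are illustrative, and the explicit list (`__attribute__`, `__declspec`) is an assumption for the example, not a vetted list of extensions.

```javascript
// The libc++ excerpt from the comment above, as a string.
const libcxxSnippet = `
__copy(_Tp* __first, _Tp* __last, _Up* __result)
{
    const size_t __n = static_cast<size_t>(__last - __first);
    if (__n > 0)
        _VSTD::memmove(__result, __first, __n * sizeof(_Up));
    return __result + __n;
}`;

// Wildcard approach: every double-underscore identifier is flagged.
const WILDCARD_RE = /\b__[a-z\d_]+\b/g;
// Explicit approach: a short illustrative list of compiler extensions.
const EXPLICIT_RE = /\b(?:__attribute__|__declspec)\b/g;

const wildcardHits = new Set(libcxxSnippet.match(WILDCARD_RE) || []);
const explicitHits = libcxxSnippet.match(EXPLICIT_RE) || [];
const attributeHits = "int f() __attribute__((noinline));".match(EXPLICIT_RE) || [];

console.log([...wildcardHits]);    // [ '__copy', '__first', '__last', '__result', '__n' ]
console.log(explicitHits.length);  // 0
console.log(attributeHits.length); // 1
```

With the wildcard, every parameter and local in the excerpt would be styled as a keyword; with the explicit list, the excerpt is left alone while `__attribute__` is still picked up.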
Makes sense, guess I've only ever used a few of them (attribute, etc) and they always seemed pretty important to me... :-) and didn't realize the std library using
No rush. :-) Looking forward to seeing what ya got. |
Happy New Year. Still wanting to work on this one? |
Definitely. Apologies, I’ve had some other personal stuff interfere. I hope I’ll get some time to work on this soon. |
There is also #2954 which means C and C++ can be dealt with as separate things. |
Which weekend? ;-) I know, I know, life gets in the way... any chance perhaps you could at least handle the "collect a list" part and post it here or is that actually the hard part in your mind? :-) So far of everyone talking here you seem the most qualified to furnish a pseudo-official list of what "might be a good idea"... once we had a list someone else could come along and do the actual implementation. |
Stems from PR:
#2020
See the related discussion there.
It'd be really nice to find a solution to this without a separate mode. This needs a little more thought IMHO to see if that's possible before just breaking down and doing that.