Skip to content

[C] Diagnose use of C++ keywords in C #137234

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 25 commits into
base: main
Choose a base branch
from

Conversation

AaronBallman
Copy link
Collaborator

@AaronBallman AaronBallman commented Apr 24, 2025

This adds a new diagnostic group, -Wc++-keyword, which is off by default and grouped under -Wc++-compat. The diagnostic catches use of C++ keywords in C code.

This change additionally fixes an issue with -Wreserved-identifier not diagnosing use of reserved identifiers in function parameter lists in a function declaration which is not a definition.

Partially fixes #21898

This adds a new diagnostic group, -Widentifier-is-c++-keyword, which is
off by default and grouped under -Wc++-compat. The diagnostic catches
use of C++ keywords in C code.

This change additionally fixes an issue with -Wreserved-identifier not
diagnosing use of reserved identifiers in function parameter lists.

Partially fixes llvm#21898
@AaronBallman AaronBallman added clang Clang issues not falling into any other category c clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Apr 24, 2025
@llvmbot
Copy link
Member

llvmbot commented Apr 24, 2025

@llvm/pr-subscribers-clang

Author: Aaron Ballman (AaronBallman)

Changes

This adds a new diagnostic group, -Widentifier-is-c++-keyword, which is off by default and grouped under -Wc++-compat. The diagnostic catches use of C++ keywords in C code.

This change additionally fixes an issue with -Wreserved-identifier not diagnosing use of reserved identifiers in function parameter lists in a function declaration which is not a definition.

Partially fixes #21898


Patch is 25.28 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/137234.diff

7 Files Affected:

  • (modified) clang/docs/ReleaseNotes.rst (+5)
  • (modified) clang/include/clang/Basic/DiagnosticGroups.td (+2-1)
  • (modified) clang/include/clang/Basic/DiagnosticSemaKinds.td (+3)
  • (modified) clang/include/clang/Basic/IdentifierTable.h (+8-3)
  • (modified) clang/lib/Basic/IdentifierTable.cpp (+27)
  • (modified) clang/lib/Sema/SemaDecl.cpp (+31)
  • (added) clang/test/Sema/c++-keyword-in-c.c (+213)
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index d1f24fb23d44d..453bdc9870bdb 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -143,6 +143,9 @@ C Language Changes
 - Added ``-Wimplicit-void-ptr-cast``, grouped under ``-Wc++-compat``, which
   diagnoses implicit conversion from ``void *`` to another pointer type as
   being incompatible with C++. (#GH17792)
+- Added ``-Widentifier-is-c++-keyword``, grouped under ``-Wc++-compat``, which
+  diagnoses when a C++ keyword is used as an identifier in C. Partially
+  addresses #GH21898.
 
 C2y Feature Support
 ^^^^^^^^^^^^^^^^^^^
@@ -416,6 +419,8 @@ Improvements to Clang's diagnostics
 - ``-Winitializer-overrides`` and ``-Wreorder-init-list`` are now grouped under
   the ``-Wc99-designator`` diagnostic group, as they also are about the
   behavior of the C99 feature as it was introduced into C++20. Fixes #GH47037
+- ``-Wreserved-identifier`` now fires on reserved parameter names in a function
+  declaration which is not a definition.
 
 Improvements to Clang's time-trace
 ----------------------------------
diff --git a/clang/include/clang/Basic/DiagnosticGroups.td b/clang/include/clang/Basic/DiagnosticGroups.td
index 6441b8049ed8d..84e590a857398 100644
--- a/clang/include/clang/Basic/DiagnosticGroups.td
+++ b/clang/include/clang/Basic/DiagnosticGroups.td
@@ -155,8 +155,9 @@ def C99Compat : DiagGroup<"c99-compat">;
 def C23Compat : DiagGroup<"c23-compat">;
 def : DiagGroup<"c2x-compat", [C23Compat]>;
 
+def CppKeywordInC : DiagGroup<"identifier-is-c++-keyword">;
 def ImplicitVoidPtrCast : DiagGroup<"implicit-void-ptr-cast">;
-def CXXCompat: DiagGroup<"c++-compat", [ImplicitVoidPtrCast]>;
+def CXXCompat: DiagGroup<"c++-compat", [ImplicitVoidPtrCast, CppKeywordInC]>;
 def ExternCCompat : DiagGroup<"extern-c-compat">;
 def KeywordCompat : DiagGroup<"keyword-compat">;
 def GNUCaseRange : DiagGroup<"gnu-case-range">;
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index 8ff170520aafe..50a960313349a 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -496,6 +496,9 @@ def warn_unused_lambda_capture: Warning<"lambda capture %0 is not "
   "%select{used|required to be captured for this use}1">,
   InGroup<UnusedLambdaCapture>, DefaultIgnore;
 
+def warn_identifier_is_cpp_keyword : Warning<
+  "identifier %0 conflicts with a C++ keyword">,
+  InGroup<CppKeywordInC>, DefaultIgnore;
 def warn_reserved_extern_symbol: Warning<
   "identifier %0 is reserved because %select{"
   "<ERROR>|" // ReservedIdentifierStatus::NotReserved
diff --git a/clang/include/clang/Basic/IdentifierTable.h b/clang/include/clang/Basic/IdentifierTable.h
index 1275b056227b5..05c989c1b07d9 100644
--- a/clang/include/clang/Basic/IdentifierTable.h
+++ b/clang/include/clang/Basic/IdentifierTable.h
@@ -444,13 +444,18 @@ class alignas(IdentifierInfoAlignment) IdentifierInfo {
   }
   bool isCPlusPlusOperatorKeyword() const { return IsCPPOperatorKeyword; }
 
-  /// Return true if this token is a keyword in the specified language.
+  /// Return true if this identifier uses a keyword token and is a keyword in
+  /// the specified language.
   bool isKeyword(const LangOptions &LangOpts) const;
 
-  /// Return true if this token is a C++ keyword in the specified
-  /// language.
+  /// Return true if this identifier uses a keyword token and is a C++ keyword
+  /// in the specified language.
   bool isCPlusPlusKeyword(const LangOptions &LangOpts) const;
 
+  /// Returns true if the name of this identifier matches a keyword given the
+  /// specified language options.
+  bool isNameKeyword(const LangOptions &LangOpts) const;
+
   /// Get and set FETokenInfo. The language front-end is allowed to associate
   /// arbitrary metadata with this token.
   void *getFETokenInfo() const { return FETokenInfo; }
diff --git a/clang/lib/Basic/IdentifierTable.cpp b/clang/lib/Basic/IdentifierTable.cpp
index 16151c94464f9..412f0af40861a 100644
--- a/clang/lib/Basic/IdentifierTable.cpp
+++ b/clang/lib/Basic/IdentifierTable.cpp
@@ -343,6 +343,14 @@ static KeywordStatus getTokenKwStatus(const LangOptions &LangOpts,
   }
 }
 
+static KeywordStatus getNameKwStatus(const LangOptions &LangOpts,
+                                     StringRef Name) {
+  return llvm::StringSwitch<KeywordStatus>(Name)
+#define KEYWORD(NAME, FLAGS) .Case(#NAME, getKeywordStatus(LangOpts, FLAGS))
+#include "clang/Basic/TokenKinds.def"
+      .Default(KS_Disabled);
+}
+
 /// Returns true if the identifier represents a keyword in the
 /// specified language.
 bool IdentifierInfo::isKeyword(const LangOptions &LangOpts) const {
@@ -355,6 +363,25 @@ bool IdentifierInfo::isKeyword(const LangOptions &LangOpts) const {
   }
 }
 
+bool IdentifierInfo::isNameKeyword(const LangOptions &LangOpts) const {
+  // This differs from IdentifierInfo::isCPlusPlusKeyword(). That function
+  // tests if the identifier is a keyword token in C++ mode and then isn't a
+  // keyword token in C modes. In our case, if it was a keyword, we wouldn't
+  // have gotten the identifier for it anyway, so that function will always
+  // return false for us. Instead, check against the token definitions directly.
+  //
+  // FIXME: It would be nice to handle things like char8_t and wchar_t here as
+  // well, but those require looking at the currently selected language options
+  // which would make this interface confusing with isCPlusPlusKeyword.
+  switch (getNameKwStatus(LangOpts, getName())) {
+  case KS_Enabled:
+  case KS_Extension:
+    return true;
+  default:
+    return false;
+  }
+}
+
 /// Returns true if the identifier represents a C++ keyword in the
 /// specified language.
 bool IdentifierInfo::isCPlusPlusKeyword(const LangOptions &LangOpts) const {
diff --git a/clang/lib/Sema/SemaDecl.cpp b/clang/lib/Sema/SemaDecl.cpp
index d28a2107d58a9..74432ea34bd57 100644
--- a/clang/lib/Sema/SemaDecl.cpp
+++ b/clang/lib/Sema/SemaDecl.cpp
@@ -61,6 +61,7 @@
 #include "llvm/ADT/STLForwardCompat.h"
 #include "llvm/ADT/SmallString.h"
 #include "llvm/ADT/StringExtras.h"
+#include "llvm/ADT/StringSwitch.h"
 #include "llvm/Support/SaveAndRestore.h"
 #include "llvm/TargetParser/Triple.h"
 #include <algorithm>
@@ -6107,6 +6108,23 @@ static bool isFromSystemHeader(SourceManager &SM, const Decl *D) {
          SM.isInSystemMacro(D->getLocation());
 }
 
+static bool isKeywordInCPlusPlus(const Sema &S, const IdentifierInfo* II) {
+  if (!II)
+    return false;
+
+  // Clear all the C language options, set all the C++ language options, but
+  // otherwise leave all the defaults alone. This isn't ideal because some
+  // feature flags are language-specific, but this is a reasonable
+  // approximation.
+  LangOptions LO = S.getLangOpts();
+  LO.CPlusPlus = LO.CPlusPlus11 = LO.CPlusPlus14 = LO.CPlusPlus17 =
+      LO.CPlusPlus20 = LO.CPlusPlus23 = LO.CPlusPlus26 = 1;
+  LO.C99 = LO.C11 = LO.C17 = LO.C23 = LO.C2y = 0;
+  LO.Char8 = 1; // Presume this is always a keyword in C++.
+  LO.WChar = 1; // Presume this is always a keyword in C++.
+  return II->isNameKeyword(LO);
+}
+
 void Sema::warnOnReservedIdentifier(const NamedDecl *D) {
   // Avoid warning twice on the same identifier, and don't warn on redeclaration
   // of system decl.
@@ -6118,6 +6136,10 @@ void Sema::warnOnReservedIdentifier(const NamedDecl *D) {
     Diag(D->getLocation(), diag::warn_reserved_extern_symbol)
         << D << static_cast<int>(Status);
   }
+  // Diagnose use of C++ keywords in C as being incompatible with C++.
+  if (!getLangOpts().CPlusPlus &&
+      isKeywordInCPlusPlus(*this, D->getIdentifier()))
+    Diag(D->getLocation(), diag::warn_identifier_is_cpp_keyword) << D;
 }
 
 Decl *Sema::ActOnDeclarator(Scope *S, Declarator &D) {
@@ -10416,6 +10438,15 @@ Sema::ActOnFunctionDeclarator(Scope *S, Declarator &D, DeclContext *DC,
   // Finally, we know we have the right number of parameters, install them.
   NewFD->setParams(Params);
 
+  // If this declarator is a declaration and not a definition, its parameters
+  // will not be pushed onto a scope chain. That means we will not issue any
+  // reserved identifier warnings for the declaration, but we will for the
+  // definition. Handle those here.
+  if (!Params.empty() && !D.isFunctionDefinition()) {
+    for (const ParmVarDecl *PVD : Params)
+      warnOnReservedIdentifier(PVD);
+  }
+
   if (D.getDeclSpec().isNoreturnSpecified())
     NewFD->addAttr(
         C11NoReturnAttr::Create(Context, D.getDeclSpec().getNoreturnSpecLoc()));
diff --git a/clang/test/Sema/c++-keyword-in-c.c b/clang/test/Sema/c++-keyword-in-c.c
new file mode 100644
index 0000000000000..364f115ddcb44
--- /dev/null
+++ b/clang/test/Sema/c++-keyword-in-c.c
@@ -0,0 +1,213 @@
+// RUN: %clang_cc1 -fsyntax-only -verify -Widentifier-is-c++-keyword %s
+// RUN: %clang_cc1 -fsyntax-only -verify -Wc++-compat %s
+// RUN: %clang_cc1 -fsyntax-only -verify=good %s
+// RUN: %clang_cc1 -fsyntax-only -verify=cxx -x c++ -std=c++2c %s
+// good-no-diagnostics
+
+// 'try', 'this' and 'throw' are not tested as identifiers, but are instead
+// tested as other constructs (otherwise there would be redefinition errors in
+// C).
+int catch;              // expected-warning {{identifier 'catch' conflicts with a C++ keyword}} \
+                           cxx-error {{expected unqualified-id}}
+int class;              // expected-warning {{identifier 'class' conflicts with a C++ keyword}} \
+                           cxx-error {{declaration of anonymous class must be a definition}} \
+                           cxx-warning {{declaration does not declare anything}}
+int const_cast;         // expected-warning {{identifier 'const_cast' conflicts with a C++ keyword}} \
+                           cxx-error {{expected unqualified-id}}
+int delete;             // expected-warning {{identifier 'delete' conflicts with a C++ keyword}} \
+                           cxx-error {{expected unqualified-id}}
+int dynamic_cast;       // expected-warning {{identifier 'dynamic_cast' conflicts with a C++ keyword}} \
+                           cxx-error {{expected unqualified-id}}
+int explicit;           // expected-warning {{identifier 'explicit' conflicts with a C++ keyword}} \
+                           cxx-error {{'explicit' can only appear on non-static member functions}} \
+                           cxx-warning {{declaration does not declare anything}}
+int export;             // expected-warning {{identifier 'export' conflicts with a C++ keyword}} \
+                           cxx-error {{expected unqualified-id}}
+int friend;             // expected-warning {{identifier 'friend' conflicts with a C++ keyword}} \
+                           cxx-error {{'friend' used outside of class}} \
+                           cxx-warning {{declaration does not declare anything}}
+int mutable;            // expected-warning {{identifier 'mutable' conflicts with a C++ keyword}} \
+                           cxx-warning {{declaration does not declare anything}}
+int namespace;          // expected-warning {{identifier 'namespace' conflicts with a C++ keyword}} \
+                           cxx-error {{expected unqualified-id}}
+int new;                // expected-warning {{identifier 'new' conflicts with a C++ keyword}} \
+                           cxx-error {{expected unqualified-id}}
+int operator;           // expected-warning {{identifier 'operator' conflicts with a C++ keyword}} \
+                           cxx-error {{expected a type}}
+int private;            // expected-warning {{identifier 'private' conflicts with a C++ keyword}} \
+                           cxx-error {{expected unqualified-id}}
+int protected;          // expected-warning {{identifier 'protected' conflicts with a C++ keyword}} \
+                           cxx-error {{expected unqualified-id}}
+int public;             // expected-warning {{identifier 'public' conflicts with a C++ keyword}} \
+                           cxx-error {{expected unqualified-id}}
+int reinterpret_cast;   // expected-warning {{identifier 'reinterpret_cast' conflicts with a C++ keyword}} \
+                           cxx-error {{expected unqualified-id}}
+int static_cast;        // expected-warning {{identifier 'static_cast' conflicts with a C++ keyword}} \
+                           cxx-error {{expected unqualified-id}}
+int template;           // expected-warning {{identifier 'template' conflicts with a C++ keyword}} \
+                           cxx-error {{expected unqualified-id}}
+int typename;           // expected-warning {{identifier 'typename' conflicts with a C++ keyword}} \
+                           cxx-error {{expected a qualified name after 'typename'}}
+int typeid;             // expected-warning {{identifier 'typeid' conflicts with a C++ keyword}} \
+                           cxx-error {{expected unqualified-id}}
+int using;              // expected-warning {{identifier 'using' conflicts with a C++ keyword}} \
+                           cxx-error {{expected unqualified-id}}
+int virtual;            // expected-warning {{identifier 'virtual' conflicts with a C++ keyword}} \
+                           cxx-error {{'virtual' can only appear on non-static member functions}} \
+                           cxx-warning {{declaration does not declare anything}}
+int wchar_t;            // expected-warning {{identifier 'wchar_t' conflicts with a C++ keyword}} \
+                           cxx-error {{cannot combine with previous 'int' declaration specifier}} \
+                           cxx-warning {{declaration does not declare anything}}
+int char8_t;            // expected-warning {{identifier 'char8_t' conflicts with a C++ keyword}} \
+                           cxx-error {{cannot combine with previous 'int' declaration specifier}} \
+                           cxx-warning {{declaration does not declare anything}}
+int char16_t;           // expected-warning {{identifier 'char16_t' conflicts with a C++ keyword}} \
+                           cxx-error {{cannot combine with previous 'int' declaration specifier}} \
+                           cxx-warning {{declaration does not declare anything}}
+int char32_t;           // expected-warning {{identifier 'char32_t' conflicts with a C++ keyword}} \
+                           cxx-error {{cannot combine with previous 'int' declaration specifier}} \
+                           cxx-warning {{declaration does not declare anything}}
+int noexcept;           // expected-warning {{identifier 'noexcept' conflicts with a C++ keyword}} \
+                           cxx-error {{expected unqualified-id}}
+int co_await;           // expected-warning {{identifier 'co_await' conflicts with a C++ keyword}} \
+                           cxx-error {{expected unqualified-id}}
+int co_return;          // expected-warning {{identifier 'co_return' conflicts with a C++ keyword}} \
+                           cxx-error {{expected unqualified-id}}
+int co_yield;           // expected-warning {{identifier 'co_yield' conflicts with a C++ keyword}} \
+                           cxx-error {{expected unqualified-id}}
+int consteval;          // expected-warning {{identifier 'consteval' conflicts with a C++ keyword}} \
+                           cxx-error {{consteval can only be used in function declarations}}
+int constinit;          // expected-warning {{identifier 'constinit' conflicts with a C++ keyword}} \
+                           cxx-error {{constinit can only be used in variable declarations}}
+int concept;            // expected-warning {{identifier 'concept' conflicts with a C++ keyword}} \
+                           cxx-error {{expected unqualified-id}}
+int requires;           // expected-warning {{identifier 'requires' conflicts with a C++ keyword}} \
+                           cxx-error {{expected unqualified-id}}
+
+// Now try the same thing, but as struct members.
+struct S {
+  int catch;            // expected-warning {{identifier 'catch' conflicts with a C++ keyword}} \
+                           cxx-error {{expected member name or ';' after declaration specifiers}}
+  int class;            // expected-warning {{identifier 'class' conflicts with a C++ keyword}} \
+                           cxx-error {{declaration of anonymous class must be a definition}} \
+                           cxx-warning {{declaration does not declare anything}}
+  int const_cast;       // expected-warning {{identifier 'const_cast' conflicts with a C++ keyword}} \
+                           cxx-error {{expected member name or ';' after declaration specifiers}}
+  int delete;           // expected-warning {{identifier 'delete' conflicts with a C++ keyword}} \
+                           cxx-error {{expected member name or ';' after declaration specifiers}}
+  int dynamic_cast;     // expected-warning {{identifier 'dynamic_cast' conflicts with a C++ keyword}} \
+                           cxx-error {{expected member name or ';' after declaration specifiers}}
+  int explicit;         // expected-warning {{identifier 'explicit' conflicts with a C++ keyword}} \
+                           cxx-error {{'explicit' can only appear on non-static member functions}} \
+                           cxx-warning {{declaration does not declare anything}}
+  int export;           // expected-warning {{identifier 'export' conflicts with a C++ keyword}} \
+                           cxx-error {{expected member name or ';' after declaration specifiers}}
+  int friend;           // expected-warning {{identifier 'friend' conflicts with a C++ keyword}} \
+                           cxx-error {{'friend' must appear first in a non-function declaration}}
+  int mutable;          // expected-warning {{identifier 'mutable' conflicts with a C++ keyword}} \
+                           cxx-warning {{declaration does not declare anything}}
+  int namespace;        // expected-warning {{identifier 'namespace' conflicts with a C++ keyword}} \
+                           cxx-error {{expected member name or ';' after declaration specifiers}}
+  int new;              // expected-warning {{identifier 'new' conflicts with a C++ keyword}} \
+                           cxx-error {{expected member name or ';' after declaration specifiers}}
+  int operator;         // expected-warning {{identifier 'operator' conflicts with a C++ keyword}} \
+                           cxx-error {{expected a type}}
+  int private;          // expected-warning {{identifier 'private' conflicts with a C++ keyword}} \
+                           cxx-error {{expected member name or ';' after declaration specifiers}}
+  int protected;        // expected-warning {{identifier 'protected' conflicts with a C++ keyword}} \
+                           cxx-error {{expected member name or ';' after declaration specifiers}}
+  int public;           // expected-warning {{identifier 'public' conflicts with a C++ keyword}} \
+                           cxx-error {{expected member name or ';' after declaration specifiers}}
+  int reinterpret_cast; // expected-warning {{identifier 'reinterpret_cast' conflicts with a C++ keyword}} \
+                           cxx-error {{expected member name or ';' after declaration specifiers}}
+  int static_cast;      // expected-warning {{identifier 'static_cast' conflicts with a C++ keyword}} \
+                           cxx-error {{expected member name or ';' after declaration specifiers}}
+  int template;         // expected-warning {{identifier 'template' conflicts with a C++ keyword}} \
+                           cxx-error {{expected member name or ';' after declaration specifiers}}
+  int this;             // expected-warning {{identifier 'this' conflicts with a C++ keyword}} \
+                           cxx-error {{expected member name or ';' after declaration specifiers}}
+  int throw;            // expected-warning {{identifier 'throw' conflicts with a C++ keyword}} \
+                           cxx-error {{expected member name or ';' after declaration specifiers}}
+  int try;              // expected-warning {{identifier 'try' conflicts with a C++ keyword}} \
+                           cxx-error {{expected member name or ';' after declaration specifiers}}
+  int typename;         // expected-warning {{identifier 'typename' conflicts with a C++ keyword}} \
+                           cxx-error {{exp...
[truncated]

@AaronBallman
Copy link
Collaborator Author

Note, I'm running this one through llvm-compile-time-tracker because of the lookup happening on every identifier. I'm not convinced that the "is this identifier a keyword in other language modes" code has good performance and wanted to double-check.

Copy link

github-actions bot commented Apr 24, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@philnik777
Copy link
Contributor

Maybe -Wkeyword-in-c++ or -Wc++-keyword would be a more concise name for the group?

@AaronBallman
Copy link
Collaborator Author

Maybe -Wkeyword-in-c++ or -Wc++-keyword would be a more concise name for the group?

I'm not tied to the name I picked, so either of these is fine by me. GCC doesn't split this into its own warning group, so we've got some latitude.

Any strong preferences?

@philnik777
Copy link
Contributor

Maybe -Wkeyword-in-c++ or -Wc++-keyword would be a more concise name for the group?

I'm not tied to the name I picked, so either of these is fine by me. GCC doesn't split this into its own warning group, so we've got some latitude.

Any strong preferences?

FWIW I feel like the current one feels quite artificial. I think -Wc++-keyword is kinda nice, since it feels very similar to -Wc++-compat, but I don't have strong preferences.

@Sirraide
Copy link
Member

I think -Wc++-keyword is kinda nice

+1

Copy link
Member

@Sirraide Sirraide left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about and, or, and friends (which I think are only keywords in C if you include a certain header irrc).

@AaronBallman
Copy link
Collaborator Author

Alas, I think this PR may be a non-starter due to performance: https://llvm-compile-time-tracker.com/?config=Overview&stat=wall-time&remote=AaronBallman:

C 		56a3f3cd28 	8.04s (+2.96%) 	8.23s (+3.32%) 	11.87s (+1.92%) 	2.83s (+10.21%) 	8.50s (+2.99%) 	3.22s (+7.82%) 	7.12s (+1.63%) 	2.43s (+5.91%) 	580.64s (+0.44%)

+10% compile time overhead is painful. I believe the issue is with needing to look at token names (doing a string compare in a loop across a pile of tokens on every identifier declares). But the token kind we have is identifier and not one of the keywords, so we can't do the more trivial integer comparisons. Even if we could, I suspect this would have a performance hit just due to the number of identifiers.

I'm going to try checking whether the diagnostic is enabled before doing the comparisons to see if that improves things. The diagnostic is off by default and only enabled under -Wc++-compat, so some amount of performance hit is reasonable given the need to avoid C++ keywords in C code.

@AaronBallman
Copy link
Collaborator Author

What about and, or, and friends (which I think are only keywords in C if you include a certain header irrc).

Good call, I'll add those.

@Sirraide
Copy link
Member

But the token kind we have is identifier and not one of the keywords, so we can't do the more trivial integer comparisons.

Can we precompute a list/map/whatever of all identifiers that are keywords in C++ but not in C and then do pointer comparisons on the IdentifierInfo* maybe? At least then we wouldn’t have to do actual string comparisons; that said, I’m not sure that’s going to be that much better...

@AaronBallman
Copy link
Collaborator Author

But the token kind we have is identifier and not one of the keywords, so we can't do the more trivial integer comparisons.

Can we precompute a list/map/whatever of all identifiers that are keywords in C++ but not in C and then do pointer comparisons on the IdentifierInfo* maybe? At least then we wouldn’t have to do actual string comparisons; that said, I’m not sure that’s going to be that much better...

That's a pretty good idea, let me try that. I'll back out the changes that disable checking when the diagnostic isn't enabled so we get a better comparison.

I want to see if I can improve the speed before we conditionally enable
the check.
If this ends up performing significantly better, I'll rip out the
changes to IdentifierTable.
@AaronBallman
Copy link
Collaborator Author

But the token kind we have is identifier and not one of the keywords, so we can't do the more trivial integer comparisons.

Can we precompute a list/map/whatever of all identifiers that are keywords in C++ but not in C and then do pointer comparisons on the IdentifierInfo* maybe? At least then we wouldn’t have to do actual string comparisons; that said, I’m not sure that’s going to be that much better...

That's a pretty good idea, let me try that. I'll back out the changes that disable checking when the diagnostic isn't enabled so we get a better comparison.

That idea clawed back most of the performance:

Initial slow-down: C 		56a3f3cd28 	8.04s (+2.96%) 	8.23s (+3.32%) 	11.87s (+1.92%) 	2.83s (+10.21%) 	8.50s (+2.99%) 	3.22s (+7.82%) 	7.12s (+1.63%) 	2.43s (+5.91%) 	580.64s (+0.44%)
New approach: C 		07e44e05f8 	7.80s (-2.85%) 	8.03s (-2.86%) 	11.63s (-1.57%) 	2.57s (-8.93%) 	8.25s (-2.60%) 	2.97s (-7.09%) 	7.06s (-1.02%) 	2.33s (-3.10%) 	580.01s (+0.01%)

Copy link
Collaborator

@erichkeane erichkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry! I made these comments a while ago, but apparently never submitted them.

Copy link
Collaborator

@erichkeane erichkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove the Params.empty if you'd like, but this LGTM anyway.

* Switch to using a binary_search
* Remove an unnecessary check for empty()
@AaronBallman
Copy link
Collaborator Author

After talking with Erich offline, I've updated the implementation to use a binary_search. The pointers are being cast to uintptr_t so that the llvm::sort() isn't operating on unrelated pointers.

This one uses std::array so the size is exact.
if (AddResult == KS_Disabled) return;
// Don't add this keyword if disabled in this language and isn't otherwise
// special.
if (AddResult == KS_Disabled) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional:

It'd be more consistent with the existing code to add an extra KeywordStatus value for this case, analogous to KS_Future. To do that, maybe we'd need to also replace the special macro for C++ operator keywords with a normal flag so that it can be taken into account when computing the KeywordStatus too.

I don't think it's necessary to be consistent here, though; we can always refactor later.

@AaronBallman AaronBallman requested a review from rjmccall April 29, 2025 18:47
@AaronBallman
Copy link
Collaborator Author

Adding @rjmccall because this touches on Objective-C @ keyword handling. (Note, maybe we want -Wc++-compat to be disabled in Objective-C entirely and we should have -Wobjc++-compat instead? [foo bar]; is not valid C++, but I doubt anyone wants that diagnosed in Objective-C code, so it's a bit unclear what the diagnostic group means for ObjC.)

@rjmccall
Copy link
Contributor

The ObjC @ is essentially an escape into a completely different grammar, and it doesn't matter whether the following identifier is a keyword or not in the base language. This warning should never kick in on that identifier.

Similarly, ObjC selector components exist outside of the normal keyword rules, and the warning should never kick in on them.

Otherwise, this warning would be very useful in baseline Objective-C as a way of discovering names that would be keywords in Objective-C++, mostly in the incorporated C subset but also for ObjC features that ultimately end up in the C namespace and therefore are subject to the normal keyword rules, like method parameters and class names.

@AaronBallman
Copy link
Collaborator Author

The ObjC @ is essentially an escape into a completely different grammar, and it doesn't matter whether the following identifier is a keyword or not in the base language. This warning should never kick in on that identifier.

Okay, I've done that.

Similarly, ObjC selector components exist outside of the normal keyword rules, and the warning should never kick in on them.

I don't know about selectors all that much; can you give me a test case that you think I should handle?

Otherwise, this warning would be very useful in baseline Objective-C as a way of discovering names that would be keywords in Objective-C++, mostly in the incorporated C subset but also for ObjC features that ultimately end up in the C namespace and therefore are subject to the normal keyword rules, like method parameters and class names.

Okay, it remains enabled in Objective-C, just silenced if the preceding token is @.

@rjmccall
Copy link
Contributor

Similarly, ObjC selector components exist outside of the normal keyword rules, and the warning should never kick in on them.

I don't know about selectors all that much; can you give me a test case that you think I should handle?

In general: look at ParseObjCSelectorPiece and how it generally accepts keywords in the selector. Actually, the list of keywords there seems out of date and notably doesn't include post-C++98 keywords like decltype. The logic is absolutely meant to encompass every keyword; note that it even includes basic C keywords like if and struct. You probably don't want to fix that, but I can ping someone at Apple to do it.

Anyway, selectors show up in three places in the ObjC grammar:

  • @selector expressions, e.g. @selector(virtual:typeid:bool:)
  • message send expressions, e.g. [obj virtual: 0 typeid: 0 bool: 0]
  • method declarations, e.g. `
  • declarations, e.g.
@interface Foo
- (void) virtual: (int) x typeid: (int) y bool: (int) z;
@end

But there are also some special-case parsing rules for class because it is an important property name for NSObject. For example, in Objective-C++, the member name following -> or . is allowed to be class despite it being a keyword.

@AaronBallman
Copy link
Collaborator Author

Similarly, ObjC selector components exist outside of the normal keyword rules, and the warning should never kick in on them.

I don't know about selectors all that much; can you give me a test case that you think I should handle?

In general: look at ParseObjCSelectorPiece and how it generally accepts keywords in the selector. Actually, the list of keywords there seems out of date and notably doesn't include post-C++98 keywords like decltype. The logic is absolutely meant to encompass every keyword; note that it even includes basic C keywords like if and struct. You probably don't want to fix that, but I can ping someone at Apple to do it.

Anyway, selectors show up in three places in the ObjC grammar:

* `@selector` expressions, e.g. `@selector(virtual:typeid:bool:)`

* message send expressions, e.g. `[obj virtual: 0 typeid: 0 bool: 0]`

* method declarations, e.g. `

* declarations, e.g.
@interface Foo
- (void) virtual: (int) x typeid: (int) y bool: (int) z;
@end

But there are also some special-case parsing rules for class because it is an important property name for NSObject. For example, in Objective-C++, the member name following -> or . is allowed to be class despite it being a keyword.

Ooof, it seems like there may not be a good way of handling this with the approach recommended by @zygoloid where we diagnose from HandleIdentifier() in the preprocessor. That requires far too much looking around to try to identify whether we're in the middle of a selector or not. The best I could think of would be to use some sort of "don't complain about keyword compatibility, we're doing " RAII object to set a flag on the preprocessor so we can read it in HandleIdentifier. If we go that route, I could also stop looking to see if the preceding token is @ from HandleIdentifier() and just use that mode. But is this too much of a hack?

@rjmccall
Copy link
Contributor

rjmccall commented Apr 30, 2025

Yeah, triggering this at the preprocessor level is not going to let you do it properly in Objective-C, where keywords are effectively context-sensitive. If that's the only reasonable way to implement it, you should just turn it off in Objective-C completely.

@zygoloid
Copy link
Collaborator

zygoloid commented Apr 30, 2025

For what it's worth, the existing "keyword from language you're not using" warnings already fire for analogous Objective-C++ cases... which in those cases is correct, since Clang does not accept C++11-onwards keywords as selector names in the corresponding versions of Objective-C++ (but presumably should -- I wonder if we should just check whether the token has an identifier spelling and not care what kind it is, like we do for attribute names?).

I wonder if it'd suffice to do a single-token lookahead / lookbehind in Objective-C / Objective-C++ and disable all of these warnings if the previous token is @ or the next token is :. That would have false negatives for some weird cases like class co_await : base {}; under -Wc++20-compat in Objective-C++17 or earlier, but that might be better than losing the warning entirely.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

diagnostics missing from -Wc++-compat
9 participants