-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
parser: fix parsing of anonymous entities with clang 16+ #190
Conversation
This is not enough. |
Not enough, but it does make progress. It fixes all actually anonymous cases, where clang now returns the "helpful" What remains are typedefs of anonymous structs/unions/enums, where the typedef name is now returned as the struct/union/enum name, and This might actually be a useful thing... except would have to make it work on pre-16 clang too. |
Starting from libclang 16, typedefs of anonymous entities such as typedef struct { /* ... */ } foo; will no longer have an empty name for cursor.spelling. Instead, the returned name will be cursor.type.spelling, matching the name of the typedef, i.e. "foo" in the above case. cursor.is_anonymous() will also be false for such entities, regardless of libclang version. Rather than try to coerce libclang 16 to the @anonymous_<sha1> naming convention we have, manually use the same naming for libclang 15 and earlier. It's arguably a more useful name anyway. We first have to distinguish regular anonymous entities using cursor.is_anonymous(). For libclang 15 and earlier, to distinguish from typedef of anonyous entities. For libclang 16 and later, the name will be of the form "unnamed ... at /path/to/file".
Updated with a proper fix across clang versions. |
So it turns out this was a big part of why my tests were broken too, and it does improve things substantially (I'm on v16). That said, I still have plenty of other test issues. I still haven't gone through it all, but doesn't seem related to anonymous stuff, so I don't think they matter here, but, for the record, most errors seem to come from Anyway, regarding these changes, the code looks fine. I'm not sure the new behaviour is better than the edit: I'll investigate the other errors I'm getting later, I just wanted you to be aware. I think no one else is complaining so just ignore it for now. edit2: Ok, the bool thing seems to have been uncovered already, it's #193 for sure. I still have other problems though with C++ access specifiers at least... Anyway, need to look at it later with more time. |
Or replicate the new behaviour across all versions... That would be even better I guess, though a bit more difficult perhaps. |
That's what the commit is supposed to do, though!
After that, what remains are the typedeffed anonymous entities. Before clang 16, they also had empty string for |
Also, it seems actually harder to unify on the old behaviour. With old clang, you can tell it's a typedeffed anonymous entity from the empty |
Sorry, I meant printing for all versions (e.g.): c:enum:: enum (unnamed at /builddir/build/BUILD/hawkmoth-0.15.0/test/c/enum.c:18:1) instead of c:enum:: @anonymous_ef849cb791c3c921354e4b05dcefedfa |
I'm a bit rusty on this topic relative to you, but there's the 'unnamed at ...' somewhere, right? |
Ah wait, I think I get it. Never mind me, I'm doing too many things at once... I'll have a closer look later. |
# empty string. Match the behaviour across libclang versions. | ||
if cursor.spelling == '': | ||
return cursor.type.spelling | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is indeed tricky. The only way I can see around it is to look into the tokens, which is not great. At the same time I'm not happy with the proposed solution where the typedefs are erroneously documented as structures. It's unexpected and may lead to confusion...
With a quick test (v16), this should work as an alternative with no changes to the unit test results:
# Short cut for anonymous symbols. (Not sure we need both conditions)
if cursor.is_anonymous() or cursor.spelling == '':
return None
# libclang 16 and later have cursor.spelling == cursor.type.spelling for
# typedefs of anonymous entities. Confirm that it's an anonymous
# declaration by ensuring the 2nd token is a '{', e.g. 'struct {...'.
tokens = cursor.get_tokens()
next(tokens)
if next(tokens).spelling == '{':
return None
Not sure how brittle it is (it's quite a shallow approach), but it passes all tests so far.
Regardless of whether the token parsing needs to be more robust, we need to decide whether the output consistency is worth this sort of thing. I'm leaning yes: it solves a real issue and we have several precedents for looking into the tokens when the AST doesn't give us a digested answer by itself. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the same time I'm not happy with the proposed solution where the typedefs are erroneously documented as structures. It's unexpected and may lead to confusion...
As a practical example, when converting Mesa to use Hawkmoth, I've had to give names to a bunch of typedeffed anonymous structs just because of documentation. It's not great, because it's a common pattern, and converting typedef struct { ... } foo
to typedef struct foo { ... } foo
feels silly. They are both typedefs, yet they both produce struct documentation, and you still can't reference the typedef in Sphinx in any way. Unless you separate the named struct and the typedef and document them separately.
I think the expectation actually is typedef struct { ... } foo
documents a struct and lets you reference it as foo
.
As a data point, Doxygen+Breathe also does this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a practical example, when converting Mesa to use Hawkmoth, I've had to give names to a bunch of typedeffed anonymous structs just because of documentation.
Hmm, that was a conscious decision we took a while back and I still believe it was the right call. There is a difference:
struct some_name {
...
};
/** This will be documented with `c:type`, not `c:struct`. */
typedef struct some_name new_name;
In the proposed solution, we document something as a struct which isn't. E.g., taking the example from the unit test, one cannot do struct typedef_struct var;
but can certainly do struct named var;
, yet the documentation would hint otherwise.
I agree it's a common pattern, but the real solution there would be to make the parser special case these scenarios and use c:type
instead. I think that's a bit trickier though and I was happy to leave that in the wish list as long as we had expectable and simple behaviour. In fact we did try to do it the smart way before and it was the ensuing discussion over edge cases that led to the decision of sticking to the current behaviour. At the time we actually had opposite roles in the discussion if I remember correctly, but you thoroughly convinced me 😅
As for Doxygen, does it special case it as a type documentation or does it document the typedef as a struct? Just curiosity... I wouldn't really condition any decision here on what Doxygen does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doxygen+Breathe documents the typedeffed structs as structs, not as types, regardless of whether it's anonymous or not. But hey, so do we! The question here is, what name to give that.
If we were to document such things as types, the user would be forced to separate the struct and typedef both in code and documentation, because you can't have member documentation within a type documentation.
Arguably defaulting to the typedef name is more flexible, because you can use that as the easy default, but you can also separate them and document the struct and type separately, but you're not forced to do that for the common case.
That's not required for Doxygen+Breathe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the time we actually had opposite roles in the discussion if I remember correctly, but you thoroughly convinced me
Heh. In my defense, you've fixed the type fixups and it's all much clearer now. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we were to document such things as types, the user would be forced to separate the struct and typedef both in code and documentation, because you can't have member documentation within a type documentation
Damn, that's a good point... Have you tried it though? Sphinx seems to eat this up just fine and renders it beautifully as the typedef it is. Did I miss anything?
.. c:type:: @anonymous_hash some_struct
Some typedefed struct.
.. c:member:: int foo
Foo.
:c:type:`some_struct`
:c:member:`some_struct.foo`
Arguably defaulting to the typedef name is more flexible, because you can use that as the easy default, but you can also separate them and document the struct and type separately, but you're not forced to do that for the common case.
Very true, and I don't want to fight it too much. I said my piece, and this is certainly not about the code, which looks fine. I am not thoroughly convinced this time around though :P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're going to have to let this one simmer for a bit anyway, as I'm taking some time off. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doxygen+Breathe documents the typedeffed structs as structs, not as types, regardless of whether it's anonymous or not. But hey, so do we! The question here is, what name to give that.
Forgot to reply to this, but for completeness, no we don't! We will do that once we merge this, but currently we document the structure being typedefed as a structure. A structure that happens to be anonymous and so gets an anonymous name.
What you did in Mesa is frankly a poor workaround, even if I get where you're coming from. You're naming a structure because hawkmoth can't contextually document it as a typedef... It works better sure, but you never actually document the typedef which is what you wanted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And in case it wasn't clear, I'm conceding my position. I just wanted to note my arguments against as best I could.
So I decided to merge this in the end. This is what clang gives us as the name from v16 on, so let's roll with that. If folks want to document typedefs and structs separately, they still can by splitting their definitions. |
Clang 16 started returning names for anonymous entities [1], of the form:
(unnamed at /path/to/file::)
Additionally check for is_anonymous().
[1] llvm/llvm-project@19e984e
I'm not sure if this actually fixes anything, because I didn't try to install clang 16 yet. But at least this doesn't break anything on my clang 14.
Maybe a fix to #189.