Investigate making JSDoc parsing lazier and more optimal #52959
Comments
I tried a fun benchmark here. I made a TypeScript file that contains 3 functions (
/** CHAPTER I */
/** "Well, Prince, so Genoa and Lucca are now just family estates of the */
/** Buonapartes. But I warn you, if you don't tell me that this means war, */
/** if you still try to defend the infamies and horrors perpetrated by that */
/** Antichrist--I really believe he is Antichrist--I will have nothing more */
/** to do with you and you are no longer my friend, no longer my 'faithful */
/** slave,' as you call yourself! But how do you do? I see I have frightened */
/** you--sit down and tell me all the news." */
// ...
function foo() {
console.log("lol");
}
I then replaced all the characters in those comments with spaces.
/** */
/** */
/** */
/** */
/** */
/** */
/** */
/** */
// ...
function foo() {
console.log("lol");
}
And because one could argue that whitespace doesn't need to be retained, which would make this an unfair benchmark, I made a file where all the contents are simply replaced with A characters.
/** AAAAAAAAA */
/** AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA */
/** AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA */
/** AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA */
/** AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA */
/** AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA */
/** AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA */
/** AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA */
// ...
function foo() {
console.log("lol");
}
Of course, that many JSDoc comments is very uncommon. The first thing I actually tried was all the contents of War and Peace in a single comment above each function.
/**
* CHAPTER I
*
* "Well, Prince, so Genoa and Lucca are now just family estates of the
* Buonapartes. But I warn you, if you don't tell me that this means war,
* if you still try to defend the infamies and horrors perpetrated by that
* Antichrist--I really believe he is Antichrist--I will have nothing more
* to do with you and you are no longer my friend, no longer my 'faithful
* slave,' as you call yourself! But how do you do? I see I have frightened
* you--sit down and tell me all the news."
*
/// ...
*/
function foo() {
console.log("lol");
}
And finally, the AAAAAA version of that.
/**
* AAAAAAAAA
*
* AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
* AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
* AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
* AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
* AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
* AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
* AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
/// ...
*/
function foo() {
console.log("lol");
}
So here's the output of running these, though I'll limit the summary to the parse time
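For anyone who wants to build similar inputs, here is a minimal sketch of how the masked variants could be produced from the one-comment-per-line file. The helper names `maskCommentLine` and `maskSource` are hypothetical and this is not the script used for the benchmark; it only assumes each comment has the single-line `/** ... */` shape shown above.

```typescript
// Hypothetical helpers for illustration only; not the script used in the benchmark.

// Replaces every character inside a single-line `/** ... */` comment with `fill`,
// keeping the comment delimiters. Lines that are not such comments are returned as-is.
function maskCommentLine(line: string, fill: string): string {
  return line.replace(
    /^(\/\*\* )(.*)( \*\/)$/,
    (_whole: string, open: string, inner: string, close: string) =>
      // A replacer function is used so `fill` is never treated as a `$` pattern.
      open + inner.replace(/[\s\S]/g, () => fill) + close,
  );
}

// Applies the mask to every line of a source text.
function maskSource(source: string, fill: string): string {
  return source
    .split("\n")
    .map((line) => maskCommentLine(line, fill))
    .join("\n");
}
```

Calling `maskSource(text, " ")` would give the whitespace variant and `maskSource(text, "A")` the "AAAAAA" variant, with the function bodies left untouched.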
So this is what I mean by the parser being too "chatty" with the scanner. There's probably half a second of time being spent ping-ponging between them for every identifier. One idea I would suggest is proactively grabbing chunks of text from the comment whenever "normalization" doesn't need to occur. This might be the case when whitespace between words is only a single space character, possibly no backticks, etc. Basically, wherever the parser handles these conditions today, maybe the scanner could handle them instead. This is the same thing we do on string literals, where we can eagerly grab chunks of data. For the common case, my hope is that this would approximate the results of the "AAAAAA" example.
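To make the "grab chunks eagerly" idea concrete, here is a toy sketch that contrasts the two strategies on comment text. The token kinds, the `SPECIAL` character set, and the function names `scanChatty` and `scanChunked` are all assumptions made for illustration; the real JSDoc scanner in the TypeScript codebase is structured differently.

```typescript
// Toy model only: token kinds and the "special" set are assumptions, not the compiler's.
type Token = { kind: "word" | "space" | "special" | "text"; text: string };

const SPECIAL = new Set(["`", "@", "{", "}", "*", "\n"]);

function isBlank(ch: string): boolean {
  return ch === " " || ch === "\t";
}

// Chatty strategy: one token per word, per whitespace run, and per special character.
function scanChatty(comment: string): Token[] {
  const tokens: Token[] = [];
  let pos = 0;
  while (pos < comment.length) {
    const start = pos;
    const ch = comment.charAt(pos);
    if (SPECIAL.has(ch)) {
      pos++;
      tokens.push({ kind: "special", text: ch });
    } else if (isBlank(ch)) {
      while (pos < comment.length && isBlank(comment.charAt(pos))) pos++;
      tokens.push({ kind: "space", text: comment.slice(start, pos) });
    } else {
      while (
        pos < comment.length &&
        !isBlank(comment.charAt(pos)) &&
        !SPECIAL.has(comment.charAt(pos))
      ) pos++;
      tokens.push({ kind: "word", text: comment.slice(start, pos) });
    }
  }
  return tokens;
}

// Chunked strategy: words separated by single spaces are returned as one "text" token.
// The scanner only stops at special characters, tabs, or runs of two or more spaces,
// which stand in here for the cases that would need normalization.
function scanChunked(comment: string): Token[] {
  const tokens: Token[] = [];
  let pos = 0;
  while (pos < comment.length) {
    const start = pos;
    const ch = comment.charAt(pos);
    if (SPECIAL.has(ch)) {
      pos++;
      tokens.push({ kind: "special", text: ch });
      continue;
    }
    while (pos < comment.length) {
      const c = comment.charAt(pos);
      if (SPECIAL.has(c) || c === "\t") break;
      // A trailing space or a run of spaces is left for the slow path below.
      if (c === " " && (pos + 1 >= comment.length || comment.charAt(pos + 1) === " ")) break;
      pos++;
    }
    if (pos > start) {
      tokens.push({ kind: "text", text: comment.slice(start, pos) });
    } else {
      while (pos < comment.length && isBlank(comment.charAt(pos))) pos++;
      tokens.push({ kind: "space", text: comment.slice(start, pos) });
    }
  }
  return tokens;
}
```

On an ordinary prose line with single spaces and no special characters, `scanChatty` produces one token per word and per gap, while `scanChunked` produces a single token, so the parser makes far fewer calls into the scanner for the same text.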
From what I've seen, this is one source of wasted time for @typescript-eslint parsing. It's rare that anyone actually consumes JSDoc info in lint rules, so having it computed up-front is just wasted effort. So I would be solidly in favour of lazy-by-default or lazy-behind-option. |
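As a rough illustration of what "lazy-by-default" could mean for a consumer like a lint rule, here is a minimal sketch in which the raw comment text is stored at parse time and only parsed on first access. `LazyNode`, `parseJSDoc`, and `getParseCount` are hypothetical names, and the tag extraction is a stand-in regex, not the compiler's JSDoc parser.

```typescript
// Hypothetical shapes for illustration; not the TypeScript compiler's node types.
interface ParsedJSDoc {
  tags: string[];
}

let parseCount = 0; // counts how often the (stand-in) JSDoc parser actually runs

function getParseCount(): number {
  return parseCount;
}

// Stand-in for the real JSDoc parser: it only collects `@tag` names.
function parseJSDoc(raw: string): ParsedJSDoc {
  parseCount++;
  const found = raw.match(/@[A-Za-z]+/g);
  return { tags: found === null ? [] : found.slice() };
}

class LazyNode {
  private cached: ParsedJSDoc | undefined = undefined;

  constructor(
    readonly name: string,
    private readonly rawComment: string,
  ) {}

  // The comment is parsed on first access and cached afterwards.
  get jsDoc(): ParsedJSDoc {
    if (this.cached === undefined) {
      this.cached = parseJSDoc(this.rawComment);
    }
    return this.cached;
  }
}
```

With this shape, a lint run that never reads `jsDoc` pays nothing beyond holding the raw comment text, and a rule that does read it pays the parse cost once per node.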
Yeah, I think the right things to do are
I ran into the need for this from the other direction: trying to improve just the jsdoc parser. I noticed that the jsdoc scanner provides way more granular tokens than the jsdoc parser needs, even though it's (basically) the only caller of the jsdoc scanner. I put up a draft PR of my work so far at #53007
Adding a CPU profile onto this just to show a more real-world example of the impact to @typescript-eslint.
This profile was taken from a lint run on the @typescript-eslint codebase:
$ git clone git@github.com:typescript-eslint/typescript-eslint.git
$ cd typescript-eslint
$ yarn
$ node --cpu-prof ./node_modules/.bin/eslint .
TS_ESLINT_PARSE.cpuprofile.zip (extract and open this in something like https://speedscope.app)
We can see that across the entire run the
Here's a summary of five things I tried:
The last two are surprising to me, since there's a constant overhead per line for yielding and slicing the whitespace/asterisk strings. It shouldn't be as large as the overhead of yielding individual tokens for comment text though. The slight increases in the last two PRs make me think that the first big-token PR may also increase time in the parser, but offset by the larger gains in the scanner.
My latest profiles of xstate (the latest commit in their |
Not parsing JSDoc seems to unlock significant parsing wins.
Surprisingly, we parse JSDoc unconditionally, even in .ts files for batch compilation.
So there are a few things I want to investigate here:
Can we optimize JSDoc parsing itself? Why is JSDoc parsing so slow in the first place? Even if we decided to always parse JSDoc comments, it should be fast because full time-to-interactivity in the language service is bottle-necked on parse time.
My naive theory is that JSDoc scanning today is "chatty", returning uninteresting tokens, making every request for the next token unnecessarily slow. But of course, this is without benchmarking. I would love to understand what exactly is the issue here.
If it's not strictly required, can we make JSDoc parse in a lazier or optional fashion? It seems unavoidable in JavaScript files; but maybe in TypeScript files, we only attempt to attach JSDoc in cases where we detect that trivia intersects with tag names we're interested in (e.g. @deprecated, @see, @link, maybe more). Maybe we only do this for the language service.
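The "only parse when the trivia intersects with interesting tag names" idea could look roughly like the pre-check below. This is a sketch under assumptions: `shouldParseJSDoc` and `INTERESTING_TAGS` are hypothetical names, the tag list is just the examples mentioned above, and a plain substring search stands in for whatever detection the compiler would actually use.

```typescript
// Hypothetical pre-filter; the tag list mirrors the examples in the issue text only.
const INTERESTING_TAGS = ["@deprecated", "@see", "@link"];

// Cheap check run on the raw comment text before any JSDoc parsing is attempted.
// It is deliberately conservative: a substring hit such as "@linkcode" also returns
// true, which only costs an unnecessary parse and never skips a needed one.
function shouldParseJSDoc(commentText: string): boolean {
  // Only `/** ... */` comments are JSDoc candidates.
  if (commentText.slice(0, 3) !== "/**") {
    return false;
  }
  return INTERESTING_TAGS.some((tag) => commentText.indexOf(tag) !== -1);
}
```

A comment that is pure prose, like the War and Peace examples earlier in the thread, would be rejected by this check without ever reaching the JSDoc parser.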