Investigate making JSDoc parsing lazier and more optimal #52959

DanielRosenwasser · 2023-02-25T00:15:57Z

Not parsing JSDoc seems to unlock significant parsing wins.

Surprisingly, we parse JSDoc unconditionally, even in .ts files for batch compilation.

So there are a few things I want to investigate here:

Can we optimize JSDoc parsing itself? Why is JSDoc parsing so slow in the first place? Even if we decided to always parse JSDoc comments, it should be fast because full time-to-interactivity in the language service is bottle-necked on parse time.

My naive theory is that JSDoc scanning today is "chatty", returning uninteresting tokens, making every request for the next token unnecessarily slow. But of course, this is without benchmarking. I would love to understand what exactly is the issue here.
If it's not strictly required, can we make JSDoc parse in a lazier or optional fashion? It seems unavoidable in JavaScript files; but maybe in TypeScript files, JSDoc is only attempted to be attached in cases where we detect that trivia intersects with tag names we're interested in (e.g. @deprecated, @see, @link, maybe more). Maybe we only do this for the language service.

The text was updated successfully, but these errors were encountered:

DanielRosenwasser · 2023-02-27T01:53:02Z

I tried a fun benchmark here. I made a TypeScript file that contains 3 functions (foo, bar, and baz), each prefixed by JSDoc comments containing the contents of War and Peace.

/** CHAPTER I */

/** "Well, Prince, so Genoa and Lucca are now just family estates of the */
/** Buonapartes. But I warn you, if you don't tell me that this means war, */
/** if you still try to defend the infamies and horrors perpetrated by that */
/** Antichrist--I really believe he is Antichrist--I will have nothing more */
/** to do with you and you are no longer my friend, no longer my 'faithful */
/** slave,' as you call yourself! But how do you do? I see I have frightened */
/** you--sit down and tell me all the news." */
// ...
function foo() {
    console.log("lol");
}

I then replaced all the characters in those comments with spaces.

/**           */

/**                                                                      */
/**                                                                        */
/**                                                                         */
/**                                                                         */
/**                                                                        */
/**                                                                          */
/**                                          */
// ...
function foo() {
    console.log("lol");
}

And because one could make the argument that whitespace doesn't need to be retained so this is an unfair benchmark, I made a file where all the contents are simply replaced with As.

/** AAAAAAAAA */

/** AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA */
/** AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA */
/** AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA */
/** AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA */
/** AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA */
/** AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA */
/** AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA */
// ...
function foo() {
    console.log("lol");
}

Of course, that many JSDoc comments is very uncommon. The first thing I actually tried was all the contents of War and Peace in a single comment above each function.

/**
 * CHAPTER I
 * 
 * "Well, Prince, so Genoa and Lucca are now just family estates of the
 * Buonapartes. But I warn you, if you don't tell me that this means war,
 * if you still try to defend the infamies and horrors perpetrated by that
 * Antichrist--I really believe he is Antichrist--I will have nothing more
 * to do with you and you are no longer my friend, no longer my 'faithful
 * slave,' as you call yourself! But how do you do? I see I have frightened
 * you--sit down and tell me all the news."
 *
/// ...
 */
function foo() {
    console.log("lol");
}

And finally, the AAAAAA version of that.

/**
 * AAAAAAAAA
 * 
 * AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
 * AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
 * AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
 * AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
 * AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
 * AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
 * AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
/// ...
 */
function foo() {
    console.log("lol");
}

So here's the output of running these, though I'll limit the summary to the parse time

$ node ../built/local/tsc.js -p tsconfig-split.json
Parse time:                 0.95s

$ node ../built/local/tsc.js -p tsconfig-split-space.json
Parse time:                 0.47s

$ node ../built/local/tsc.js -p tsconfig-split-AAA.json
Parse time:                 0.53s

$ node ../built/local/tsc.js -p tsconfig-single.json
Parse time:                 0.99s

$ node ../built/local/tsc.js -p tsconfig-single-AAA.json
Parse time:                 0.32s

$ node ../built/local/tsc.js -p tsconfig-split.json
error TS2318: Cannot find global type 'Array'.
error TS2318: Cannot find global type 'Boolean'.
error TS2318: Cannot find global type 'Function'.
error TS2318: Cannot find global type 'IArguments'.
error TS2318: Cannot find global type 'Number'.
error TS2318: Cannot find global type 'Object'.
error TS2318: Cannot find global type 'RegExp'.
error TS2318: Cannot find global type 'String'.
/workspaces/TypeScript/benchy/war-and-peace-split.ts
Files:                          1
Lines of Library:               0
Lines of Definitions:           0
Lines of TypeScript:       191546
Lines of JavaScript:            0
Lines of JSON:                  0
Lines of Other:                 0
Identifiers:                    9
Symbols:                       10
Types:                         66
Instantiations:                 0
Memory used:               88895K
Assignability cache size:       0
Identity cache size:            0
Subtype cache size:             0
Strict subtype cache size:      0
I/O Read time:              0.04s
Parse time:                 0.95s
Program time:               0.99s
Bind time:                  0.02s
Check time:                 0.02s
printTime time:             0.00s
Emit time:                  0.00s
Total time:                 1.04s

$ node ../built/local/tsc.js -p tsconfig-split-space.json
error TS2318: Cannot find global type 'Array'.
error TS2318: Cannot find global type 'Boolean'.
error TS2318: Cannot find global type 'Function'.
error TS2318: Cannot find global type 'IArguments'.
error TS2318: Cannot find global type 'Number'.
error TS2318: Cannot find global type 'Object'.
error TS2318: Cannot find global type 'RegExp'.
error TS2318: Cannot find global type 'String'.
/workspaces/TypeScript/benchy/war-and-peace-split-space.ts
Files:                          1
Lines of Library:               0
Lines of Definitions:           0
Lines of TypeScript:       191548
Lines of JavaScript:            0
Lines of JSON:                  0
Lines of Other:                 0
Identifiers:                    9
Symbols:                       10
Types:                         66
Instantiations:                 0
Memory used:               71089K
Assignability cache size:       0
Identity cache size:            0
Subtype cache size:             0
Strict subtype cache size:      0
I/O Read time:              0.01s
Parse time:                 0.47s
Program time:               0.48s
Bind time:                  0.02s
Check time:                 0.02s
printTime time:             0.00s
Emit time:                  0.00s
Total time:                 0.52s

$ node ../built/local/tsc.js -p tsconfig-split-AAA.json
error TS2318: Cannot find global type 'Array'.
error TS2318: Cannot find global type 'Boolean'.
error TS2318: Cannot find global type 'Function'.
error TS2318: Cannot find global type 'IArguments'.
error TS2318: Cannot find global type 'Number'.
error TS2318: Cannot find global type 'Object'.
error TS2318: Cannot find global type 'RegExp'.
error TS2318: Cannot find global type 'String'.
/workspaces/TypeScript/benchy/war-and-peace-split-AAA.ts
Files:                          1
Lines of Library:               0
Lines of Definitions:           0
Lines of TypeScript:       191546
Lines of JavaScript:            0
Lines of JSON:                  0
Lines of Other:                 0
Identifiers:                    9
Symbols:                       10
Types:                         66
Instantiations:                 0
Memory used:               85167K
Assignability cache size:       0
Identity cache size:            0
Subtype cache size:             0
Strict subtype cache size:      0
I/O Read time:              0.01s
Parse time:                 0.53s
Program time:               0.54s
Bind time:                  0.02s
Check time:                 0.02s
printTime time:             0.00s
Emit time:                  0.00s
Total time:                 0.58s

$ node ../built/local/tsc.js -p tsconfig-single.json
error TS2318: Cannot find global type 'Array'.
error TS2318: Cannot find global type 'Boolean'.
error TS2318: Cannot find global type 'Function'.
error TS2318: Cannot find global type 'IArguments'.
error TS2318: Cannot find global type 'Number'.
error TS2318: Cannot find global type 'Object'.
error TS2318: Cannot find global type 'RegExp'.
error TS2318: Cannot find global type 'String'.
/workspaces/TypeScript/benchy/war-and-peace-single.ts
Files:                          1
Lines of Library:               0
Lines of Definitions:           0
Lines of TypeScript:       191550
Lines of JavaScript:            0
Lines of JSON:                  0
Lines of Other:                 0
Identifiers:                    6
Symbols:                        9
Types:                         66
Instantiations:                 0
Memory used:               57249K
Assignability cache size:       0
Identity cache size:            0
Subtype cache size:             0
Strict subtype cache size:      0
I/O Read time:              0.04s
Parse time:                 0.99s
Program time:               1.03s
Bind time:                  0.00s
Check time:                 0.01s
printTime time:             0.00s
Emit time:                  0.00s
Total time:                 1.04s

$ node ../built/local/tsc.js -p tsconfig-single-AAA.json
error TS2318: Cannot find global type 'Array'.
error TS2318: Cannot find global type 'Boolean'.
error TS2318: Cannot find global type 'Function'.
error TS2318: Cannot find global type 'IArguments'.
error TS2318: Cannot find global type 'Number'.
error TS2318: Cannot find global type 'Object'.
error TS2318: Cannot find global type 'RegExp'.
error TS2318: Cannot find global type 'String'.
/workspaces/TypeScript/benchy/war-and-peace-single-AAA.ts
Files:                          1
Lines of Library:               0
Lines of Definitions:           0
Lines of TypeScript:       191554
Lines of JavaScript:            0
Lines of JSON:                  0
Lines of Other:                 0
Identifiers:                    9
Symbols:                       10
Types:                         66
Instantiations:                 0
Memory used:               37265K
Assignability cache size:       0
Identity cache size:            0
Subtype cache size:             0
Strict subtype cache size:      0
I/O Read time:              0.01s
Parse time:                 0.32s
Program time:               0.33s
Bind time:                  0.00s
Check time:                 0.01s
printTime time:             0.00s
Emit time:                  0.00s
Total time:                 0.34s

So this is what I mean by the parser being too "chatty" with the scanner. There's probably half a second of time being spent ping-ponging between them for every identifier.

One idea I would suggest is trying to proactively grab chunks of text from the comment whenever "normalization" doesn't need to occur. This might occur when whitespace between words is only a single space character, possibly no backticks, etc. - basically, whenever the parser handles these conditions, maybe the scanner would instead. This is the same thing we do on string literals, where we can eagerly grab chunks of data. For the common case, my hope is that this would approximate the results of the "AAAAAA" example.

bradzacher · 2023-02-27T20:19:14Z

From what I've seen this is a one source of wasted time for @typescript-eslint parsing. It's rare that anyone actually consumes JSDoc info in lint rules so having it computed up-front is just wasted effort.

So I would be solidly in favour of lazy-by-default or lazy-behind-option.

DanielRosenwasser · 2023-02-27T20:27:10Z

Yeah, I think the right things to do are

Make it efficient.
Make it opt-out if necessary.

sandersn · 2023-02-27T21:53:17Z

I ran into the need for this from the other direction: trying to improve just the jsdoc parser. I noticed that the jsdoc scanner provides way more granular tokens than the jsdoc parser needs, even though it's (basically) the only caller of the jsdoc scanner.

I put up a draft PR of my work so far at #53007

bradzacher · 2023-02-27T22:28:26Z

Adding a CPU profile onto this just to show a more real-world example of the impact to @typescript-eslint

This profile was taken from a lint run on the @typescript-eslint codebase

$ git clone git@github.com:typescript-eslint/typescript-eslint.git
$ cd typescript-eslint
$ yarn
$ node --cpu-prof ./node_modules/.bin/eslint .

TS_ESLINT_PARSE.cpuprofile.zip (extract and open this in something like https://speedscope.app)

We can see that across the entire run the parseJSDoc function consumes a total of ~1.63s across the run, which equates to ~3.8% of the total lint run!! This is huge, considering that we also lint with types in the repo, which ofc adds quite a bit of overhead. For a non-type-aware lint run you'd find the %age would be much, much higher.

sandersn · 2023-03-07T16:30:40Z

Here's a summary of five things I tried:

Save jsdoc comments in string, not array #53048 -- not faster, not slower, uses a little more memory, code is a little clearer.
Lower jsdoc comment state #53065 -- no difference really, this was a simple one to try
Scan bigger/fewer jsdoc tokens #53081 -- parses tokens big improvements in about half of repos, some slowdown on compiler[-unions] on some node versions.
Followup to the previous: scan the leading * as a single whitespace token -- slightly slower for all projects.
Don't slice whitespace in jsdoc parser unless needed #53121 -- slightly slower for all projects.

The last two are surprising to me, since there's a constant overhead per line for yielding and slicing the whitespace/asterisk strings. It shouldn't be as large as the overhead of yielding individual tokens for comment text though. The slight increases in the last two PRs makes me think that the first big-token PR may also increase time in the parser, but offset by the larger gains in the scanner.

jakebailey · 2023-09-13T23:04:52Z

#52921 is now merged, so tsc will now be skipping JSDoc parsing when able. #55739 is a PR which opens the door to that being used in the API with a bit more configurability.

sandersn · 2023-09-19T21:29:22Z

My latest profiles of xstate (the latest commit in their next branch) and angular 2 (the ancient version we perf test on) don't show much time in JSDoc, even with JSDoc parsing forced on. I think we can close this bug, at least for now.

DanielRosenwasser added Experiment A fork with an experimental idea which might not make it into master Domain: Performance Reports of unusually slow behavior labels Feb 25, 2023

DanielRosenwasser assigned sandersn and jakebailey Feb 25, 2023

DanielRosenwasser added this to the TypeScript 5.1.0 milestone Feb 25, 2023

DanielRosenwasser mentioned this issue Feb 28, 2023

TypeScript 5.1 Iteration Plan #53031

Closed

This was referenced Mar 1, 2023

Save jsdoc comments in string, not array #53048

Closed

Lower jsdoc comment state #53065

Closed

Scan bigger/fewer jsdoc tokens #53081

Merged

Don't slice whitespace in jsdoc parser unless needed #53121

Closed

sandersn mentioned this issue Mar 7, 2023

Destructure scanner methods in parser #53144

Closed

jakebailey mentioned this issue Mar 18, 2023

Skip parsing JSDoc when not needed #52921

Merged

DanielRosenwasser mentioned this issue May 17, 2023

TypeScript 5.2 Iteration Plan #54298

Closed

bradzacher mentioned this issue Aug 13, 2023

Enable eslint rule @typescript-eslint/no-unused-vars, disable compiler rule #55139

Closed

DanielRosenwasser mentioned this issue Aug 23, 2023

TypeScript 5.3 Iteration Plan #55486

Open

jakebailey mentioned this issue Sep 13, 2023

Make JSDoc skipping public, plus more configurable #55739

Merged

jakebailey closed this as completed Sep 19, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Investigate making JSDoc parsing lazier and more optimal #52959

Investigate making JSDoc parsing lazier and more optimal #52959

DanielRosenwasser commented Feb 25, 2023

DanielRosenwasser commented Feb 27, 2023

bradzacher commented Feb 27, 2023 •

edited

Loading

DanielRosenwasser commented Feb 27, 2023

sandersn commented Feb 27, 2023

bradzacher commented Feb 27, 2023

sandersn commented Mar 7, 2023

jakebailey commented Sep 13, 2023

sandersn commented Sep 19, 2023

Investigate making JSDoc parsing lazier and more optimal #52959

Investigate making JSDoc parsing lazier and more optimal #52959

Comments

DanielRosenwasser commented Feb 25, 2023

DanielRosenwasser commented Feb 27, 2023

bradzacher commented Feb 27, 2023 • edited Loading

DanielRosenwasser commented Feb 27, 2023

sandersn commented Feb 27, 2023

bradzacher commented Feb 27, 2023

sandersn commented Mar 7, 2023

jakebailey commented Sep 13, 2023

sandersn commented Sep 19, 2023

bradzacher commented Feb 27, 2023 •

edited

Loading