Performance problems on large repos #13

stevearc · 2022-06-09T00:46:29Z

When I try to open the summary on a test file in a repo at work, nvim freezes for a very long time.
The issue is this line:
https://github.com/rcarriga/neotest/blob/aaf2107a0d639935032d60c49a5bdda2de26b4d6/lua/neotest/client/init.lua#L408
The find_files actually completes within ~2 seconds, but then running adapter.is_test_file 250k+ times takes several minutes (vim-test adapter). After that freeze, there is a second freeze while it tries to write all the events to the logfile (hasn't finished yet).

My personal opinion, test discovery should be definitely optional, and possibly configurable. My previous job had a monorepo that was so big it was only served as a virtual filesystem, so running any type of crawling operation would have terrible side effects. Making it configurable (e.g. only these dirs, only this depth) might be nice, but would probably make more sense per-adapter instead of globally. If adapters need to control the search path, their API would have to change from testing individual files that neotest gives them to calling back into neotest with whatever their search config is. IDK if that refactor is worth it, which is why I could go either way on making the search configurable.

Another direction could be to customize the search on a per-directory basis instead of per-adapter. That avoids the need for an adapter API refactor, and in general this kind of functionality would be incredibly useful. I often work with many different types of repos on the same machine, sometimes in the same vim instance (using :tcd and one tab per project), and these projects will sometimes use the same language but require different configurations to run tests. I'd love to be able to configure my test adapters on a per-project basis. A rough proposal, it could look like:

require('neotest').setup({
  adapters = ...,
  summary = ...,
  discovery = {
    enabled = true,
  },
  projects = {
    ["~/work/big"] = {
      adapters = ...,
      discovery = {
        enabled = true,
        dirs = {'tests/unit', 'frontend/tests'},
        depth = 2
      },
    },
    ["~/personal/project"] = {
      adapters = ...,
      discovery = {
        enabled = false,
      },
    },
  },
})

I am happy to submit a PR for any parts of this once we align on a solution

Unrelated question: I'll probably be making more proposals, requests, and questions. Is filing an issue the best way to start a discussion, or would you prefer some other channel?

The text was updated successfully, but these errors were encountered:

rcarriga · 2022-06-09T21:07:23Z

The find_files actually completes within ~2 seconds, but then running adapter.is_test_file 250k+ times takes several minutes (vim-test adapter). After that freeze, there is a second freeze while it tries to write all the events to the logfile (hasn't finished yet).

Yep the vim-test test detection is horrifically slow. It should however only trigger if an open buffer registers as a possible test file so if you're not actually using it for that repo I'd suggest disabling it for whatever filetype is causing it to run (this is just a workaround for this particular case)

On to the broader issue

If adapters need to control the search path, their API would have to change from testing individual files that neotest gives them to calling back into neotest with whatever their search config is. IDK if that refactor is worth it, which is why I could go either way on making the search configurable.

I actually originally wrote the adapter interface to work this way but switched to the current way to avoid running multiple crawls but now the initial detection of tests (project root detection, open buffers etc) is smarter.

Another direction could be to customize the search on a per-directory basis instead of per-adapter.

I can see this being a pretty popular request not for performance but as you say for the per-project runner/settings. I think it's inevitable something will need handle this. The proposed interface looks pretty good AFAICT, though the adapters I've written (and I imagine others) would need to be refactored to handle configuring multiple instances. Wouldn't be too much work though.

I think the project/discovery configuration would solve this issue better than having adapters controlling the search as it solves other issues too so for now I'll say let's go with that. This will be a rather large change so it'll take some time to work on. I think I'd like to take a stab at it myself just to get a grasp of the implementation details. I can provide more thoughts on interface as I put some more thought into it.

I'll probably be making more proposals, requests, and questions. Is filing an issue the best way to start a discussion, or would you prefer some other channel?

Love to hear that, thanks! To be honest I haven't thought about it much. I don't mind using issues right now but do feel it could get a bit noisy quickly depending on traffic. I've not used GitHub discussions much so I can take a look at that. Open to other suggestions as well! I'm planning on setting up a neotest org to keep all the repos under so I can add it to the admin TODOs 😅

stevearc · 2022-06-10T00:11:24Z

the adapters I've written (and I imagine others) would need to be refactored to handle configuring multiple instances

Oh this would be excellent! That might solve another issue I've been having with multiple directories open in the same vim. If both dirs use the same test adapter, I see "Common root not found" errors. But if each dir were using a different adapter instance that might be one way around it.

I think the project/discovery configuration would solve this issue better than having adapters controlling the search as it solves other issues too so for now I'll say let's go with that. This will be a rather large change so it'll take some time to work on.

Sounds good! LMK if I can help at any point.

rcarriga · 2022-06-10T08:11:00Z

I see "Common root not found" errors

As far as I remember this is actually a bit of laziness on my part because I don't open multiple projects in one NeoVim instance. The trees just need to be merged inside of a new parent here https://github.com/rcarriga/neotest/blob/master/lua/neotest/lib/positions/init.lua#L84. The way the new file detection works, it shouldn't (could be wrong) scan the new parent directories.

However I do prefer the idea of having separately maintained trees for each project as it'll present a lot cleaner IMO

rcarriga · 2022-08-28T15:18:51Z

Finally got around to implementing per-project configs 😅 I've got it in #100 just for testing but it seems to work nicely AFAICT. If anyone runs in to issues please let me know!

stevearc · 2022-08-28T18:28:09Z

Awesome!! I'll test it out when I get a chance and leave feedback

stevanmilic · 2022-09-05T18:22:19Z

I'm also having lags on bigger repos. This is especially annoying during BufAdd and BufLeave events, ~~disabling the test dicovery (#30) in config doesn't solve the issue~~ disabling test discovery does solve the issue 👍🏻

EDIT: I had wrong config setup 🤦🏻 all good.

rcarriga · 2022-09-07T08:32:09Z

Separate to the project config work, I've done some benchmarking and found an easy optimisation within the treesitter internal code. I've implemented it in the the latest master and I'm seeing a 30% reduction in time to parse the cpython repo 😁 It should hopefully improve performance for most languages. This won't solve all performance issues but it will hopefully improve things a bit.

I'm also planning on creating a consumer that can run benchmarks easily so others can see what is affecting performance

See #13

rcarriga · 2022-09-11T10:35:51Z

OK I've added two new config options to help with discovery performance.

Concurrent discovery control: You can now specify the number of concurrent workers parsing files. The default is CPU cores + 4 (this is a bit random, just went with what Python's ThreadPoolExecutor defaults to). Set this to 1 and it should reduce CPU usage considerably but it will also slow down test discovery.
Directory filtering: The proposed dirs option would probably suit most people but many projects put tests alongside corresponding code files so I thought this was a bit inflexible. Instead there's now a filter_dir option that can filter what directories to search for test files.

In terms of user experience, directory filtering is preferred over limiting concurrency which is preferred over disabling discovery altogether. I believe that these features are sufficient to tackle nearly all performance related issues. Would love to hear feedback on these! 😄

If you are still experiencing performance issues and you've disabled concurrency and set up filtering to at least rule out scanning directories like node_modules or venv please let me know but also be sure to rule out other plugins. I rarely experience significant slowdown due to neotest, it's almost always LSP (wish there was a better Python LSP 😅)

jfpedroza · 2022-09-12T02:01:59Z

I tried adding the filter_dir option, but it looks like it isn't being called. My project is set up like this:

require("neotest").setup_project(vim.loop.cwd(), {
  adapters = {
    require("neotest-elixir")({ extra_formatters = { "ExUnit.CLIFormatter", "ExUnitNotifier" } }),
  },
  discovery = {
    enabled = true,
    filter_dir = function(name, rel_path, root)
      print(name, rel_path, root)
      return false
    end,
  },
})

I see no messages and no directory is filtered.

Also, in Elixir all tests belong in a test/ folder, it isn't possible to put the test file alongside the source file, and because of the recent change in the find function, basically every project has to add the filter_dir option as Neotest is now detecting test files in the deps directory.

Would be possible to for an adapter to set a default filter_dir function?

See #13

rcarriga · 2022-09-12T05:59:32Z

Would be possible to for an adapter to set a default filter_dir function?

Yep definitely think this is needed 👍 Adapters can now define a filter_dir function with the same signature

I tried adding the filter_dir option, but it looks like it isn't being called.

Everything looks correct. Can you enable debug logging and submit a separate issue?

jfpedroza · 2022-09-13T04:14:34Z

No need, I have the setup_project call in .nvimrc and the Neotest config in after/plugin/neotest, so the configuration was being overwritten.

Works great, thank you.

rcarriga · 2022-09-28T08:33:12Z

I've not heard further reports on performance issues so I'm not sure how prevalent they are anymore but I wanted to have a little fun with using Neovim's RPC API. I've implemented the treesitter parsing to be run in a separate Neovim instance so that the only cost on the main editor instance is deserialising results. It's sitting in a PR right now #119, it'd be great to get some feedback on it.

It should never expose any errors directly to the user, instead logging errors and falling back to the current in-process parsing, so when testing please check the logs (I'd recommend the INFO level). The main issue I'm anticipating is lazy loading, I've added a call for require("neotest") to the subprocess so if you do lazy load, just add module = "neotest" to the config for packer (or equivalent for other managers).

Also some adapters (neotest-elixir, neotest-go, neotest-dotnet, neotest-rspec, neotest-phpunit) use callback functions for the parsing which can't be done over RPC. The solution here is to provide a string to evaluate to a function in the child process (e.g. require("my-adapter")._build_position so if adapter authors or people willing to make the change wish to test, it should be relatively simple.

If testing goes well, it might be possible to expose more helpers to run CPU intensive work in the subprocess

jfpedroza · 2022-10-03T02:04:13Z

I made the change, and it seems to be working well, although I didn't notice any change in the time it takes to discover.

What I did notice is that discovery is triggered twice.

When I open the summary, it outputs:

Searching <project folder> for test files
Discovering files with 20 workers

And then prints a bunch of Parsing <file> lines.

Then, I can see the Searching and Discovering lines again, and the Parsing line for every file.

Any file I searched in the log appeared twice, and I removed the log between runs.

Is there a reason for that?

rcarriga · 2022-10-03T08:27:45Z

Thanks for testing it out!

I don't think that'd be related to these changes as there's no change in the discovery logic, this is purely in the treesitter library functions. Are the second log lines from the parent or child instance (has CHILD in the line)? Also are you doing anything or just letting discovery run and then exiting?

Edit: Also to clarify, this won't speed up discovery (it's doing the same work), it just shouldn't ever block the editor because it's running in a separate neovim process

jfpedroza · 2022-10-05T06:24:45Z

Yeah, I didn't think it was related to these changes, but it is related to performance.

Are the second log lines from the parent or child instance (has CHILD in the line)?

They are all from the parent instance.

Also are you doing anything or just letting discovery run and then exiting?

I would either open the summary and check the logs, or run a test file and check the logs.

I forgot to mention that the lines Initialising client and Initialisation finished in X seconds would only print once, before the first batch of logs and after the second batch of logs, respectively.

rcarriga · 2022-10-05T08:36:07Z

Thanks for the info! Could you provide a clean log for some repo that I can access?

jfpedroza · 2022-10-06T05:54:50Z

Sure. One more thing I noticed is that the duplicate discovering only happens when I open the summary while having a test file loaded in a buffer. If I don't have a test file open, it only prints each file once.

Here is a repo I use for testing. It has very few test files, so the log is not long.

parquex_info.log
parquex_debug.log

In the log at INFO level it is easier to notice the duplicates, but the DEBUG one provides more info, so I included both.

Previously found adapters shouldn't have their discovery run twice on startup. See #13

rcarriga · 2022-10-06T07:48:05Z

Ah thanks!

duplicate discovering only happens when I open the summary while having a test file loaded in a buffer

This was the key 😅 It was an existing issue that has been fixed in master

jfpedroza · 2022-10-06T19:28:48Z

Great! The difference is very noticeable in my work repo.

rcarriga · 2022-10-08T14:33:08Z

Great to hear! I'm going to merge the PR so feel free to make the change to neotest-elixir, I'll get around to submitting PRs to other adapters soon as well.

Neotest core enabled treesitter parsing to be performed in a separate Neovim instance which means that we won't perform blocking CPU intensive work anymore. More details can be found here nvim-neotest/neotest#13 (comment). The issue is that the latest changes for building positions uses state by mutating the passed in tables to track parameterized tests, which prevents the subprocess parsing being used. Refactoring to use something stateless parsing as discussed here nvim-neotest/neotest#24 (reply in thread) allows us to use the subprocess

Enables subprocess parsing. Details here nvim-neotest/neotest#13 (comment)

Enables subprocess parsing. See nvim-neotest/neotest#13 (comment)

rcarriga · 2022-10-09T10:20:23Z

I'm going to close this issue as all of the originally requested features/changes have been made or have alternatives in place. If there are still performance issues, then we can address them specifically in separate issues.

olimorris · 2022-10-09T10:26:33Z

Thanks for the pull requests on the adapters. It definitely feels faster. ❤️

Enables subprocess parsing. See nvim-neotest/neotest#13 (comment)

rcarriga · 2022-10-09T10:35:18Z

Fantastic, love to get the feedback 😁

olimorris mentioned this issue Jun 13, 2022

Feature Request: RSpec Adapter #9

Closed

stevearc mentioned this issue Jun 15, 2022

Allow disabling directory crawling for tests #30

Merged

alfaix mentioned this issue Jul 7, 2022

Feature request: support for Google Test #64

Closed

stevearc mentioned this issue Jul 27, 2022

fzf floating window hangs/is slow to start after running Python tests #78

Closed

ajayvigneshk mentioned this issue Sep 7, 2022

make root directory more specific jfpedroza/neotest-elixir#7

Closed

rcarriga added a commit that referenced this issue Sep 11, 2022

feat(client): control concurrent discovery

65fc04d

See #13

rcarriga added a commit that referenced this issue Sep 11, 2022

feat: filter directories during discovery

d20507c

See #13

rcarriga added a commit that referenced this issue Sep 12, 2022

feat(adapters): provide filter_dir function

775e425

See #13

rcarriga added a commit that referenced this issue Oct 6, 2022

fix: avoid running discovery twice

e2d1378

Previously found adapters shouldn't have their discovery run twice on startup. See #13

rcarriga mentioned this issue Oct 9, 2022

feat: stateless parsing Issafalcon/neotest-dotnet#20

Merged

rcarriga added a commit to nvim-neotest/neotest-go that referenced this issue Oct 9, 2022

feat(treesitter): pass string arg for position_id

06993e9

Enables subprocess parsing. Details here nvim-neotest/neotest#13 (comment)

rcarriga mentioned this issue Oct 9, 2022

feat(treesitter): pass string arg for position_id nvim-neotest/neotest-go#20

Merged

rcarriga added a commit to rcarriga/neotest-rspec that referenced this issue Oct 9, 2022

feat(treesitter): pass position_id as string

3dac478

Enables subprocess parsing. See nvim-neotest/neotest#13 (comment)

rcarriga mentioned this issue Oct 9, 2022

feat(treesitter): pass position_id as string olimorris/neotest-rspec#29

Merged

rcarriga added a commit to rcarriga/neotest-phpunit that referenced this issue Oct 9, 2022

feat(treesitter): pass position_id as string

f320c0e

Enables subprocess parsing. See nvim-neotest/neotest#13 (comment)

rcarriga mentioned this issue Oct 9, 2022

feat(treesitter): pass position_id as string olimorris/neotest-phpunit#9

Merged

olimorris pushed a commit to olimorris/neotest-rspec that referenced this issue Oct 9, 2022

feat(treesitter): pass position_id as string (#29)

ff1c489

Enables subprocess parsing. See nvim-neotest/neotest#13 (comment)

rcarriga closed this as completed Oct 9, 2022

olimorris pushed a commit to olimorris/neotest-phpunit that referenced this issue Oct 9, 2022

feat(treesitter): pass position_id as string (#9)

0e5c306

Enables subprocess parsing. See nvim-neotest/neotest#13 (comment)

rcarriga mentioned this issue Nov 5, 2022

[BUG] Sometimes tests aren't detected #127

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance problems on large repos #13

Performance problems on large repos #13

stevearc commented Jun 9, 2022

rcarriga commented Jun 9, 2022

stevearc commented Jun 10, 2022

rcarriga commented Jun 10, 2022

rcarriga commented Aug 28, 2022

stevearc commented Aug 28, 2022

stevanmilic commented Sep 5, 2022 •

edited

Loading

rcarriga commented Sep 7, 2022

rcarriga commented Sep 11, 2022

jfpedroza commented Sep 12, 2022

rcarriga commented Sep 12, 2022

jfpedroza commented Sep 13, 2022

rcarriga commented Sep 28, 2022 •

edited

Loading

jfpedroza commented Oct 3, 2022

rcarriga commented Oct 3, 2022 •

edited

Loading

jfpedroza commented Oct 5, 2022 •

edited

Loading

rcarriga commented Oct 5, 2022

jfpedroza commented Oct 6, 2022

rcarriga commented Oct 6, 2022

jfpedroza commented Oct 6, 2022

rcarriga commented Oct 8, 2022

rcarriga commented Oct 9, 2022

olimorris commented Oct 9, 2022

rcarriga commented Oct 9, 2022

Performance problems on large repos #13

Performance problems on large repos #13

Comments

stevearc commented Jun 9, 2022

rcarriga commented Jun 9, 2022

stevearc commented Jun 10, 2022

rcarriga commented Jun 10, 2022

rcarriga commented Aug 28, 2022

stevearc commented Aug 28, 2022

stevanmilic commented Sep 5, 2022 • edited Loading

rcarriga commented Sep 7, 2022

rcarriga commented Sep 11, 2022

jfpedroza commented Sep 12, 2022

rcarriga commented Sep 12, 2022

jfpedroza commented Sep 13, 2022

rcarriga commented Sep 28, 2022 • edited Loading

jfpedroza commented Oct 3, 2022

rcarriga commented Oct 3, 2022 • edited Loading

jfpedroza commented Oct 5, 2022 • edited Loading

rcarriga commented Oct 5, 2022

jfpedroza commented Oct 6, 2022

rcarriga commented Oct 6, 2022

jfpedroza commented Oct 6, 2022

rcarriga commented Oct 8, 2022

rcarriga commented Oct 9, 2022

olimorris commented Oct 9, 2022

rcarriga commented Oct 9, 2022

stevanmilic commented Sep 5, 2022 •

edited

Loading

rcarriga commented Sep 28, 2022 •

edited

Loading

rcarriga commented Oct 3, 2022 •

edited

Loading

jfpedroza commented Oct 5, 2022 •

edited

Loading