-
Notifications
You must be signed in to change notification settings - Fork 388
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow plugins to efficiently reuse Gazelle's filesystem walk #1737
Conversation
This is pretty lightweight and generic, which I like. @linzhp @tyler-french What are your thoughts? |
I don't think |
The problem with |
|
@pyrocat101 can speak to whether that would be correct, but I want to point out that such an approach would eat into some (most?) of the performance gains. Most directories don't have a If your concern is the semantic impurity, another option would be to add an extra (optional) pass between Configure and Resolve to feed the file list to the plugin, though that would be a bit more invasive change. |
But reading To elaborate on the usage, the trie tree is used not only in |
This might be a good time for us to diverge from the classic plugin interface. There are a couple of tricky to deal with issues with the current interface. Most notably, no functions take a context, and arguments can not be added in a backwards compatible way in case of a deviation. @linzhp and I have chatted about this in the past and I think we're reasonably aligned on it but I wanna jot down a couple of thoughts I've had. As high level design objectives, we want:
Given that gRPC seems to be the bees knees these days and it meets the design criteria. Obviously there is no need to actually have it RPC for existing plugins, and a pretty straightforward plugin to write would be the one that conforms to the V2 interface, but it translates the calls into the old interface. That should mean no pain for any preexisting plugins. You also get the ability to generate a gRPC client and use something like go-plugin to allow binaries that are external to your own to join in on the "fun". It's pretty straightforward to do in any language. Here is an example in python of a plugin that is launched by the wrapper process and gRPC communicated. |
I think you can get most of what you want by just making the plugin functions take Writing plugins in other language is nice but it's not a serious problem, there are plenty of examples (both in open-source and internal to serious bazel users) of the Go plugin spawning a server written in another language if that makes the dependency analysis easier. Protobuf over stdio or UDS works pretty well. gRPC has some gotchas in the "less supported" languages (including Python. When we tried to use it for anything serious we hit so many issues that we ended up writing our own bindings to the C core) Anyway, we ended up forking Gazelle to do what we describe above, it's a pretty self-contained solution so we're not too worried about maintaining it. If you're not interested in pursuing a more optimal solution, feel free to close out this issue. |
Having a |
We should not be using `os.Stat` to check if files exist. It is very expensive and gazelle has already read the full list of entries for the directory. The gazelle patch exposes that list. This is similar to bazel-contrib/bazel-gazelle#1737 which was done at airtable for similar reasons. On one major repo this reduces the time by almost 2s which is ~15% and well worth the patch imo. --- ### Changes are visible to end-users: no - Searched for relevant documentation and updated as needed: yes - Breaking change (forces users to change their own code or config): no - Suggested release notes appear below: yes Reduce fs io of `aspect configure` during fs traversal. ### Test plan - Covered by existing test cases GitOrigin-RevId: b17d52b2723d3b7e4d5ee33fc294f491062ded5a
We should not be using `os.Stat` to check if files exist. It is very expensive and gazelle has already read the full list of entries for the directory. The gazelle patch exposes that list. This is similar to bazel-contrib/bazel-gazelle#1737 which was done at airtable for similar reasons. On one major repo this reduces the time by almost 2s which is ~15% and well worth the patch imo. --- ### Changes are visible to end-users: no - Searched for relevant documentation and updated as needed: yes - Breaking change (forces users to change their own code or config): no - Suggested release notes appear below: yes Reduce fs io of `aspect configure` during fs traversal. ### Test plan - Covered by existing test cases GitOrigin-RevId: b17d52b2723d3b7e4d5ee33fc294f491062ded5a
We should not be using `os.Stat` to check if files exist. It is very expensive and gazelle has already read the full list of entries for the directory. The gazelle patch exposes that list. This is similar to bazel-contrib/bazel-gazelle#1737 which was done at airtable for similar reasons. On one major repo this reduces the time by almost 2s which is ~15% and well worth the patch imo. --- ### Changes are visible to end-users: no - Searched for relevant documentation and updated as needed: yes - Breaking change (forces users to change their own code or config): no - Suggested release notes appear below: yes Reduce fs io of `aspect configure` during fs traversal. ### Test plan - Covered by existing test cases GitOrigin-RevId: b17d52b2723d3b7e4d5ee33fc294f491062ded5a
What type of PR is this?
Performance improvement
What package or component does this PR mostly affect?
Plugins
What does this PR do? Why is it needed?
Our gazelle plugin (for typescript) builds an in-memory trie of the filesystem. This is needed because a given import string can resolve to multiply physical files, depending what is on-disk. (For example
foo/bar
might resolve tofoo/bar.ts
,foo/bar/index.js
, 'foo/bar.js, 'node_modules/foo/bar.js
, etc.)We build this trie during the
Configure
phase so that we are guaranteed it's complete beforeResolve
. However, it's wasteful to issueos.Readdir
in our plugin when Gazelle has just done the same work. This patch is around 15% improvement in total Gazelle runtime for us. (4.2s -> 3.5s)Other notes for review
Adding an optional interface like this was the most backwards-compatible way to make the change. Another option would be to add the directory contents to
config.Config
. That would keep the API surface area cleaner and mean plugins don't need to invoke a new API to benefit. However, that would require deciding what to do aboutConfig.Clone()
, and also might have some tricky interactions with potential future efforts to parallelizewalk.Walk
. But I'm happy to rework the PR to do that if yall prefer!