onepagers for MSBuildServer and RAR caching #11005

Merged 6 commits on Dec 9, 2024
79 changes: 79 additions & 0 deletions documentation/specs/proposed/MSBuild_Server_onepager.md
@@ -0,0 +1,79 @@
## MSBuild Server

MSBuild server aims to create a persistent entry node for the MSBuild process
that we would communicate with via a thin client. We want to get from
the current state of “spawn a complete process for every CLI invocation”
to “we have a server process in the background and we only spawn a small
CLI handler that will tell the server what to build”.
This effort builds on an existing project: [MSBuild Server](https://github.com/dotnet/msbuild/blob/main/documentation/MSBuild-Server.md).
We need to re-enable it and figure out the way forward.

### Goals and Motivation

Currently all MSBuild processes are persistent except for the entry
point process, which lives only for the duration of the build. Restarting
this process with every build incurs startup costs such as JIT
compilation. It also causes a loss of continuity, mainly due to the
absence of caching across builds.

The primary aim of the MSBuild server is to reduce this startup
overhead.
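
The saving can be illustrated with a toy cost model. The sketch below is not a measurement: `startup_cost`, `build_cost`, and `client_cost` are hypothetical numbers in seconds, and the function name is invented for illustration. It assumes the server pays the startup cost once and that every invocation afterwards pays only for a thin client plus the build itself.

```python
def total_time(builds, startup_cost, build_cost, client_cost=0.0, persistent=False):
    """Total time for `builds` CLI invocations under a toy cost model.

    All costs are hypothetical values in seconds, not MSBuild measurements.
    """
    if builds == 0:
        return 0.0
    if persistent:
        # The server starts once; each invocation spawns only a thin client.
        return startup_cost + builds * (client_cost + build_cost)
    # A complete entry process is spawned for every invocation.
    return builds * (startup_cost + build_cost)


# Ten inner-loop builds with made-up costs.
without_server = total_time(10, startup_cost=0.5, build_cost=2.0)
with_server = total_time(10, startup_cost=0.5, build_cost=2.0,
                         client_cost=0.125, persistent=True)
print(without_server, with_server, without_server - with_server)
```

In this model the benefit grows with the number of builds, which is why the inner loop scenario is the one that stands to gain most.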

The secondary aim of this project is to enable us to introduce more
advanced caching and potentially some other performance optimizations
further down the line. However, these aren’t in the current scope.

### Impact

A small performance improvement in the short term, and an enabler for
further optimizations in the long term. These improvements target the
Dev Kit and inner loop CLI scenarios.

Getting closer to the possibility of decoupling from Visual Studio. VS currently
acts as an MSBuild server in some ways: it is a persistent process that invokes
portions of MSBuild. Ideally we want to transition to a "VS calls our server
instead" relationship, to make our behavior consistent for both VS and CLI based
builds.

### Stakeholders

MSBuild Team. A successful handover means turning on the
feature, dogfooding it for long enough to be reasonably
certain that nothing breaks, and then rolling it out.
We should cooperate with closely related repositories such as SDK and Roslyn
to get them to opt in before we roll the feature out.

### Risks

The project was already attempted once, but it was postponed because
it surfaced a group of bugs that weren’t previously visible because the
processes were not persistent. One example is NuGet authentication caching,
which was a non-issue for a non-persistent process but became a blocker
because the cache could not be refreshed in-process.
Most of those bugs should be solved by now, but we may run into
new ones. Unfortunately, the nature of these bugs means that they won't become
apparent until we start dogfooding.

### Cost
Note that these are mostly rough guesses based on my limited knowledge.

A week to figure out how to turn on the MSBuild Server in a way that
will enable us to dogfood it properly **plus** some overhead for the
review loop.

A month of developer time for bugfixes assuming that nothing goes
terribly wrong.

Some PM time to communicate with appropriate teams to ask them for help
with dogfooding.

### Plan

- In the first month we should aim to get the MSBuild server dogfooded for
our MSBuild repository inner development loop. (Coding + review + setting up)

- In the second month we will monitor it and fix anything that crops up.

- After that we start dogfooding internally in our neighbor repositories (SDK, Roslyn)
for as long as we feel necessary to ensure everything works as intended. I would
give this period one to three months of monitoring, plus bugfixing when necessary.
60 changes: 60 additions & 0 deletions documentation/specs/proposed/RAR_caching_onepager.md
@@ -0,0 +1,60 @@
## RAR caching
RAR (Resolving of Assembly References) caching is an optimization for the step in
every build where we gather the graph of assembly references and pass
it to the compiler. This process is highly cacheable, as the references
don’t change all that often. Currently we have some limited caching in
place; however, the way work is assigned to nodes results in frequent cache
misses.
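
A toy simulation can show why work assignment matters for hit rates. The sketch below is not MSBuild's scheduler or cache: it assumes, purely for illustration, that each node keeps its own cache and that requests are handed to nodes round-robin. The function name and the request keys are hypothetical.

```python
from itertools import cycle


def count_misses(requests, node_count, shared):
    """Count cache misses when requests are spread round-robin over nodes.

    shared=False models one private cache per node (an assumption about the
    current state); shared=True models a single cache used by all nodes.
    """
    caches = [set()] if shared else [set() for _ in range(node_count)]
    nodes = cycle(range(node_count))
    misses = 0
    for key in requests:
        node = next(nodes)
        cache = caches[0] if shared else caches[node]
        if key not in cache:
            misses += 1
            cache.add(key)
    return misses


# Three distinct (hypothetical) reference sets, each requested four times.
requests = ["refs-A", "refs-B", "refs-C"] * 4
per_node_misses = count_misses(requests, node_count=4, shared=False)
shared_misses = count_misses(requests, node_count=4, shared=True)
print(per_node_misses, shared_misses)
```

With these inputs every request misses when caches are private to a node, while a shared cache misses only once per distinct key.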

### Goals and motivations

The 1ES team wants to isolate the file I/O related to RAR caching, which is
hampering their debugging efforts. This is mostly because MSBuild pulls
files from all nodes at once, which results in a tangled mess of I/O that is hard to debug.

Our motivation is a possible performance gain; however, we’re fine with
the change as long as the impact is not negative.

### Impact

The only impact we’re concerned about is performance. There will be
a tension between the gains from caching and the cost of IPC with
the process that acts as the cache repository. We need to ensure
that the balance is a net positive performance-wise.
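
This tension can be written down as a simple break-even calculation. The sketch below is a simplified model, not a description of the planned implementation: it assumes every request pays a fixed IPC cost, that a miss additionally pays the full resolution cost, and that the baseline always pays the resolution cost. All names and numbers are hypothetical.

```python
def net_saving_per_request(hit_rate, resolve_cost_ms, ipc_cost_ms):
    """Average time saved per request compared with always resolving locally.

    With the cache process: every request pays ipc_cost_ms, and a miss
    additionally pays resolve_cost_ms. Without it: always resolve_cost_ms.
    """
    with_cache = ipc_cost_ms + (1.0 - hit_rate) * resolve_cost_ms
    return resolve_cost_ms - with_cache


def break_even_hit_rate(resolve_cost_ms, ipc_cost_ms):
    """Hit rate at which the cache neither helps nor hurts."""
    return ipc_cost_ms / resolve_cost_ms


# Made-up costs: 8 ms to resolve, 2 ms of IPC per request.
print(break_even_hit_rate(8.0, 2.0))
print(net_saving_per_request(0.75, 8.0, 2.0))
print(net_saving_per_request(0.125, 8.0, 2.0))
```

Under this model the cache pays off only when the hit rate exceeds the ratio of IPC cost to resolution cost, which is what the performance evaluation would need to establish with real numbers.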

### Stakeholders

1ES team, Tomas Bartonek, Rainer Sigwald

1ES team will provide the initial cache implementation. We will review
their PRs and do the performance evaluations. Handover will be
successful if nothing breaks and we meet our performance requirements
(no regression or, better still, an improvement).

### Risks

Some time ago Roman Konecny estimated that RAR caching would not be worth it
performance-wise. The 1ES team claims to have created an implementation that
will either improve performance or leave it unchanged. We need to validate
this claim and push back if we find a performance regression.
Thorough testing will be needed, especially to ensure that performance
is not impacted.

The risk is having to figure out a different way to help the 1ES team
isolate their file I/O if the caching hurts performance. This could
result in a larger project requiring more involvement on our side.

### Cost

One week for reviewing the provided PR. An additional two weeks for performance
testing, conditional on the Perfstar infrastructure being functional.
Some communication overhead.

### Plan

The 1ES team creates the PR with the RAR cache implementation.

We review the PR with a special emphasis on the performance side of
things.
Then we merge the changes. There is no expected follow up beyond the
usual maintenance for our codebase.