onepagers for MSBuildServer and RAR caching (#11005)

* onepagers for MSBuildServer and RAR caching
* onepagers update based on reviews.
* Update RAR_caching_onepager.md
* Update MSBuild_Server_onepager.md
* Update RAR_caching_onepager.md

Showing 2 changed files with 139 additions and 0 deletions.

## MSBuild Server

MSBuild server aims to create a persistent entry node for the MSBuild process
that we would communicate with via a thin client. We want to move from
the current state of “spawn a complete process for every CLI invocation”
to “a server process runs in the background and we only spawn a small
CLI handler that tells the server what to build”.
This project is based on an already existing project: [MSBuild Server](https://github.com/dotnet/msbuild/blob/main/documentation/MSBuild-Server.md).
We need to re-enable it and figure out the way forward.

### Goals and Motivation

Currently all MSBuild processes are persistent, except for the entry
point process, which lives only for the duration of the build. Restarting
this process for every build adds overhead from
startup costs such as JIT compilation. It also causes a loss of continuity, mainly
because nothing can be cached between builds.

The primary aim of the MSBuild server is to reduce this startup
overhead.
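
As a rough illustration of why a persistent entry node helps, the Python sketch below is a back-of-the-envelope cost model, not a measurement. All parameters (`startup_ms`, `build_ms`, `client_ms`) and the numbers in the example are hypothetical: without a server, every CLI invocation pays the full process startup (such as JIT); with a server, that startup is paid once and each invocation only pays for spawning the thin client and talking to the server.

```python
def total_time_without_server(invocations: int, startup_ms: float, build_ms: float) -> float:
    """Every CLI invocation spawns a full entry-node process and pays startup again."""
    return invocations * (startup_ms + build_ms)


def total_time_with_server(invocations: int, startup_ms: float, build_ms: float,
                           client_ms: float) -> float:
    """The server pays startup once; each invocation only spawns a thin client
    that forwards the build request (client_ms covers client startup plus IPC)."""
    return startup_ms + invocations * (client_ms + build_ms)


def saved_ms(invocations: int, startup_ms: float, build_ms: float, client_ms: float) -> float:
    """Time saved by the server model compared with spawning a full process each time."""
    return (total_time_without_server(invocations, startup_ms, build_ms)
            - total_time_with_server(invocations, startup_ms, build_ms, client_ms))


if __name__ == "__main__":
    # Hypothetical inner-loop scenario: 20 incremental builds.
    print(saved_ms(20, 500.0, 2000.0, 50.0))  # 8500.0
```

Under this model the saving is `(invocations - 1) * startup_ms - invocations * client_ms`, so the server only pays off when the thin client is much cheaper to start than a full entry node, and a single one-off build can even get slightly slower.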

The secondary aim of this project is to enable us to introduce more
advanced caching and potentially other performance optimizations
further down the line. However, these are not in the current scope.

### Impact

A small performance improvement in the short term, and enabling further
optimizations in the long term. These improvements target the Dev Kit
and inner-loop CLI scenarios.

This also gets us closer to the possibility of decoupling from Visual Studio. VS currently
acts as an MSBuild server in some ways: it is a persistent process that invokes
portions of MSBuild. Ideally we want to move to a relationship where "VS calls our server
instead", so that our behavior is consistent for both VS and CLI-based
builds.

### Stakeholders

MSBuild team. Successful handover means turning on the
feature, dogfooding it long enough to have reasonable
certainty that nothing breaks, and then rolling it out.
We should cooperate with closely related repositories such as SDK and Roslyn
to get them to opt in before we roll the feature out.

### Risks

The project was already attempted once; however, it was postponed because
it surfaced a group of bugs that were not previously visible because the
processes were not persistent. One such example is NuGet authentication caching,
which was a non-issue for a non-persistent process but became a blocker
because the cache could not be refreshed in-process.
Most of those bugs should be solved by now; however, we may run into some
new ones. Unfortunately, the nature of these bugs means that they won't become
apparent until we start dogfooding.

### Cost

Note that these are mostly rough guesses based on my limited knowledge.

A week to figure out how to turn on the MSBuild server in a way that
enables us to dogfood it properly, **plus** some overhead for the
review loop.

A month of developer time for bug fixes, assuming that nothing goes
terribly wrong.

Some PM time to communicate with the appropriate teams and ask them for help
with dogfooding.

### Plan

- In the first month, we should aim to get the MSBuild server dogfooded for
  the inner development loop of our MSBuild repository (coding + review + setup).

- In the second month, we will monitor it and fix anything that crops up.

- After that, we start dogfooding internally in our neighbor repositories (SDK, Roslyn)
  for as long as we feel necessary to ensure everything works as intended. I would
  give this period one to three months of monitoring plus bug fixing as needed.

## RAR caching

RAR (ResolveAssemblyReference, i.e. resolving assembly references) is the step in
every build where we need to gather the graph of assembly references and pass
them to the compiler. RAR caching is an optimization of this step, which is highly
cacheable because the references don’t change all that often. Currently we have
some limited caching in place; however, the way work is assigned to nodes results
in frequent cache misses.
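
To illustrate why per-node caching misses often, the sketch below simulates a hypothetical build in which projects are handed out to worker nodes round-robin and each node keeps its own in-memory set of already-resolved references, compared with one cache shared by all nodes. The project names, reference sets and round-robin scheduler are made-up assumptions; real MSBuild scheduling and the real RAR cache are more involved.

```python
from itertools import cycle


def count_misses(projects, node_count, shared):
    """Count cache misses when resolving each project's references.

    projects: list of (project_name, [reference names]).
    shared=True  -> one cache for all nodes (a central cache process).
    shared=False -> one private cache per node.
    """
    caches = [set()] if shared else [set() for _ in range(node_count)]
    nodes = cycle(range(node_count))  # hypothetical round-robin work assignment
    misses = 0
    for _project, references in projects:
        node = next(nodes)
        cache = caches[0] if shared else caches[node]
        for ref in references:
            if ref not in cache:
                misses += 1  # this reference would have to be resolved from disk
                cache.add(ref)
    return misses


if __name__ == "__main__":
    projects = [(f"Project{i}", ["System.Runtime", "Newtonsoft.Json", f"Lib{i % 2}"])
                for i in range(8)]
    print(count_misses(projects, node_count=4, shared=False))  # 12
    print(count_misses(projects, node_count=4, shared=True))   # 4
```

The shared cache resolves each distinct reference once, whereas private per-node caches repeat the work on every node that happens to receive a project with that reference.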

### Goals and motivations

The 1ES team wants to isolate their file I/O related to RAR caching, which is causing
issues for their debugging efforts. This is mostly because MSBuild is pulling
files from all nodes at once, which results in a tangled mess of I/O that is hard to debug.

Our motivation is a possible performance gain; however, we’re fine with
the change as long as the impact is not negative.

### Impact

The only impact we’re concerned about is performance. There will be
a tension between the gains from caching and the costs of IPC with
the process that acts as the cache repository. We need to ensure
that this balance is a net positive performance-wise.
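
The tension can be written as a per-lookup break-even check. This is a simplified model of our own, not a description of the 1ES implementation: it assumes every reference lookup goes to the cache process and pays a fixed IPC cost `ipc_ms`, while a miss additionally pays the full resolution cost `resolve_ms`; `miss_rate_before` and `miss_rate_after` are hypothetical miss rates for today's caching and for the shared cache.

```python
def per_lookup_cost_ms(miss_rate: float, resolve_ms: float, ipc_ms: float = 0.0) -> float:
    """Expected cost of one reference lookup: fixed IPC cost plus resolution on a miss."""
    return ipc_ms + miss_rate * resolve_ms


def net_gain_ms(miss_rate_before: float, miss_rate_after: float,
                resolve_ms: float, ipc_ms: float) -> float:
    """Positive result means the cache process is a net win per lookup."""
    before = per_lookup_cost_ms(miss_rate_before, resolve_ms)        # in-process, no IPC
    after = per_lookup_cost_ms(miss_rate_after, resolve_ms, ipc_ms)  # out-of-process cache
    return before - after


def break_even_ipc_ms(miss_rate_before: float, miss_rate_after: float,
                      resolve_ms: float) -> float:
    """Largest IPC cost per lookup for which the cache process does not regress."""
    return (miss_rate_before - miss_rate_after) * resolve_ms


if __name__ == "__main__":
    # Hypothetical numbers chosen only to show the arithmetic.
    print(net_gain_ms(0.5, 0.25, 2.0, 0.25))  # 0.25
    print(break_even_ipc_ms(0.5, 0.25, 2.0))  # 0.5
```

In this model the cache process pays off only while the IPC cost per lookup stays below the resolution time saved by the extra hits; the real miss rates and IPC cost are what the performance evaluation has to measure.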

### Stakeholders

1ES team, Tomas Bartonek, Rainer Sigwald

The 1ES team will provide the initial cache implementation. We will review
their PRs and do the performance evaluations. Handover will be
successful if nothing breaks and we meet our performance requirements
(no regression, or better still, an improvement).

### Risks

Some time ago, Roman Konecny estimated that RAR caching would not be worth it
performance-wise. The 1ES team claims to have created an implementation that
will either improve performance or leave it unchanged. We need to validate
this claim and push back if we find a performance regression.
Thorough testing will be needed, especially to ensure that performance
is not negatively impacted.

The risk is having to figure out a different way to help the 1ES team
isolate their file I/O if the caching hurts performance. This could
result in a larger project requiring more involvement on our side.

### Cost

A week for reviewing the provided PR. An additional two weeks for performance
testing, conditional on the Perfstar infrastructure being functional.
Some communication overhead.

### Plan

The 1ES team creates the PR with the RAR cache implementation.

We review the PR with special emphasis on the performance side of
things.
Then we merge the changes. There is no expected follow-up beyond the
usual maintenance of our codebase.