-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal: SyntaxNode.ToFullStringWithNormalizedWhitespaces API #52914
Comments
I'd be interested to see if we could just fix that. Then everyone benefits. If not, then maybe specialized apis are needed. |
I see this has been added to the 17.1 milestone but the API still needs some work. If Roslyn proposals work the same as runtime ones I assume that means we should just discuss that here and settle on an API shape that can then go to LDM review? @CyrusNajmabadi is the blocker for you just the fact that you'd like to first investigate whether Just trying to get the conversation going here as I would love for this to make it into 17.1, is all 😄 |
Yes. Though I'm not volunteering for this :-) |
I haven't seen much evidence that this is true once you combine it with:
In other words, while NormalizeWhitespace has been measurably significant at times, I've found that a source generator deemed observably slow by users achieved that characteristic through other means. Even in the case of CsWin32 (the source generator with the most pronounced impact from NormalizeWhitespace), it would not have been fixed by this proposal and it has already mitigated the problem through other means. |
Closing this issue as we've seen no reply to the request for more information. If you are able to get the requested information, please add it to the issue and we will retriage it. |
Uh... This feels like it wasn't resolved yet, should we reopen it? 😅 |
@CyrusNajmabadi can you clarify what more info you were asking for? |
It is absolutely not in the table. C# is definitely whitespace sensitive, and doing this could def break code. We should not automatically do anything. Of you want normalized code, and that's ok for your domain, then just normalize it first. |
I'm referencing #49685 (comment) @AArnott Would you mind weighing in on this? 🙂 |
I have no problem with people making normalize whitespace fast. Seems like it people need better performance there then we should just get some prs out to improve it, versus creating whole new apis. As I've mentioned in the past, normalize was never something anyone spent time optimizing. So it's likely it can just be sped up if someone was interested in looking into that. |
I disagree with @sharwell that CsWin32's whitespace problem or perf problems are solved. We avoid NormalizeWhitespace completely now because it is unacceptably slow. Instead, we have our own The lengths we go to in order to have acceptable perf is far in excess of what I would consider reasonable, and the resulting generated text is sub-par at best. I don't care if it's a new API or fixing an old one, but we should not have to go to lengths to get good perf. To Sam's comment about incremental SGs, csWin32 doesn't yet use any incremental APIs. CsWin32 is fundamentally a non-incremental thing for the most part because one character typed in any code file could potentially change generated code. I expect we can find specific syntax sub-trees that can be cached and reused, but it will be very expensive and we haven't prioritized that yet, since our UX is ok for now. |
@AArnott i agree with thsi:
Currently, teh generation part of SG has zero funding. I've proposed options in the past about providing improvements and APIs here, but we haven't been able to fit them into the schedule at all because of all the other pressing matters. Generally speaking, i think this is because SGs are considered a highly advanced scenario, for niche areas. So the feeling is that the people using it are advanced enough to roll solutions, without us having to provide that ourselves. I'll push again for us to get some budget to improve things here. But i'm not certain anything will have changed in this area. |
Would love to see a solution to this. 40% of our build time is spent on calls to NormalizeWhitespace (full disclosure - we are generating thousands of classes at build time, using incremental source generators). Briefly playing with removing calls to NormalizeWhitespace and using trivia manually, then extrapolating the gains, we could see a reduction in build time from 3 minutes (on an MacBook M1 Max) to just over 1.5 minutes. Reduced memory usage was also observed (1GB vs 2GB). Unfortunately, manually generating trivia in EVERY syntax tree would be a huge pain and I'm not prepared to introduce all of that extra code to the generation process. |
@jeff-simeon can you jsut emit the strings directly? no point in SGs in using nodes, as we're going to have to just parse things anyways. |
I suppose we could - though it would make the code a lot harder to read, and it's relatively complex already. |
This was also briefly mentioned on Discord as well — in many scenarios, using syntax nodes makes the code for generators much easier to understand, follow and maintain, especially when the generators get more complicated. This applies to several generators in the .NET SDK as well (cc. @jkoritzinsky). Imho it'd be nice to have a more optimized version of this, on second thought a |
@Sergio0694 - I agree and this is exactly the scenario we have - which is probably why @AArnott wrote a FastSyntaxFactory which wraps a StringBuilder and avoids the nodes. |
That seems reasonable. Can you just use that factory instead? |
@CyrusNajmabadi - I'm not sure what you're talking about. I'm referring to the previous post by @AArnott, where he mentions something that his team built. I do not have access to this and searching for it on the internet yields no results. I'm merely speculating as to what he did to work around the problem. |
I don't think ease of readability is a core goal of SGs. Basically, every time this has come up on the team (and i've brought it up a few times), the thinking is that this is all for computers to do, and writing fast string-writing code is acceptable. If someone wants to make their own api to help here, they can, but it's not an investment area. I proposed options in the past to make things better, but it's all be rejected as there is no capacity on the team to take on new APIs to help out effectively expert scenarios. |
It's on github: You can fork/copy this. Note: this doesn't write to strings, it just avoids elastic trivia, which can be expensive if not needed. |
@Sergio0694 Nothing is stopping teams (like .net) that really want this from writing such a thing. Trees are just a simple data structure. And if it's hyper critical that trees be the form you use and you don't want to use trivia and you want to write it out to a string quickly, then writing your own abstraction to do that seems totally viable. The cost assessment in the past was that this wasn't an area we were willing to invest in. It's hyperspecialized just to this narrow domain, and writing out strings is just something individual libraries can do themselves, or utilize a library that someone wants to invest in. |
My FastSyntaxFactory doesn't do that at all. It merely avoids elastic trivia (IIRC) by explicitly specifying all tokens when forwarding all calls to the regular SyntaxFactory. |
Ok, thanks for the reply, but it's a bummer :( We will have to build our own syntax node model and wrap a StringBuilder at some point. |
I would avoid a syntax node model altogether. It's an unnecessary intermediary abstraction (and often expensive in memory to boot). |
Using a StringBuilder directly would be a maintainability problem. We'll have to build our own model that wraps a StringBuilder....I suppose it will be just like a SyntaxNode model.......but have a fast ToString method on it. At the end of the day, all abstractions are unnecessary and come at some performance cost. The complexity of what we're doing requires some abstraction from direct string manipulation. That will come at some performance cost. But preferably, not a cost that occupies 40% of our build time. |
I would be very surprised by that. Writing out code is usually very simple. Here's an example of a helper type i whipped up (similar to lots of stuff we've done before) to make it super easy: https://gist.github.com/CyrusNajmabadi/edf9f6059d8fd2ced7a2086a8835be1d Just write to this, and tell it when you're increasing or decreasing indentation (and def write helpers/extensions to do it for common cases, like blocks). And it takes care of the rest.
Sure. But it basically does nothing but pass through the data, and just handle indentation so that everything is not left-flushed. Almost no work/overhead necessary at runtime, and very simple to use on the consumer side.
Sure. But syntax-nodes are a huge abstraction cost. Like huge. IIRC, they're around 5 times the size of the strings they represent. And they require substantial manipulation to get them formatted. They're really good for dealing with user code that needs to be understood/manipulated. They're really not necessary for a space where the end goal is generating code. |
Changed the benchmark in this issue to use a custom indented string writer, here's the benchmark:
I think @CyrusNajmabadi has successfully converted me over to the "just write to an indented writer" gang 😄 Benchmark code (click to expand):using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using ComputeSharp.SourceGeneration.Helpers;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using static Microsoft.CodeAnalysis.CSharp.SyntaxFactory;
BenchmarkRunner.Run<NormalizeWhitespaceBenchmark>();
[MemoryDiagnoser]
public class NormalizeWhitespaceBenchmark
{
private readonly string typeName = typeof(NormalizeWhitespaceBenchmark).FullName!;
private readonly string typeVersion = typeof(NormalizeWhitespaceBenchmark).Assembly.GetName().Version!.ToString();
private readonly string recipientType = "global::MyViewModel";
private readonly string messageTypeA = "global::MessageA";
private readonly string messageTypeB = "global::MessageB";
private readonly string messageTypeC = "global::MessageC";
[Benchmark]
public string IndentedWriter()
{
using IndentedTextWriter writer = IndentedTextWriter.Rent();
writer.WriteLine("// Licensed to the .NET Foundation under one or more agreements.");
writer.WriteLine("// The .NET Foundation licenses this file to you under the MIT license.");
writer.WriteLine("// See the LICENSE file in the project root for more information.");
writer.WriteLine("#pragma warning disable");
writer.WriteLine("namespace Microsoft.Toolkit.Mvvm.Messaging.__Internals");
using (writer.WriteBlock())
{
writer.WriteLine($"""[global::System.CodeDom.Compiler.GeneratedCode("{this.typeName}", "{this.typeVersion}")]""");
writer.WriteLine($"""[global::System.Diagnostics.DebuggerNonUserCode]""");
writer.WriteLine($"""[global::System.Diagnostics.CodeAnalysis.ExcludeFromCodeCoverage]""");
writer.WriteLine($"""[global::System.ComponentModel.EditorBrowsable(global::System.ComponentModel.EditorBrowsableState.Never)]""");
writer.WriteLine($"""[global::System.Obsolete("This type is not intended to be used directly by user code")]""");
writer.WriteLine($"""internal static partial class __IMessengerExtensions""");
using (writer.WriteBlock())
{
writer.WriteLine($"""[global::System.ComponentModel.EditorBrowsable(global::System.ComponentModel.EditorBrowsableState.Never)]""");
writer.WriteLine($"""[global::System.Obsolete("This type is not intended to be used directly by user code")]""");
writer.WriteLine($"""public static global::System.Action<IMessenger, object> CreateAllMessagesRegistrator({this.recipientType} _)""");
using (writer.WriteBlock())
{
writer.WriteLine($"""static void RegisterAll(IMessenger messenger, object obj)""");
using (writer.WriteBlock())
{
writer.WriteLine($"""var recipient = ({this.recipientType})obj;""");
writer.WriteLine($"""messenger.Register<{this.messageTypeA}>(recipient);""");
writer.WriteLine($"""messenger.Register<{this.messageTypeB}>(recipient);""");
writer.WriteLine($"""messenger.Register<{this.messageTypeC}>(recipient);""");
}
writer.WriteLine();
writer.WriteLine("return RegisterAll;");
}
writer.WriteLine();
writer.WriteLine($"""[global::System.ComponentModel.EditorBrowsable(global::System.ComponentModel.EditorBrowsableState.Never)]""");
writer.WriteLine($"""[global::System.Obsolete("This type is not intended to be used directly by user code")]""");
writer.WriteLine($"""public static global::System.Action<IMessenger, object, TToken> CreateAllMessagesRegistratorWithToken<TToken>({this.recipientType} _)""");
writer.WriteLine($""" where TToken : global::System.IEquatable<TToken>""");
using (writer.WriteBlock())
{
writer.WriteLine("static void RegisterAll(IMessenger messenger, object obj, TToken token)");
using (writer.WriteBlock())
{
writer.WriteLine($"""var recipient = ({this.recipientType})obj;""");
writer.WriteLine($"""messenger.Register<{this.messageTypeA}, TToken>(recipient, token);""");
writer.WriteLine($"""messenger.Register<{this.messageTypeB}, TToken>(recipient, token);""");
writer.WriteLine($"""messenger.Register<{this.messageTypeC}, TToken>(recipient, token);""");
}
writer.WriteLine();
writer.WriteLine("return RegisterAll;");
}
}
}
return writer.ToString();
}
[Benchmark(Baseline = true)]
public string SyntaxTrees()
{
return GetDummyTree().NormalizeWhitespace(eol: "\n").ToFullString();
}
private CompilationUnitSyntax GetDummyTree()
{
// // Licensed to the .NET Foundation under one or more agreements.
// // The .NET Foundation licenses this file to you under the MIT license.
// // See the LICENSE file in the project root for more information.
//
// #pragma warning disable
//
// namespace Microsoft.Toolkit.Mvvm.Messaging.__Internals
// {
// [global::System.CodeDom.Compiler.GeneratedCode("...", "...")]
// [global::System.Diagnostics.DebuggerNonUserCode]
// [global::System.Diagnostics.CodeAnalysis.ExcludeFromCodeCoverage]
// [global::System.ComponentModel.EditorBrowsable(global::System.ComponentModel.EditorBrowsableState.Never)]
// [global::System.Obsolete("This type is not intended to be used directly by user code")]
// internal static partial class __IMessengerExtensions
// {
// [global::System.ComponentModel.EditorBrowsable(global::System.ComponentModel.EditorBrowsableState.Never)]
// [global::System.Obsolete("This method is not intended to be called directly by user code")]
// public static global::System.Action<IMessenger, object> CreateAllMessagesRegistrator(<RECIPIENT_TYPE> _)
// {
// static void RegisterAll(IMessenger messenger, object obj)
// {
// var recipient = (<RECIPIENT_TYPE>)obj;
// messenger.Register<<MESSAGE_TYPE>>(recipient);
// messenger.Register<<MESSAGE_TYPE>>(recipient);
// messenger.Register<<MESSAGE_TYPE>>(recipient);
// }
//
// return RegisterAll;
// }
//
// [global::System.ComponentModel.EditorBrowsable(global::System.ComponentModel.EditorBrowsableState.Never)]
// [global::System.Obsolete("This method is not intended to be called directly by user code")]
// public static global::System.Action<IMessenger, object, TToken> CreateAllMessagesRegistratorWithToken<TToken>(<RECIPIENT_TYPE> _)
// where TToken : global::System.IEquatable<TToken>
// {
// static void RegisterAll(IMessenger messenger, object obj, TToken token)
// {
// var recipient = (<RECIPIENT_TYPE>)obj;
// messenger.Register<<MESSAGE_TYPE>, TToken>(recipient, token);
// messenger.Register<<MESSAGE_TYPE>, TToken>(recipient, token);
// messenger.Register<<MESSAGE_TYPE>, TToken>(recipient, token);
// }
//
// return RegisterAll;
// }
// }
// }
return
CompilationUnit().AddMembers(
NamespaceDeclaration(IdentifierName("Microsoft.Toolkit.Mvvm.Messaging.__Internals")).WithLeadingTrivia(TriviaList(
Comment("// Licensed to the .NET Foundation under one or more agreements."),
Comment("// The .NET Foundation licenses this file to you under the MIT license."),
Comment("// See the LICENSE file in the project root for more information."),
Trivia(PragmaWarningDirectiveTrivia(Token(SyntaxKind.DisableKeyword), true)))).AddMembers(
ClassDeclaration("__IMessengerExtensions").AddModifiers(
Token(SyntaxKind.InternalKeyword),
Token(SyntaxKind.StaticKeyword),
Token(SyntaxKind.PartialKeyword)).AddAttributeLists(
AttributeList(SingletonSeparatedList(
Attribute(IdentifierName($"global::System.CodeDom.Compiler.GeneratedCode"))
.AddArgumentListArguments(
AttributeArgument(LiteralExpression(SyntaxKind.StringLiteralExpression, Literal(this.typeName))),
AttributeArgument(LiteralExpression(SyntaxKind.StringLiteralExpression, Literal(this.typeVersion)))))),
AttributeList(SingletonSeparatedList(Attribute(IdentifierName("global::System.Diagnostics.DebuggerNonUserCode")))),
AttributeList(SingletonSeparatedList(Attribute(IdentifierName("global::System.Diagnostics.CodeAnalysis.ExcludeFromCodeCoverage")))),
AttributeList(SingletonSeparatedList(
Attribute(IdentifierName("global::System.ComponentModel.EditorBrowsable")).AddArgumentListArguments(
AttributeArgument(ParseExpression("global::System.ComponentModel.EditorBrowsableState.Never"))))),
AttributeList(SingletonSeparatedList(
Attribute(IdentifierName("global::System.Obsolete")).AddArgumentListArguments(
AttributeArgument(LiteralExpression(
SyntaxKind.StringLiteralExpression,
Literal("This type is not intended to be used directly by user code"))))))).AddMembers(
MethodDeclaration(
GenericName("global::System.Action").AddTypeArgumentListArguments(
IdentifierName("IMessenger"),
PredefinedType(Token(SyntaxKind.ObjectKeyword))),
Identifier("CreateAllMessagesRegistrator")).AddAttributeLists(
AttributeList(SingletonSeparatedList(
Attribute(IdentifierName("global::System.ComponentModel.EditorBrowsable")).AddArgumentListArguments(
AttributeArgument(ParseExpression("global::System.ComponentModel.EditorBrowsableState.Never"))))),
AttributeList(SingletonSeparatedList(
Attribute(IdentifierName("global::System.Obsolete")).AddArgumentListArguments(
AttributeArgument(LiteralExpression(
SyntaxKind.StringLiteralExpression,
Literal("This method is not intended to be called directly by user code"))))))).AddModifiers(
Token(SyntaxKind.PublicKeyword),
Token(SyntaxKind.StaticKeyword)).AddParameterListParameters(
Parameter(Identifier("_")).WithType(IdentifierName(this.recipientType)))
.WithBody(Block(
LocalFunctionStatement(
PredefinedType(Token(SyntaxKind.VoidKeyword)),
Identifier("RegisterAll"))
.AddModifiers(Token(SyntaxKind.StaticKeyword))
.AddParameterListParameters(
Parameter(Identifier("messenger")).WithType(IdentifierName("IMessenger")),
Parameter(Identifier("obj")).WithType(PredefinedType(Token(SyntaxKind.ObjectKeyword))))
.WithBody(Block(
LocalDeclarationStatement(
VariableDeclaration(IdentifierName("var"))
.AddVariables(
VariableDeclarator(Identifier("recipient"))
.WithInitializer(EqualsValueClause(
CastExpression(
IdentifierName(this.recipientType),
IdentifierName("obj")))))))
.AddStatements(
ExpressionStatement(
InvocationExpression(
MemberAccessExpression(
SyntaxKind.SimpleMemberAccessExpression,
IdentifierName("messenger"),
GenericName(Identifier("Register")).AddTypeArgumentListArguments(
IdentifierName(this.messageTypeA))))
.AddArgumentListArguments(Argument(IdentifierName("recipient")))),
ExpressionStatement(
InvocationExpression(
MemberAccessExpression(
SyntaxKind.SimpleMemberAccessExpression,
IdentifierName("messenger"),
GenericName(Identifier("Register")).AddTypeArgumentListArguments(
IdentifierName(this.messageTypeB))))
.AddArgumentListArguments(Argument(IdentifierName("recipient")))),
ExpressionStatement(
InvocationExpression(
MemberAccessExpression(
SyntaxKind.SimpleMemberAccessExpression,
IdentifierName("messenger"),
GenericName(Identifier("Register")).AddTypeArgumentListArguments(
IdentifierName(this.messageTypeC))))
.AddArgumentListArguments(Argument(IdentifierName("recipient")))))),
ReturnStatement(IdentifierName("RegisterAll")))),
MethodDeclaration(
GenericName("global::System.Action").AddTypeArgumentListArguments(
IdentifierName("IMessenger"),
PredefinedType(Token(SyntaxKind.ObjectKeyword)),
IdentifierName("TToken")),
Identifier("CreateAllMessagesRegistratorWithToken")).AddAttributeLists(
AttributeList(SingletonSeparatedList(
Attribute(IdentifierName("global::System.ComponentModel.EditorBrowsable")).AddArgumentListArguments(
AttributeArgument(ParseExpression("global::System.ComponentModel.EditorBrowsableState.Never"))))),
AttributeList(SingletonSeparatedList(
Attribute(IdentifierName("global::System.Obsolete")).AddArgumentListArguments(
AttributeArgument(LiteralExpression(
SyntaxKind.StringLiteralExpression,
Literal("This method is not intended to be called directly by user code"))))))).AddModifiers(
Token(SyntaxKind.PublicKeyword),
Token(SyntaxKind.StaticKeyword)).AddParameterListParameters(
Parameter(Identifier("_")).WithType(IdentifierName(this.recipientType)))
.AddTypeParameterListParameters(TypeParameter("TToken"))
.AddConstraintClauses(
TypeParameterConstraintClause("TToken")
.AddConstraints(TypeConstraint(GenericName("global::System.IEquatable").AddTypeArgumentListArguments(IdentifierName("TToken")))))
.WithBody(Block(
LocalFunctionStatement(
PredefinedType(Token(SyntaxKind.VoidKeyword)),
Identifier("RegisterAll"))
.AddModifiers(Token(SyntaxKind.StaticKeyword))
.AddParameterListParameters(
Parameter(Identifier("messenger")).WithType(IdentifierName("IMessenger")),
Parameter(Identifier("obj")).WithType(PredefinedType(Token(SyntaxKind.ObjectKeyword))),
Parameter(Identifier("token")).WithType(IdentifierName("TToken")))
.WithBody(Block(
LocalDeclarationStatement(
VariableDeclaration(IdentifierName("var"))
.AddVariables(
VariableDeclarator(Identifier("recipient"))
.WithInitializer(EqualsValueClause(
CastExpression(
IdentifierName(this.recipientType),
IdentifierName("obj")))))))
.AddStatements(
ExpressionStatement(
InvocationExpression(
MemberAccessExpression(
SyntaxKind.SimpleMemberAccessExpression,
IdentifierName("messenger"),
GenericName(Identifier("Register")).AddTypeArgumentListArguments(
IdentifierName(this.messageTypeA),
IdentifierName("TToken"))))
.AddArgumentListArguments(Argument(IdentifierName("recipient")), Argument(IdentifierName("token")))),
ExpressionStatement(
InvocationExpression(
MemberAccessExpression(
SyntaxKind.SimpleMemberAccessExpression,
IdentifierName("messenger"),
GenericName(Identifier("Register")).AddTypeArgumentListArguments(
IdentifierName(this.messageTypeB),
IdentifierName("TToken"))))
.AddArgumentListArguments(Argument(IdentifierName("recipient")), Argument(IdentifierName("token")))),
ExpressionStatement(
InvocationExpression(
MemberAccessExpression(
SyntaxKind.SimpleMemberAccessExpression,
IdentifierName("messenger"),
GenericName(Identifier("Register")).AddTypeArgumentListArguments(
IdentifierName(this.messageTypeC),
IdentifierName("TToken"))))
.AddArgumentListArguments(Argument(IdentifierName("recipient")), Argument(IdentifierName("token")))))),
ReturnStatement(IdentifierName("RegisterAll")))))));
}
} |
@Sergio0694 that matches my expectations. Using syntax trees at all is just hugely unnecessary overhead. They're gargantuan in memory, and involve an enormous amount of pointer chasing. They're also just not a great intermediary construct for the domain of just wanting to write out a linearized string. They are great if you want to actually operate on them in a tree-structured fashion. But I don't think I've ever seen a generator that actually does that. Rather, all generators just write the content start to finish. Or they make subsections, but then concatenate then linearly. All these cases are much better served with just writing a string. Personally, I think writing the string is much easier and clearer too, as long as you have the right writer-abstraction (like the one I linked to). |
That is indeed compelling data. FWIW, CsWin32 does emit node trees, and we modify them along the way. It would be a total rewrite (of a huge codebase) to switch to StringBuilder both because all our SyntaxFactory callers would have to change and because we'd essentially have to decide up front everything we need to do before starting to emit code because we wouldn't be able to emit nodes and modify them later as we do now. |
You can def do this. Note: as i've mentioned in discord to several people, a very reasonable thing to do is create the indented string chucks for those other nodes up front (and then cache them). Later, when building the full text, it is trivial to add an existing indented node to the final indented writer you are using. -- I cannot emphasize enough how much of a better approach to SG writing this is. Syntax and trees are a bad abstraction here. They are enormously expensive, unweildy, and designed to be good for very different scenarios. Personally, i've never once found an SG case that benefits from them. Contract that with string writing. It's blazing fast. Extremely simple. Very easy to compose. And trivial to get the formatting you want within a line, and the indentation we expect for normal C# or VB. As we can see from the numbers, you should expect roughly two orders of magnitude improvement for CPU and memory, and all using an absolutely trivial type that is super easy to use. SGs should do this. |
Overview
A common pattern in source generators (from what I can tell) for people using Roslyn APIs to generate trees is as follows:
NormalizeWhitespace()
ToFullString()
context.AddSource("Foo.cs", SourceText.From(source, Encoding.UTF8));
with the resultingstring
This is extremely inefficient, specifically at points (2) and (3). The issue is that
NormalizeWhitespace
is quite expensive, and in this case it's also completely unnecessary: we don't want a new tree per se, we just need the finalstring
to have the code from the initial tree with normalized whitespaces. The problem is that thatNormalizeWhitespace
ends up taking the majority of the CPU time in this whole workflow, and also allocating unnecessary memory. The allocations aren't necessarily a problem, but the increased CPU time very much is, considering the performance critical nature of source generators to keep a smooth IDE experience.I've put together a benchmark to show how much
NormalizeWhitespace
can impact performance:Full benchmark source code (click to expand):
You can see how calling
NormalizeWhitespace
alone increases the total CPU time by 5x in this sample benchmark.Proposal
The simplest API proposal to address this issue would be to add a
ToFullStringWithNormalizedWhitespaces
API:namespace Microsoft.CodeAnalysis { public abstract class SyntaxNode { + public virtual string ToFullStringWithNormalizedWhitespaces(); } }
This would produce a
string
equivalent to callingNormalizeWhitespaces().ToFullString()
, but in an optimized manner, ie. without actually duplicating the whole tree and creating new nodes, but just adding the right whitespaces to the output as needed while traversing the tree. This would require very little code changes by consumers currently doingNormalizeWhitespaces().ToFullString()
, but would greatly improve the performance of their source generators. I could also see some specific APIs to directly create aSourceText
instance from a tree, with normalized whitespaces, which could then also skip creating thestring
entirely, but considering that CPU time is the biggest problem here and not the actual memory allocations, that seems to be less of a priority from what I can see. Ideally, this newToFullStringWithNormalizedWhitespaces()
API would have a performance as close to justToFullString()
as possible.The text was updated successfully, but these errors were encountered: