Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: SyntaxNode.ToFullStringWithNormalizedWhitespaces API #52914

Open
Sergio0694 opened this issue Apr 26, 2021 · 33 comments
Open

Proposal: SyntaxNode.ToFullStringWithNormalizedWhitespaces API #52914

Sergio0694 opened this issue Apr 26, 2021 · 33 comments
Assignees
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation Area-Compilers Bug Concept-API This issue involves adding, removing, clarification, or modification of an API.
Milestone

Comments

@Sergio0694
Copy link
Contributor

Overview

A common pattern in source generators (from what I can tell) for people using Roslyn APIs to generate trees is as follows:

  1. Build a compilation unit with some members (or, some syntax tree in general)
  2. Call NormalizeWhitespace()
  3. Call ToFullString()
  4. Call context.AddSource("Foo.cs", SourceText.From(source, Encoding.UTF8)); with the resulting string

This is extremely inefficient, specifically at points (2) and (3). The issue is that NormalizeWhitespace is quite expensive, and in this case it's also completely unnecessary: we don't want a new tree per se, we just need the final string to have the code from the initial tree with normalized whitespaces. The problem is that that NormalizeWhitespace ends up taking the majority of the CPU time in this whole workflow, and also allocating unnecessary memory. The allocations aren't necessarily a problem, but the increased CPU time very much is, considering the performance critical nature of source generators to keep a smooth IDE experience.

I've put together a benchmark to show how much NormalizeWhitespace can impact performance:

Method Mean Error StdDev Ratio Gen 0 Gen 1 Gen 2 Allocated
CreateNormalizeToString 916.2 us 9.12 us 8.53 us 1.00 32.2266 8.7891 1.9531 184.57 KB
CreateToString 178.4 us 0.41 us 0.34 us 0.19 19.2871 4.6387 - 79.21 KB
Full benchmark source code (click to expand):
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using static Microsoft.CodeAnalysis.CSharp.SyntaxFactory;

BenchmarkRunner.Run<NormalizeWhitespaceBenchmark>();

[MemoryDiagnoser]
public class NormalizeWhitespaceBenchmark
{
    [Benchmark(Baseline = true)]
    public string CreateNormalizeToString()
    {
        return GetDummyTree().NormalizeWhitespace().ToFullString();
    }

    [Benchmark]
    public string CreateToString()
    {
        return GetDummyTree().ToFullString();
    }

    private static CompilationUnitSyntax GetDummyTree()
    {
        // // Licensed to the .NET Foundation under one or more agreements.
        // // The .NET Foundation licenses this file to you under the MIT license.
        // // See the LICENSE file in the project root for more information.
        //
        // #pragma warning disable
        //
        // namespace Microsoft.Toolkit.Mvvm.Messaging.__Internals
        // {
        //     [global::System.CodeDom.Compiler.GeneratedCode("...", "...")]
        //     [global::System.Diagnostics.DebuggerNonUserCode]
        //     [global::System.Diagnostics.CodeAnalysis.ExcludeFromCodeCoverage]
        //     [global::System.ComponentModel.EditorBrowsable(global::System.ComponentModel.EditorBrowsableState.Never)]
        //     [global::System.Obsolete("This type is not intended to be used directly by user code")]
        //     internal static partial class __IMessengerExtensions
        //     {
        //         [global::System.ComponentModel.EditorBrowsable(global::System.ComponentModel.EditorBrowsableState.Never)]
        //         [global::System.Obsolete("This method is not intended to be called directly by user code")]
        //         public static global::System.Action<IMessenger, object> CreateAllMessagesRegistrator(<RECIPIENT_TYPE> _)
        //         {
        //             static void RegisterAll(IMessenger messenger, object obj)
        //             {
        //                 var recipient = (<INSTANCE_TYPE>)obj;
        //                 messenger.Register<<MESSAGE_TYPE>>(recipient);
        //                 messenger.Register<<MESSAGE_TYPE>>(recipient);
        //                 messenger.Register<<MESSAGE_TYPE>>(recipient);
        //             }
        //
        //             return RegisterAll;
        //         }
        //
        //         [global::System.ComponentModel.EditorBrowsable(global::System.ComponentModel.EditorBrowsableState.Never)]
        //         [global::System.Obsolete("This method is not intended to be called directly by user code")]
        //         public static global::System.Action<IMessenger, object, TToken> CreateAllMessagesRegistratorWithToken<TToken>(<RECIPIENT_TYPE> _)
        //             where TToken : global::System.IEquatable<TToken>
        //         {
        //             static void RegisterAll(IMessenger messenger, object obj, TToken token)
        //             {
        //                 var recipient = (<INSTANCE_TYPE>)obj;
        //                 messenger.Register<<MESSAGE_TYPE>, TToken>(recipient, token);
        //                 messenger.Register<<MESSAGE_TYPE>, TToken>(recipient, token);
        //                 messenger.Register<<MESSAGE_TYPE>, TToken>(recipient, token);
        //             }
        //
        //             return RegisterAll;
        //         }
        //     }
        // }
        return
            CompilationUnit().AddMembers(
                NamespaceDeclaration(IdentifierName("Microsoft.Toolkit.Mvvm.Messaging.__Internals")).WithLeadingTrivia(TriviaList(
                    Comment("// Licensed to the .NET Foundation under one or more agreements."),
                    Comment("// The .NET Foundation licenses this file to you under the MIT license."),
                    Comment("// See the LICENSE file in the project root for more information."),
                    Trivia(PragmaWarningDirectiveTrivia(Token(SyntaxKind.DisableKeyword), true)))).AddMembers(
                ClassDeclaration("__IMessengerExtensions").AddModifiers(
                    Token(SyntaxKind.InternalKeyword),
                    Token(SyntaxKind.StaticKeyword),
                    Token(SyntaxKind.PartialKeyword)).AddAttributeLists(
                    AttributeList(SingletonSeparatedList(
                        Attribute(IdentifierName($"global::System.CodeDom.Compiler.GeneratedCode"))
                        .AddArgumentListArguments(
                            AttributeArgument(LiteralExpression(SyntaxKind.StringLiteralExpression, Literal("Microsoft.Toolkit.Mvvm"))),
                            AttributeArgument(LiteralExpression(SyntaxKind.StringLiteralExpression, Literal("7.0.0.0")))))),
                    AttributeList(SingletonSeparatedList(Attribute(IdentifierName("global::System.Diagnostics.DebuggerNonUserCode")))),
                    AttributeList(SingletonSeparatedList(Attribute(IdentifierName("global::System.Diagnostics.CodeAnalysis.ExcludeFromCodeCoverage")))),
                    AttributeList(SingletonSeparatedList(
                        Attribute(IdentifierName("global::System.ComponentModel.EditorBrowsable")).AddArgumentListArguments(
                        AttributeArgument(ParseExpression("global::System.ComponentModel.EditorBrowsableState.Never"))))),
                    AttributeList(SingletonSeparatedList(
                        Attribute(IdentifierName("global::System.Obsolete")).AddArgumentListArguments(
                        AttributeArgument(LiteralExpression(
                            SyntaxKind.StringLiteralExpression,
                            Literal("This type is not intended to be used directly by user code"))))))).AddMembers(
                MethodDeclaration(
                    GenericName("global::System.Action").AddTypeArgumentListArguments(
                        IdentifierName("IMessenger"),
                        PredefinedType(Token(SyntaxKind.ObjectKeyword))),
                    Identifier("CreateAllMessagesRegistrator")).AddAttributeLists(
                        AttributeList(SingletonSeparatedList(
                            Attribute(IdentifierName("global::System.ComponentModel.EditorBrowsable")).AddArgumentListArguments(
                            AttributeArgument(ParseExpression("global::System.ComponentModel.EditorBrowsableState.Never"))))),
                        AttributeList(SingletonSeparatedList(
                            Attribute(IdentifierName("global::System.Obsolete")).AddArgumentListArguments(
                            AttributeArgument(LiteralExpression(
                                SyntaxKind.StringLiteralExpression,
                                Literal("This method is not intended to be called directly by user code"))))))).AddModifiers(
                    Token(SyntaxKind.PublicKeyword),
                    Token(SyntaxKind.StaticKeyword)).AddParameterListParameters(
                        Parameter(Identifier("_")).WithType(IdentifierName("global::MyApp.MyViewModel")))
                    .WithBody(Block(
                        LocalFunctionStatement(
                            PredefinedType(Token(SyntaxKind.VoidKeyword)),
                            Identifier("RegisterAll"))
                        .AddModifiers(Token(SyntaxKind.StaticKeyword))
                        .AddParameterListParameters(
                            Parameter(Identifier("messenger")).WithType(IdentifierName("IMessenger")),
                            Parameter(Identifier("obj")).WithType(PredefinedType(Token(SyntaxKind.ObjectKeyword))))
                        .WithBody(Block(
                            LocalDeclarationStatement(
                                VariableDeclaration(IdentifierName("var"))
                                .AddVariables(
                                    VariableDeclarator(Identifier("recipient"))
                                    .WithInitializer(EqualsValueClause(
                                        CastExpression(
                                            IdentifierName("global::MyApp.MyViewModel"),
                                            IdentifierName("obj")))))))
                            .AddStatements(
                                ExpressionStatement(
                                    InvocationExpression(
                                        MemberAccessExpression(
                                            SyntaxKind.SimpleMemberAccessExpression,
                                            IdentifierName("messenger"),
                                            GenericName(Identifier("Register")).AddTypeArgumentListArguments(
                                                IdentifierName("global::MyApp.MessageA"))))
                                    .AddArgumentListArguments(Argument(IdentifierName("recipient")))),
                                ExpressionStatement(
                                    InvocationExpression(
                                        MemberAccessExpression(
                                            SyntaxKind.SimpleMemberAccessExpression,
                                            IdentifierName("messenger"),
                                            GenericName(Identifier("Register")).AddTypeArgumentListArguments(
                                                IdentifierName("global::MyApp.MessageA"))))
                                    .AddArgumentListArguments(Argument(IdentifierName("recipient")))),
                                ExpressionStatement(
                                    InvocationExpression(
                                        MemberAccessExpression(
                                            SyntaxKind.SimpleMemberAccessExpression,
                                            IdentifierName("messenger"),
                                            GenericName(Identifier("Register")).AddTypeArgumentListArguments(
                                                IdentifierName("global::MyApp.MessageA"))))
                                    .AddArgumentListArguments(Argument(IdentifierName("recipient")))))),
                        ReturnStatement(IdentifierName("RegisterAll")))),
                MethodDeclaration(
                    GenericName("global::System.Action").AddTypeArgumentListArguments(
                        IdentifierName("IMessenger"),
                        PredefinedType(Token(SyntaxKind.ObjectKeyword)),
                        IdentifierName("TToken")),
                    Identifier("CreateAllMessagesRegistratorWithToken")).AddAttributeLists(
                        AttributeList(SingletonSeparatedList(
                            Attribute(IdentifierName("global::System.ComponentModel.EditorBrowsable")).AddArgumentListArguments(
                            AttributeArgument(ParseExpression("global::System.ComponentModel.EditorBrowsableState.Never"))))),
                        AttributeList(SingletonSeparatedList(
                            Attribute(IdentifierName("global::System.Obsolete")).AddArgumentListArguments(
                            AttributeArgument(LiteralExpression(
                                SyntaxKind.StringLiteralExpression,
                                Literal("This method is not intended to be called directly by user code"))))))).AddModifiers(
                    Token(SyntaxKind.PublicKeyword),
                    Token(SyntaxKind.StaticKeyword)).AddParameterListParameters(
                        Parameter(Identifier("_")).WithType(IdentifierName("global::MyApp.MyViewModel")))
                    .AddTypeParameterListParameters(TypeParameter("TToken"))
                    .AddConstraintClauses(
                        TypeParameterConstraintClause("TToken")
                        .AddConstraints(TypeConstraint(GenericName("global::System.IEquatable").AddTypeArgumentListArguments(IdentifierName("TToken")))))
                    .WithBody(Block(
                        LocalFunctionStatement(
                            PredefinedType(Token(SyntaxKind.VoidKeyword)),
                            Identifier("RegisterAll"))
                        .AddModifiers(Token(SyntaxKind.StaticKeyword))
                        .AddParameterListParameters(
                            Parameter(Identifier("messenger")).WithType(IdentifierName("IMessenger")),
                            Parameter(Identifier("obj")).WithType(PredefinedType(Token(SyntaxKind.ObjectKeyword))),
                            Parameter(Identifier("token")).WithType(IdentifierName("TToken")))
                        .WithBody(Block(
                            LocalDeclarationStatement(
                                VariableDeclaration(IdentifierName("var"))
                                .AddVariables(
                                    VariableDeclarator(Identifier("recipient"))
                                    .WithInitializer(EqualsValueClause(
                                        CastExpression(
                                            IdentifierName("global::MyApp.MyViewModel"),
                                            IdentifierName("obj")))))))
                            .AddStatements(
                            ExpressionStatement(
                                InvocationExpression(
                                    MemberAccessExpression(
                                        SyntaxKind.SimpleMemberAccessExpression,
                                        IdentifierName("messenger"),
                                        GenericName(Identifier("Register")).AddTypeArgumentListArguments(
                                            IdentifierName("global::MyApp.MessageA"),
                                            IdentifierName("TToken"))))
                                .AddArgumentListArguments(Argument(IdentifierName("recipient")), Argument(IdentifierName("token")))),
                            ExpressionStatement(
                                InvocationExpression(
                                    MemberAccessExpression(
                                        SyntaxKind.SimpleMemberAccessExpression,
                                        IdentifierName("messenger"),
                                        GenericName(Identifier("Register")).AddTypeArgumentListArguments(
                                            IdentifierName("global::MyApp.MessageA"),
                                            IdentifierName("TToken"))))
                                .AddArgumentListArguments(Argument(IdentifierName("recipient")), Argument(IdentifierName("token")))),
                            ExpressionStatement(
                                InvocationExpression(
                                    MemberAccessExpression(
                                        SyntaxKind.SimpleMemberAccessExpression,
                                        IdentifierName("messenger"),
                                        GenericName(Identifier("Register")).AddTypeArgumentListArguments(
                                            IdentifierName("global::MyApp.MessageA"),
                                            IdentifierName("TToken"))))
                                .AddArgumentListArguments(Argument(IdentifierName("recipient")), Argument(IdentifierName("token")))))),
                        ReturnStatement(IdentifierName("RegisterAll")))))));
    }
}

You can see how calling NormalizeWhitespace alone increases the total CPU time by 5x in this sample benchmark.

Proposal

The simplest API proposal to address this issue would be to add a ToFullStringWithNormalizedWhitespaces API:

 namespace Microsoft.CodeAnalysis
 {
     public abstract class SyntaxNode
     {
+        public virtual string ToFullStringWithNormalizedWhitespaces();
     }
 }

This would produce a string equivalent to calling NormalizeWhitespaces().ToFullString(), but in an optimized manner, ie. without actually duplicating the whole tree and creating new nodes, but just adding the right whitespaces to the output as needed while traversing the tree. This would require very little code changes by consumers currently doing NormalizeWhitespaces().ToFullString(), but would greatly improve the performance of their source generators. I could also see some specific APIs to directly create a SourceText instance from a tree, with normalized whitespaces, which could then also skip creating the string entirely, but considering that CPU time is the biggest problem here and not the actual memory allocations, that seems to be less of a priority from what I can see. Ideally, this new ToFullStringWithNormalizedWhitespaces() API would have a performance as close to just ToFullString() as possible.

@dotnet-issue-labeler dotnet-issue-labeler bot added Area-Language Design untriaged Issues and PRs which have not yet been triaged by a lead labels Apr 26, 2021
@CyrusNajmabadi
Copy link
Member

The issue is that NormalizeWhitespace is quite expensive

I'd be interested to see if we could just fix that. Then everyone benefits. If not, then maybe specialized apis are needed.

@jaredpar jaredpar added api-needs-work API needs work before it is approved, it is NOT ready for implementation Bug Concept-API This issue involves adding, removing, clarification, or modification of an API. and removed untriaged Issues and PRs which have not yet been triaged by a lead labels Jul 13, 2021
@jaredpar jaredpar added this to the 17.0 milestone Jul 13, 2021
@333fred 333fred modified the milestones: 17.0, 17.1 Sep 2, 2021
@Sergio0694
Copy link
Contributor Author

I see this has been added to the 17.1 milestone but the API still needs some work. If Roslyn proposals work the same as runtime ones I assume that means we should just discuss that here and settle on an API shape that can then go to LDM review?

@CyrusNajmabadi is the blocker for you just the fact that you'd like to first investigate whether NormalizeWhitespace can be sped up enough yet to avoid the need for this API? If so, have you had a chance to look into this so far? Additionally, even if that was possible (certainly making the syntax tree traaversal faster wouldn't hurt in general), do you think that this specific API wouldn't provide enough value on its own regardless, given that it would likely still be faster and more memory efficient due to not having to rebuild and allocate a whole deep copy of the syntax tree being rendered? 🤔

Just trying to get the conversation going here as I would love for this to make it into 17.1, is all 😄

@CyrusNajmabadi
Copy link
Member

is the blocker for you just the fact that you'd like to first investigate whether NormalizeWhitespace can be sped up enough yet to avoid the need for this API

Yes. Though I'm not volunteering for this :-)

@sharwell
Copy link
Member

... but would greatly improve the performance of their source generators ...

I haven't seen much evidence that this is true once you combine it with:

  • Proper incrementality
  • Evaluation relative to other sources of overhead

In other words, while NormalizeWhitespace has been measurably significant at times, I've found that a source generator deemed observably slow by users achieved that characteristic through other means. Even in the case of CsWin32 (the source generator with the most pronounced impact from NormalizeWhitespace), it would not have been fixed by this proposal and it has already mitigated the problem through other means.

@jcouv jcouv modified the milestones: 17.1, 17.2 Mar 17, 2022
@jcouv jcouv modified the milestones: 17.2, 17.3 May 14, 2022
@jaredpar jaredpar modified the milestones: 17.3, C# 12.0 Jul 1, 2022
@CyrusNajmabadi CyrusNajmabadi added the Need More Info The issue needs more information to proceed. label Nov 2, 2022
@ghost
Copy link

ghost commented Nov 12, 2022

Closing this issue as we've seen no reply to the request for more information. If you are able to get the requested information, please add it to the issue and we will retriage it.

@ghost ghost closed this as completed Nov 12, 2022
@Sergio0694
Copy link
Contributor Author

Uh... This feels like it wasn't resolved yet, should we reopen it? 😅

@333fred 333fred reopened this Nov 12, 2022
@333fred
Copy link
Member

333fred commented Nov 12, 2022

@CyrusNajmabadi can you clarify what more info you were asking for?

@ghost ghost added untriaged Issues and PRs which have not yet been triaged by a lead and removed Need More Info The issue needs more information to proceed. labels Nov 16, 2022
@CyrusNajmabadi
Copy link
Member

would automatically call

It is absolutely not in the table. C# is definitely whitespace sensitive, and doing this could def break code. We should not automatically do anything. Of you want normalized code, and that's ok for your domain, then just normalize it first.

@Eli-Black-Work
Copy link

I'm referencing #49685 (comment)

@AArnott Would you mind weighing in on this? 🙂

@CyrusNajmabadi
Copy link
Member

I have no problem with people making normalize whitespace fast. Seems like it people need better performance there then we should just get some prs out to improve it, versus creating whole new apis.

As I've mentioned in the past, normalize was never something anyone spent time optimizing. So it's likely it can just be sped up if someone was interested in looking into that.

@AArnott
Copy link
Contributor

AArnott commented Nov 16, 2022

I disagree with @sharwell that CsWin32's whitespace problem or perf problems are solved. We avoid NormalizeWhitespace completely now because it is unacceptably slow. Instead, we have our own FastSyntaxFactory class and use the Banned API analyzer to forbid use of SyntaxFactor and a bunch of overloads on various syntax classes because they use elastic trivia which kicks in some automatic normalizing (I guess... my memory is fuzzy on this as we did it long ago). In addition to trying to get some reasonable whitespace (e.g. a space after a type name so it doesn't 'collide' with the next node or token) during generation itself, we have our custom whitespace 'visitor' class that fixes indentation and some other whitespace concerns.
The result is ugly, but readable. It also isn't fast. It's much faster than NormalizeWhitespace, but I just fixed a perf bug this morning where we spent 20 seconds adding space between members of a type: we had a loop where we changed a SyntaxList within the loop instead of converting it to a mutable List first and then recreating the SyntaxList at the end.

The lengths we go to in order to have acceptable perf is far in excess of what I would consider reasonable, and the resulting generated text is sub-par at best.

I don't care if it's a new API or fixing an old one, but we should not have to go to lengths to get good perf.

To Sam's comment about incremental SGs, csWin32 doesn't yet use any incremental APIs. CsWin32 is fundamentally a non-incremental thing for the most part because one character typed in any code file could potentially change generated code. I expect we can find specific syntax sub-trees that can be cached and reused, but it will be very expensive and we haven't prioritized that yet, since our UX is ok for now.
My primary concern about perf right now is for our automated testing (which resembles real build time for customers), where just one run of our 'full generation' test is now taking 12 minutes on a really fast machine. We have to run it under 12 different configurations. So needless to say, I'm very interested in improving perf in the non-incremental case.

@CyrusNajmabadi
Copy link
Member

@AArnott i agree with thsi:

The lengths we go to in order to have acceptable perf is far in excess of what I would consider reasonable

Currently, teh generation part of SG has zero funding. I've proposed options in the past about providing improvements and APIs here, but we haven't been able to fit them into the schedule at all because of all the other pressing matters. Generally speaking, i think this is because SGs are considered a highly advanced scenario, for niche areas. So the feeling is that the people using it are advanced enough to roll solutions, without us having to provide that ourselves. I'll push again for us to get some budget to improve things here. But i'm not certain anything will have changed in this area.

@jaredpar jaredpar removed the untriaged Issues and PRs which have not yet been triaged by a lead label Nov 17, 2022
@jaredpar jaredpar modified the milestones: C# 12.0, Backlog Sep 12, 2023
@jeff-simeon
Copy link

jeff-simeon commented Sep 24, 2023

Would love to see a solution to this. 40% of our build time is spent on calls to NormalizeWhitespace (full disclosure - we are generating thousands of classes at build time, using incremental source generators).

Briefly playing with removing calls to NormalizeWhitespace and using trivia manually, then extrapolating the gains, we could see a reduction in build time from 3 minutes (on an MacBook M1 Max) to just over 1.5 minutes. Reduced memory usage was also observed (1GB vs 2GB). Unfortunately, manually generating trivia in EVERY syntax tree would be a huge pain and I'm not prepared to introduce all of that extra code to the generation process.

@CyrusNajmabadi
Copy link
Member

@jeff-simeon can you jsut emit the strings directly? no point in SGs in using nodes, as we're going to have to just parse things anyways.

@jeff-simeon
Copy link

jeff-simeon commented Sep 24, 2023

I suppose we could - though it would make the code a lot harder to read, and it's relatively complex already.

@Sergio0694
Copy link
Contributor Author

This was also briefly mentioned on Discord as well — in many scenarios, using syntax nodes makes the code for generators much easier to understand, follow and maintain, especially when the generators get more complicated. This applies to several generators in the .NET SDK as well (cc. @jkoritzinsky).

Imho it'd be nice to have a more optimized version of this, on second thought a WriteToWithNormalizedWhitespace would be even better, actually (or maybe both, since once uses the other). The entire tree duplication that NormalizeWhitespace does is just completely unnecessary work.

@jeff-simeon
Copy link

@Sergio0694 - I agree and this is exactly the scenario we have - which is probably why @AArnott wrote a FastSyntaxFactory which wraps a StringBuilder and avoids the nodes.

@CyrusNajmabadi
Copy link
Member

That seems reasonable. Can you just use that factory instead?

@jeff-simeon
Copy link

jeff-simeon commented Sep 24, 2023

@CyrusNajmabadi - I'm not sure what you're talking about. I'm referring to the previous post by @AArnott, where he mentions something that his team built. I do not have access to this and searching for it on the internet yields no results. I'm merely speculating as to what he did to work around the problem.

@CyrusNajmabadi
Copy link
Member

though it would make the code a lot harder to read, and it's relatively complex already.

I don't think ease of readability is a core goal of SGs. Basically, every time this has come up on the team (and i've brought it up a few times), the thinking is that this is all for computers to do, and writing fast string-writing code is acceptable.

If someone wants to make their own api to help here, they can, but it's not an investment area. I proposed options in the past to make things better, but it's all be rejected as there is no capacity on the team to take on new APIs to help out effectively expert scenarios.

@CyrusNajmabadi
Copy link
Member

I do not have access to this and searching for it on the internet yields no results. I'm merely speculating as to what he did to work around the problem.

It's on github:

https://github.com/microsoft/CsWin32/blob/dd815b2b9b4a16f9411baa1f2fa81863c0725fca/src/Microsoft.Windows.CsWin32/FastSyntaxFactory.cs#L13

You can fork/copy this. Note: this doesn't write to strings, it just avoids elastic trivia, which can be expensive if not needed.

@CyrusNajmabadi
Copy link
Member

Imho it'd be nice to have a more optimized version of this,

@Sergio0694 Nothing is stopping teams (like .net) that really want this from writing such a thing. Trees are just a simple data structure. And if it's hyper critical that trees be the form you use and you don't want to use trivia and you want to write it out to a string quickly, then writing your own abstraction to do that seems totally viable.

The cost assessment in the past was that this wasn't an area we were willing to invest in. It's hyperspecialized just to this narrow domain, and writing out strings is just something individual libraries can do themselves, or utilize a library that someone wants to invest in.

@AArnott
Copy link
Contributor

AArnott commented Sep 24, 2023

@Sergio0694 - I agree and this is exactly the scenario we have - which is probably why @AArnott wrote a FastSyntaxFactory which wraps a StringBuilder and avoids the nodes.

My FastSyntaxFactory doesn't do that at all. It merely avoids elastic trivia (IIRC) by explicitly specifying all tokens when forwarding all calls to the regular SyntaxFactory.

@jeff-simeon
Copy link

though it would make the code a lot harder to read, and it's relatively complex already.

I don't think ease of readability is a core goal of SGs. Basically, every time this has come up on the team (and i've brought it up a few times), the thinking is that this is all for computers to do, and writing fast string-writing code is acceptable.

If someone wants to make their own api to help here, they can, but it's not an investment area. I proposed options in the past to make things better, but it's all be rejected as there is no capacity on the team to take on new APIs to help out effectively expert scenarios.

Ok, thanks for the reply, but it's a bummer :(

We will have to build our own syntax node model and wrap a StringBuilder at some point.

@CyrusNajmabadi
Copy link
Member

I would avoid a syntax node model altogether. It's an unnecessary intermediary abstraction (and often expensive in memory to boot).

@jeff-simeon
Copy link

Using a StringBuilder directly would be a maintainability problem. We'll have to build our own model that wraps a StringBuilder....I suppose it will be just like a SyntaxNode model.......but have a fast ToString method on it.

At the end of the day, all abstractions are unnecessary and come at some performance cost. The complexity of what we're doing requires some abstraction from direct string manipulation. That will come at some performance cost. But preferably, not a cost that occupies 40% of our build time.

@CyrusNajmabadi
Copy link
Member

Using a StringBuilder directly would be a maintainability problem.

I would be very surprised by that. Writing out code is usually very simple. Here's an example of a helper type i whipped up (similar to lots of stuff we've done before) to make it super easy: https://gist.github.com/CyrusNajmabadi/edf9f6059d8fd2ced7a2086a8835be1d

Just write to this, and tell it when you're increasing or decreasing indentation (and def write helpers/extensions to do it for common cases, like blocks). And it takes care of the rest.

We'll have to build our own model that wraps a StringBuilder

Sure. But it basically does nothing but pass through the data, and just handle indentation so that everything is not left-flushed. Almost no work/overhead necessary at runtime, and very simple to use on the consumer side.

At the end of the day, all abstractions are unnecessary and come at some performance cost.

Sure. But syntax-nodes are a huge abstraction cost. Like huge. IIRC, they're around 5 times the size of the strings they represent. And they require substantial manipulation to get them formatted.

They're really good for dealing with user code that needs to be understood/manipulated. They're really not necessary for a space where the end goal is generating code.

@Sergio0694
Copy link
Contributor Author

Changed the benchmark in this issue to use a custom indented string writer, here's the benchmark:

Method Mean Error StdDev Ratio Gen0 Gen1 Gen2 Allocated Alloc Ratio
SyntaxTrees 586.810 us 10.3462 us 9.6778 us 1.000 30.2734 7.8125 0.9766 183.17 KB 1.00
IndentedWriter 1.268 us 0.0151 us 0.0134 us 0.002 1.1959 - - 5.05 KB 0.03

I think @CyrusNajmabadi has successfully converted me over to the "just write to an indented writer" gang 😄

Benchmark code (click to expand):
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using ComputeSharp.SourceGeneration.Helpers;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using static Microsoft.CodeAnalysis.CSharp.SyntaxFactory;

BenchmarkRunner.Run<NormalizeWhitespaceBenchmark>();

[MemoryDiagnoser]
public class NormalizeWhitespaceBenchmark
{
    private readonly string typeName = typeof(NormalizeWhitespaceBenchmark).FullName!;
    private readonly string typeVersion = typeof(NormalizeWhitespaceBenchmark).Assembly.GetName().Version!.ToString();
    private readonly string recipientType = "global::MyViewModel";
    private readonly string messageTypeA = "global::MessageA";
    private readonly string messageTypeB = "global::MessageB";
    private readonly string messageTypeC = "global::MessageC";

    [Benchmark]
    public string IndentedWriter()
    {
        using IndentedTextWriter writer = IndentedTextWriter.Rent();

        writer.WriteLine("// Licensed to the .NET Foundation under one or more agreements.");
        writer.WriteLine("// The .NET Foundation licenses this file to you under the MIT license.");
        writer.WriteLine("// See the LICENSE file in the project root for more information.");
        writer.WriteLine("#pragma warning disable");
        writer.WriteLine("namespace Microsoft.Toolkit.Mvvm.Messaging.__Internals");

        using (writer.WriteBlock())
        {
            writer.WriteLine($"""[global::System.CodeDom.Compiler.GeneratedCode("{this.typeName}", "{this.typeVersion}")]""");
            writer.WriteLine($"""[global::System.Diagnostics.DebuggerNonUserCode]""");
            writer.WriteLine($"""[global::System.Diagnostics.CodeAnalysis.ExcludeFromCodeCoverage]""");
            writer.WriteLine($"""[global::System.ComponentModel.EditorBrowsable(global::System.ComponentModel.EditorBrowsableState.Never)]""");
            writer.WriteLine($"""[global::System.Obsolete("This type is not intended to be used directly by user code")]""");
            writer.WriteLine($"""internal static partial class __IMessengerExtensions""");

            using (writer.WriteBlock())
            {
                writer.WriteLine($"""[global::System.ComponentModel.EditorBrowsable(global::System.ComponentModel.EditorBrowsableState.Never)]""");
                writer.WriteLine($"""[global::System.Obsolete("This type is not intended to be used directly by user code")]""");
                writer.WriteLine($"""public static global::System.Action<IMessenger, object> CreateAllMessagesRegistrator({this.recipientType} _)""");

                using (writer.WriteBlock())
                {
                    writer.WriteLine($"""static void RegisterAll(IMessenger messenger, object obj)""");

                    using (writer.WriteBlock())
                    {
                        writer.WriteLine($"""var recipient = ({this.recipientType})obj;""");
                        writer.WriteLine($"""messenger.Register<{this.messageTypeA}>(recipient);""");
                        writer.WriteLine($"""messenger.Register<{this.messageTypeB}>(recipient);""");
                        writer.WriteLine($"""messenger.Register<{this.messageTypeC}>(recipient);""");
                    }

                    writer.WriteLine();
                    writer.WriteLine("return RegisterAll;");
                }

                writer.WriteLine();
                writer.WriteLine($"""[global::System.ComponentModel.EditorBrowsable(global::System.ComponentModel.EditorBrowsableState.Never)]""");
                writer.WriteLine($"""[global::System.Obsolete("This type is not intended to be used directly by user code")]""");
                writer.WriteLine($"""public static global::System.Action<IMessenger, object, TToken> CreateAllMessagesRegistratorWithToken<TToken>({this.recipientType} _)""");
                writer.WriteLine($"""    where TToken : global::System.IEquatable<TToken>""");

                using (writer.WriteBlock())
                {
                    writer.WriteLine("static void RegisterAll(IMessenger messenger, object obj, TToken token)");

                    using (writer.WriteBlock())
                    {
                        writer.WriteLine($"""var recipient = ({this.recipientType})obj;""");
                        writer.WriteLine($"""messenger.Register<{this.messageTypeA}, TToken>(recipient, token);""");
                        writer.WriteLine($"""messenger.Register<{this.messageTypeB}, TToken>(recipient, token);""");
                        writer.WriteLine($"""messenger.Register<{this.messageTypeC}, TToken>(recipient, token);""");
                    }

                    writer.WriteLine();
                    writer.WriteLine("return RegisterAll;");
                }
            }
        }

        return writer.ToString();
    }

    [Benchmark(Baseline = true)]
    public string SyntaxTrees()
    {
        return GetDummyTree().NormalizeWhitespace(eol: "\n").ToFullString();
    }

    private CompilationUnitSyntax GetDummyTree()
    {
        // // Licensed to the .NET Foundation under one or more agreements.
        // // The .NET Foundation licenses this file to you under the MIT license.
        // // See the LICENSE file in the project root for more information.
        //
        // #pragma warning disable
        //
        // namespace Microsoft.Toolkit.Mvvm.Messaging.__Internals
        // {
        //     [global::System.CodeDom.Compiler.GeneratedCode("...", "...")]
        //     [global::System.Diagnostics.DebuggerNonUserCode]
        //     [global::System.Diagnostics.CodeAnalysis.ExcludeFromCodeCoverage]
        //     [global::System.ComponentModel.EditorBrowsable(global::System.ComponentModel.EditorBrowsableState.Never)]
        //     [global::System.Obsolete("This type is not intended to be used directly by user code")]
        //     internal static partial class __IMessengerExtensions
        //     {
        //         [global::System.ComponentModel.EditorBrowsable(global::System.ComponentModel.EditorBrowsableState.Never)]
        //         [global::System.Obsolete("This method is not intended to be called directly by user code")]
        //         public static global::System.Action<IMessenger, object> CreateAllMessagesRegistrator(<RECIPIENT_TYPE> _)
        //         {
        //             static void RegisterAll(IMessenger messenger, object obj)
        //             {
        //                 var recipient = (<RECIPIENT_TYPE>)obj;
        //                 messenger.Register<<MESSAGE_TYPE>>(recipient);
        //                 messenger.Register<<MESSAGE_TYPE>>(recipient);
        //                 messenger.Register<<MESSAGE_TYPE>>(recipient);
        //             }
        //
        //             return RegisterAll;
        //         }
        //
        //         [global::System.ComponentModel.EditorBrowsable(global::System.ComponentModel.EditorBrowsableState.Never)]
        //         [global::System.Obsolete("This method is not intended to be called directly by user code")]
        //         public static global::System.Action<IMessenger, object, TToken> CreateAllMessagesRegistratorWithToken<TToken>(<RECIPIENT_TYPE> _)
        //             where TToken : global::System.IEquatable<TToken>
        //         {
        //             static void RegisterAll(IMessenger messenger, object obj, TToken token)
        //             {
        //                 var recipient = (<RECIPIENT_TYPE>)obj;
        //                 messenger.Register<<MESSAGE_TYPE>, TToken>(recipient, token);
        //                 messenger.Register<<MESSAGE_TYPE>, TToken>(recipient, token);
        //                 messenger.Register<<MESSAGE_TYPE>, TToken>(recipient, token);
        //             }
        //
        //             return RegisterAll;
        //         }
        //     }
        // }
        return
            CompilationUnit().AddMembers(
                NamespaceDeclaration(IdentifierName("Microsoft.Toolkit.Mvvm.Messaging.__Internals")).WithLeadingTrivia(TriviaList(
                    Comment("// Licensed to the .NET Foundation under one or more agreements."),
                    Comment("// The .NET Foundation licenses this file to you under the MIT license."),
                    Comment("// See the LICENSE file in the project root for more information."),
                    Trivia(PragmaWarningDirectiveTrivia(Token(SyntaxKind.DisableKeyword), true)))).AddMembers(
                ClassDeclaration("__IMessengerExtensions").AddModifiers(
                    Token(SyntaxKind.InternalKeyword),
                    Token(SyntaxKind.StaticKeyword),
                    Token(SyntaxKind.PartialKeyword)).AddAttributeLists(
                    AttributeList(SingletonSeparatedList(
                        Attribute(IdentifierName($"global::System.CodeDom.Compiler.GeneratedCode"))
                        .AddArgumentListArguments(
                            AttributeArgument(LiteralExpression(SyntaxKind.StringLiteralExpression, Literal(this.typeName))),
                            AttributeArgument(LiteralExpression(SyntaxKind.StringLiteralExpression, Literal(this.typeVersion)))))),
                    AttributeList(SingletonSeparatedList(Attribute(IdentifierName("global::System.Diagnostics.DebuggerNonUserCode")))),
                    AttributeList(SingletonSeparatedList(Attribute(IdentifierName("global::System.Diagnostics.CodeAnalysis.ExcludeFromCodeCoverage")))),
                    AttributeList(SingletonSeparatedList(
                        Attribute(IdentifierName("global::System.ComponentModel.EditorBrowsable")).AddArgumentListArguments(
                        AttributeArgument(ParseExpression("global::System.ComponentModel.EditorBrowsableState.Never"))))),
                    AttributeList(SingletonSeparatedList(
                        Attribute(IdentifierName("global::System.Obsolete")).AddArgumentListArguments(
                        AttributeArgument(LiteralExpression(
                            SyntaxKind.StringLiteralExpression,
                            Literal("This type is not intended to be used directly by user code"))))))).AddMembers(
                MethodDeclaration(
                    GenericName("global::System.Action").AddTypeArgumentListArguments(
                        IdentifierName("IMessenger"),
                        PredefinedType(Token(SyntaxKind.ObjectKeyword))),
                    Identifier("CreateAllMessagesRegistrator")).AddAttributeLists(
                        AttributeList(SingletonSeparatedList(
                            Attribute(IdentifierName("global::System.ComponentModel.EditorBrowsable")).AddArgumentListArguments(
                            AttributeArgument(ParseExpression("global::System.ComponentModel.EditorBrowsableState.Never"))))),
                        AttributeList(SingletonSeparatedList(
                            Attribute(IdentifierName("global::System.Obsolete")).AddArgumentListArguments(
                            AttributeArgument(LiteralExpression(
                                SyntaxKind.StringLiteralExpression,
                                Literal("This method is not intended to be called directly by user code"))))))).AddModifiers(
                    Token(SyntaxKind.PublicKeyword),
                    Token(SyntaxKind.StaticKeyword)).AddParameterListParameters(
                        Parameter(Identifier("_")).WithType(IdentifierName(this.recipientType)))
                    .WithBody(Block(
                        LocalFunctionStatement(
                            PredefinedType(Token(SyntaxKind.VoidKeyword)),
                            Identifier("RegisterAll"))
                        .AddModifiers(Token(SyntaxKind.StaticKeyword))
                        .AddParameterListParameters(
                            Parameter(Identifier("messenger")).WithType(IdentifierName("IMessenger")),
                            Parameter(Identifier("obj")).WithType(PredefinedType(Token(SyntaxKind.ObjectKeyword))))
                        .WithBody(Block(
                            LocalDeclarationStatement(
                                VariableDeclaration(IdentifierName("var"))
                                .AddVariables(
                                    VariableDeclarator(Identifier("recipient"))
                                    .WithInitializer(EqualsValueClause(
                                        CastExpression(
                                            IdentifierName(this.recipientType),
                                            IdentifierName("obj")))))))
                            .AddStatements(
                                ExpressionStatement(
                                    InvocationExpression(
                                        MemberAccessExpression(
                                            SyntaxKind.SimpleMemberAccessExpression,
                                            IdentifierName("messenger"),
                                            GenericName(Identifier("Register")).AddTypeArgumentListArguments(
                                                IdentifierName(this.messageTypeA))))
                                    .AddArgumentListArguments(Argument(IdentifierName("recipient")))),
                                ExpressionStatement(
                                    InvocationExpression(
                                        MemberAccessExpression(
                                            SyntaxKind.SimpleMemberAccessExpression,
                                            IdentifierName("messenger"),
                                            GenericName(Identifier("Register")).AddTypeArgumentListArguments(
                                                IdentifierName(this.messageTypeB))))
                                    .AddArgumentListArguments(Argument(IdentifierName("recipient")))),
                                ExpressionStatement(
                                    InvocationExpression(
                                        MemberAccessExpression(
                                            SyntaxKind.SimpleMemberAccessExpression,
                                            IdentifierName("messenger"),
                                            GenericName(Identifier("Register")).AddTypeArgumentListArguments(
                                                IdentifierName(this.messageTypeC))))
                                    .AddArgumentListArguments(Argument(IdentifierName("recipient")))))),
                        ReturnStatement(IdentifierName("RegisterAll")))),
                MethodDeclaration(
                    GenericName("global::System.Action").AddTypeArgumentListArguments(
                        IdentifierName("IMessenger"),
                        PredefinedType(Token(SyntaxKind.ObjectKeyword)),
                        IdentifierName("TToken")),
                    Identifier("CreateAllMessagesRegistratorWithToken")).AddAttributeLists(
                        AttributeList(SingletonSeparatedList(
                            Attribute(IdentifierName("global::System.ComponentModel.EditorBrowsable")).AddArgumentListArguments(
                            AttributeArgument(ParseExpression("global::System.ComponentModel.EditorBrowsableState.Never"))))),
                        AttributeList(SingletonSeparatedList(
                            Attribute(IdentifierName("global::System.Obsolete")).AddArgumentListArguments(
                            AttributeArgument(LiteralExpression(
                                SyntaxKind.StringLiteralExpression,
                                Literal("This method is not intended to be called directly by user code"))))))).AddModifiers(
                    Token(SyntaxKind.PublicKeyword),
                    Token(SyntaxKind.StaticKeyword)).AddParameterListParameters(
                        Parameter(Identifier("_")).WithType(IdentifierName(this.recipientType)))
                    .AddTypeParameterListParameters(TypeParameter("TToken"))
                    .AddConstraintClauses(
                        TypeParameterConstraintClause("TToken")
                        .AddConstraints(TypeConstraint(GenericName("global::System.IEquatable").AddTypeArgumentListArguments(IdentifierName("TToken")))))
                    .WithBody(Block(
                        LocalFunctionStatement(
                            PredefinedType(Token(SyntaxKind.VoidKeyword)),
                            Identifier("RegisterAll"))
                        .AddModifiers(Token(SyntaxKind.StaticKeyword))
                        .AddParameterListParameters(
                            Parameter(Identifier("messenger")).WithType(IdentifierName("IMessenger")),
                            Parameter(Identifier("obj")).WithType(PredefinedType(Token(SyntaxKind.ObjectKeyword))),
                            Parameter(Identifier("token")).WithType(IdentifierName("TToken")))
                        .WithBody(Block(
                            LocalDeclarationStatement(
                                VariableDeclaration(IdentifierName("var"))
                                .AddVariables(
                                    VariableDeclarator(Identifier("recipient"))
                                    .WithInitializer(EqualsValueClause(
                                        CastExpression(
                                            IdentifierName(this.recipientType),
                                            IdentifierName("obj")))))))
                            .AddStatements(
                            ExpressionStatement(
                                InvocationExpression(
                                    MemberAccessExpression(
                                        SyntaxKind.SimpleMemberAccessExpression,
                                        IdentifierName("messenger"),
                                        GenericName(Identifier("Register")).AddTypeArgumentListArguments(
                                            IdentifierName(this.messageTypeA),
                                            IdentifierName("TToken"))))
                                .AddArgumentListArguments(Argument(IdentifierName("recipient")), Argument(IdentifierName("token")))),
                            ExpressionStatement(
                                InvocationExpression(
                                    MemberAccessExpression(
                                        SyntaxKind.SimpleMemberAccessExpression,
                                        IdentifierName("messenger"),
                                        GenericName(Identifier("Register")).AddTypeArgumentListArguments(
                                            IdentifierName(this.messageTypeB),
                                            IdentifierName("TToken"))))
                                .AddArgumentListArguments(Argument(IdentifierName("recipient")), Argument(IdentifierName("token")))),
                            ExpressionStatement(
                                InvocationExpression(
                                    MemberAccessExpression(
                                        SyntaxKind.SimpleMemberAccessExpression,
                                        IdentifierName("messenger"),
                                        GenericName(Identifier("Register")).AddTypeArgumentListArguments(
                                            IdentifierName(this.messageTypeC),
                                            IdentifierName("TToken"))))
                                .AddArgumentListArguments(Argument(IdentifierName("recipient")), Argument(IdentifierName("token")))))),
                        ReturnStatement(IdentifierName("RegisterAll")))))));
    }
}

@CyrusNajmabadi
Copy link
Member

@Sergio0694 that matches my expectations. Using syntax trees at all is just hugely unnecessary overhead. They're gargantuan in memory, and involve an enormous amount of pointer chasing.

They're also just not a great intermediary construct for the domain of just wanting to write out a linearized string.

They are great if you want to actually operate on them in a tree-structured fashion. But I don't think I've ever seen a generator that actually does that. Rather, all generators just write the content start to finish. Or they make subsections, but then concatenate then linearly. All these cases are much better served with just writing a string.

Personally, I think writing the string is much easier and clearer too, as long as you have the right writer-abstraction (like the one I linked to).

@AArnott
Copy link
Contributor

AArnott commented Sep 27, 2023

That is indeed compelling data.

FWIW, CsWin32 does emit node trees, and we modify them along the way. It would be a total rewrite (of a huge codebase) to switch to StringBuilder both because all our SyntaxFactory callers would have to change and because we'd essentially have to decide up front everything we need to do before starting to emit code because we wouldn't be able to emit nodes and modify them later as we do now.
It might make a reasonable project to tackle by combining it with supporting incremental source generation, since doing that properly also needs a fairly substantial refactoring so that we can truly track all the inputs into the source generator.

@CyrusNajmabadi
Copy link
Member

ecause we wouldn't be able to emit nodes and modify them later as we do now.

You can def do this.

Note: as i've mentioned in discord to several people, a very reasonable thing to do is create the indented string chucks for those other nodes up front (and then cache them). Later, when building the full text, it is trivial to add an existing indented node to the final indented writer you are using.

--

I cannot emphasize enough how much of a better approach to SG writing this is. Syntax and trees are a bad abstraction here. They are enormously expensive, unweildy, and designed to be good for very different scenarios.

Personally, i've never once found an SG case that benefits from them. Contract that with string writing. It's blazing fast. Extremely simple. Very easy to compose. And trivial to get the formatting you want within a line, and the indentation we expect for normal C# or VB.

As we can see from the numbers, you should expect roughly two orders of magnitude improvement for CPU and memory, and all using an absolutely trivial type that is super easy to use.

SGs should do this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation Area-Compilers Bug Concept-API This issue involves adding, removing, clarification, or modification of an API.
Projects
None yet
Development

No branches or pull requests

9 participants