Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve formatting of System.Text.Json source generated output #79534

Closed
layomia opened this issue Dec 12, 2022 · 13 comments
Closed

Improve formatting of System.Text.Json source generated output #79534

layomia opened this issue Dec 12, 2022 · 13 comments
Labels
area-System.Text.Json enhancement Product code improvement that does NOT require public API changes/additions source-generator Indicates an issue with a source generator feature
Milestone

Comments

@layomia
Copy link
Contributor

layomia commented Dec 12, 2022

The generated output isn't perfectly formatted .

The upcoming ConfigurationBinder source generator (#44493) will introduce a new source-code writer utility API with correct formatting. A follow-up PR should update the System.Text.Json source generator implementation to use it as well.

@layomia layomia added area-System.Text.Json source-generator Indicates an issue with a source generator feature labels Dec 12, 2022
@layomia layomia added this to the Future milestone Dec 12, 2022
@ghost
Copy link

ghost commented Dec 12, 2022

Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis
See info in area-owners.md if you want to be subscribed.

Issue Details

The generated output isn't perfectly formatted .

The upcoming ConfigurationBinder source generator will introduce a new source-code writer utility API with correct formatting. A follow-up PR should update the System.Text.Json source generator implementation to use it as well.

Author: layomia
Assignees: -
Labels:

area-System.Text.Json, source-generator

Milestone: Future

@Sergio0694
Copy link
Contributor

Is using Roslyn syntax nodes to produce syntax out of the question? That would give perfect formatting "for free" without the need for any custom helpers or anything. It can also make the code easier to maintain given that instead of just passing strings around, you can clearly divide the logic to produce various kinds of syntax nodes, hence making the semantics explicit. For reference, this is the same approach that the [LibraryImport] generator uses, as well as the generators in the MVVM Toolkit and ComputeSharp (and so do a few generators I implemented internally for the Microsoft Store, which aren't open source just yet). Just a thought 🙂

@layomia
Copy link
Contributor Author

layomia commented Dec 12, 2022

Thanks, I'll look into that.

@eiriktsarpalis eiriktsarpalis added the enhancement Product code improvement that does NOT require public API changes/additions label Dec 12, 2022
@Sergio0694
Copy link
Contributor

Happy to help as well if you need! I'm sure @jkoritzinsky would as well 😄

If you're curious to see some examples, here's one from the MVVM Toolkit. You can see how the code is producing syntax nodes for all the generated code, which can then be auto-formatted by Roslyn by just calling NormalizeWhitespace() at the end. The very useful thing is you only ever work with proper nodes this way (eg. you pass around eg. StatementSyntax nodes, or ExpressionSyntax nodes, etc.) which means you never have to worry about things such as manually closing brackets or indenting, instead you work with the higher-level syntax abstraction and let Roslyn do the rest of the work for you. It also makes code more maintainable as you only ever pass objects around, never raw string-s 🙂

@stephentoub
Copy link
Member

stephentoub commented Dec 12, 2022

It also makes code more maintainable as you only ever pass objects around, never raw string-s 🙂

FWIW, I had the exact opposite experience with the regex generator, where the generator was significantly easier to write/read/maintain just using strings. I initially used syntax nodes and it was a significant pain point that I later ripped out completely in favor of just using strings.

@Sergio0694
Copy link
Contributor

@stephentoub If it's not too off topic for this issue, could you elaborate a bit more on that? Specifically, I'm curious whether that was caused by maybe trying to be too strict in using only syntax node APIs? One thing that can be done in scenarios where using syntax nodes is very verbose (say, if you're generating a ton of statements, for instance) is to find a "compromise" between string builders and syntax nodes. This is something that @CyrusNajmabadi also suggested (ie. "choose your own granularity"): instead of necessarily going all the way down to producing each individual node/token, you can eg. use ParseStatement or ParseExpression to much easily generate eg. a bunch of StatementSyntax nodes. That allows you to both create them in almost the same way as you'd have generated them with a literal, while still ending up with an actual object you can pass around to preserve semantics. I do something on the same lines in my generators as well: I never construct fully qualified type names manually (which would be insanely verbose), instead I just use a single IdentifierName with the fully qualified name as a string, which keeps the code compact while still giving me back a TypeSyntax node I can use. I'm sure there's other examples 🙂

Point is: I wonder whether cases where syntax nodes are too verbose aren't just the result of one trying to follow an approach that's too strict, and whether relaxing that a bit couldn't offer a good in-between that gives you a win-win in terms of both verbosity and maintainability. I do agree that going too much all-in with syntax nodes can be way too verbose otherwise. Just thinking out loud here, I'll admit I haven't looked at the regex generator too much in detail myself.

@stephentoub
Copy link
Member

stephentoub commented Dec 12, 2022

Simply put, there was practically zero-observed benefit to using them, and the code with strings was easier to read, understand, and write. The nodes provided a tad bit of help on formatting, but most of that was easily achieved with a few small helpers that then also allowed the structure of the generator to mirror the structure of the output code, e.g.

using (EmitBlock(writer, $"if (pos == {startingPos})"))
{
writer.WriteLine($"{sawEmpty} = 1; // true");
}

The strong typing was a maintenance hinderence, especially as the code evolved, and required a PhD in an object model that's not necessary to understand at all when using strings.

Using the object model makes complete sense in other situations. But purely for output in this case, in my experience it was a net negative. I "chose my own granularity".

say, if you're generating a ton of statements, for instance

The bulk of what the regex generator emits is the implementation of two (often very long) methods. It's "a ton of statements".

@stephentoub
Copy link
Member

And never once with strings did I feel a sentiment like this:
#69608 (comment)
"The rest are really about simplifying code. This gets really complicated fast because it involves going back through the Roslyn syntax trees and updating them."

@eiriktsarpalis
Copy link
Member

eiriktsarpalis commented Dec 12, 2022

I suspect that a Regex source generator will have different requirements compared to one that needs to generate traversal of arbitrary type graphs. I think certain forms of strong typing might be beneficial for the latter case -- a substantial number of reported STJ sourcegen bugs relate to emergent composition patterns resulting in accidental variable capture or violation of generic parameter constraints.

Apropos, some of the overheads of metaprogramming using ASTs can be eliminated by introducing code quotations at the language level, in the style of Lisp, Scala and F# (and C# to an extent, for the case of IQueryable expressions). Metaprograms using code quotations will always produce well-typed programs if the metaprograms themselves are well-typed.

@stephentoub
Copy link
Member

stephentoub commented Dec 12, 2022

It's certainly possible it would help in the json generator. It's also possible it would hurt (likely, in my experience). I'm simply highlighting that the assertion that it makes code more maintainable is far from a universal truth. Thanks.

@CyrusNajmabadi
Copy link
Member

It's also possibly to just hybrid this. Easily build content with strings, then trivially format the result and pass teh formatted result to roslyn.

@layomia
Copy link
Contributor Author

layomia commented Mar 21, 2023

There was a related discussion in #83614 (comment). TLDR is that the configuration binder introduces a source writer that can handle multi-line literals in addition to single lines. It can be moved to a shared location if other generators want an API to write multi-line code blocks. AFAIK the other generators use writers that can emit code only one line at a time.

@eiriktsarpalis
Copy link
Member

Addressed by #86526

@ghost ghost locked as resolved and limited conversation to collaborators Jul 8, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Text.Json enhancement Product code improvement that does NOT require public API changes/additions source-generator Indicates an issue with a source generator feature
Projects
None yet
Development

No branches or pull requests

5 participants