Improve formatting of System.Text.Json source generated output #79534
Comments
Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis |
Is using Roslyn syntax nodes to produce syntax out of the question? That would give perfect formatting "for free" without the need for any custom helpers or anything. It can also make the code easier to maintain given that instead of just passing strings around, you can clearly divide the logic to produce various kinds of syntax nodes, hence making the semantics explicit. For reference, this is the same approach that the |
Thanks, I'll look into that. |
Happy to help as well if you need! I'm sure @jkoritzinsky would as well 😄 If you're curious to see some examples, here's one from the MVVM Toolkit. You can see how the code is producing syntax nodes for all the generated code, which can then be auto-formatted by Roslyn by just calling |
FWIW, I had the exact opposite experience with the regex generator, where the generator was significantly easier to write/read/maintain just using strings. I initially used syntax nodes and it was a significant pain point that I later ripped out completely in favor of just using strings. |
@stephentoub If it's not too off topic for this issue, could you elaborate a bit more on that? Specifically, I'm curious whether that was caused by maybe trying to be too strict in using only syntax node APIs? One thing that can be done in scenarios where using syntax nodes is very verbose (say, if you're generating a ton of statements, for instance) is to find a "compromise" between string builders and syntax nodes. This is something that @CyrusNajmabadi also suggested (ie. "choose your own granularity"): instead of necessarily going all the way down to producing each individual node/token, you can eg. use Point is: I wonder whether cases where syntax nodes are too verbose aren't just the result of one trying to follow an approach that's too strict, and whether relaxing that a bit couldn't offer a good in-between that gives you a win-win in terms of both verbosity and maintainability. I do agree that going too much all-in with syntax nodes can be way too verbose otherwise. Just thinking out loud here, I'll admit I haven't looked at the regex generator too much in detail myself. |
Simply put, there was practically zero-observed benefit to using them, and the code with strings was easier to read, understand, and write. The nodes provided a tad bit of help on formatting, but most of that was easily achieved with a few small helpers that then also allowed the structure of the generator to mirror the structure of the output code, e.g. runtime/src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Emitter.cs Lines 3348 to 3351 in fda0b35
The strong typing was a maintenance hindrance, especially as the code evolved, and required a PhD in an object model that's not necessary to understand at all when using strings. Using the object model makes complete sense in other situations. But purely for output in this case, in my experience it was a net negative. I "chose my own granularity".
The bulk of what the regex generator emits is the implementation of two (often very long) methods. It's "a ton of statements". |
And never once with strings did I feel a sentiment like this: |
I suspect that a Regex source generator will have different requirements compared to one that needs to generate traversal of arbitrary type graphs. I think certain forms of strong typing might be beneficial for the latter case -- a substantial number of reported STJ sourcegen bugs relate to emergent composition patterns resulting in accidental variable capture or violation of generic parameter constraints. Apropos, some of the overheads of metaprogramming using ASTs can be eliminated by introducing code quotations at the language level, in the style of Lisp, Scala and F# (and C# to an extent, for the case of IQueryable expressions). Metaprograms using code quotations will always produce well-typed programs if the metaprograms themselves are well-typed. |
It's certainly possible it would help in the json generator. It's also possible it would hurt (likely, in my experience). I'm simply highlighting that the assertion that it makes code more maintainable is far from a universal truth. Thanks. |
It's also possible to just take a hybrid approach here. Easily build content with strings, then trivially format the result and pass the formatted result to Roslyn. |
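One possible reading of the hybrid approach, as a rough sketch only: build the source as a plain string, parse it once, and let Roslyn normalize the whitespace. The `HybridFormatter` class and its `Format` method are hypothetical names introduced for illustration, and the choice of `NormalizeWhitespace` is an assumption, since the comment above does not say which formatting call was meant. The sketch assumes a reference to the Microsoft.CodeAnalysis.CSharp package.

```csharp
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

// Hypothetical helper, not part of any generator discussed in this thread.
public static class HybridFormatter
{
    // Takes source text assembled with ordinary string operations and
    // returns the same code with whitespace normalized by Roslyn.
    public static string Format(string unformattedSource)
    {
        // Parse the string into a syntax tree; no semantic analysis is needed.
        SyntaxNode root = CSharpSyntaxTree.ParseText(unformattedSource).GetRoot();

        // NormalizeWhitespace rewrites trivia (indentation, line breaks)
        // using Roslyn's default conventions; tokens are left unchanged.
        return root.NormalizeWhitespace().ToFullString();
    }
}
```

A caller would keep emitting strings as before and only wrap the final result, for example `HybridFormatter.Format("namespace Demo{public static class C{public static int One(){return 1;}}}")`. The extra parse has a cost, so whether this is acceptable inside an incremental generator is a judgment call this sketch does not settle.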
There was a related discussion in #83614 (comment). TLDR is that the configuration binder introduces a source writer that can handle multi-line literals in addition to single lines. It can be moved to a shared location if other generators want an API to write multi-line code blocks. AFAIK the other generators use writers that can emit code only one line at a time. |
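To illustrate the difference between a line-at-a-time writer and one that accepts multi-line blocks, here is a minimal sketch. It is not the writer introduced by the configuration binder; `IndentedSourceWriter`, `WriteBlock`, `OpenBrace`, `CloseBrace`, and `Demo` are all hypothetical names, and the four-space indent is an assumption.

```csharp
using System.Text;

// Hypothetical sketch of an indentation-tracking source writer.
public sealed class IndentedSourceWriter
{
    private const int SpacesPerLevel = 4; // assumed indent width
    private readonly StringBuilder _sb = new StringBuilder();

    public int Indentation { get; set; }

    // Writes a single line at the current indentation level.
    // Empty lines are emitted without trailing whitespace.
    public void WriteLine(string line = "")
    {
        if (line.Length > 0)
        {
            _sb.Append(' ', Indentation * SpacesPerLevel).Append(line);
        }
        _sb.Append('\n');
    }

    // Writes a multi-line block: every line is shifted to the current
    // indentation level, and relative indentation inside the block is kept.
    public void WriteBlock(string text)
    {
        foreach (string line in text.Replace("\r\n", "\n").Split('\n'))
        {
            WriteLine(line);
        }
    }

    public void OpenBrace() { WriteLine("{"); Indentation++; }

    public void CloseBrace() { Indentation--; WriteLine("}"); }

    public override string ToString() => _sb.ToString();
}

public static class Demo
{
    // Emits a small class whose method body is supplied as one multi-line literal.
    public static string Build()
    {
        var writer = new IndentedSourceWriter();
        writer.WriteLine("internal static class Generated");
        writer.OpenBrace();
        writer.WriteBlock("public static int Add(int a, int b)\n{\n    return a + b;\n}");
        writer.CloseBrace();
        return writer.ToString();
    }
}
```

With a writer like this, the generator code can contain a method body as one readable literal while nesting is still tracked in a single place, which is roughly the appeal described in the comment above.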
Addressed by #86526 |
The generated output isn't perfectly formatted. The upcoming ConfigurationBinder source generator (#44493) will introduce a new source-code writer utility API with correct formatting. A follow-up PR should update the System.Text.Json source generator implementation to use it as well.