-
Notifications
You must be signed in to change notification settings - Fork 189
Implement round-trippable, editable TOML parsing #64
Conversation
Sorry but this isn't really quite in a state where I can review it (odd formatting, tests failing, lack of comments, etc). Could you elaborate a little more on what's going on here? Is the goal to preserve comments and whitespace that are otherwise found in a configuration file? |
Yes, have a parser that preserves comments and whitespace and building API on top of this that allows changes to the parsed document without losing comments and whitespace. |
I think in general that it may be about time that toml-rs had a rewrite of the parser and internals, I've long worried about the number of allocations going on and I suspect it may only get worse over time. I think that retrofitting support (with a parallel AST) for managing comments and such may end up causing too much pain and it'd be better to rewrite the parser from the ground up to handle these cases. |
I've cleaned up code, documented internal layout and fixed the failing test. It should be reviewable now. |
Yeah this is looking like a whole new library altogether almost, so perhaps it's best to be a standalone dependency? I'd love to clean up the parser here so both projects could share the implementation, however! |
bab05fc
to
bb12015
Compare
disclaimer: i didn’t try to read and understand this, so i’m generically telling you my thoughts about the API: i think the parse result should simply be a (possibly indexed?) AST containing whitespace and comment nodes. the API should allow
|
I'm closing this PR because I split it all to a separate library here. I didn't write the docs yet and edit operations are missing (insert/delete/set_value_at), but the read side is mostly done. |
ah nice! so to add a string value with the same quoting style, we have to recursively traverse the document and count strings with one or the other, then determine the predominate one, and then find the position to insert to and call |
Awesome, thanks @vosen! |
@flying-sheep Yes, the plan is for the two final steps is to look like this:
|
is that duplication with the “value”? |
No, notice the |
you store the raw syntax, and not the actual [u8] content and sth. like |
Yes, althought I was tempted to store syntax in a more granular fashion. Similarly to your example: splitting string value into |
this makes no sense. either you have only single and double (referring to the number of strokes is the quote character), or single and triple (referring to the number of quote characters). if anything, we should use the meaning. enum QuoteStyle {
Raw(String), RawMultiline(String),
Normal(String), NormalMultiline(String),
}
well, “raw” means that it contains all escapes and so on right? so replacing “raw” means that you have to sanitize the string contents to be a valid literal with the same meaning. but i see why you designed it that way: there’s many many stylistic parts to representing things:
still, i’d wish for a way to choose some of those when first inserting a value instead of basically inserting it twice. maybe:
trying to serialize a string containing |
This is a very rough implementation of round-tripping TOML parser. It correctly round trips all the TOML test suite (except inline arrays).
It's very unsuitable for merging at the moment (broken formatting, random layout of modules, unimplemented inline arrays, no public API etc.). Since it's a very invasive change, that make internals more complicated (for no real gain if you just want to simply serialize and deserialize), my question is, should I work on getting it into
toml-rs
or move this work to a fork (with just API for load-edit-save of TOML documents, leaving attribute-based serializing/deserializing here)?