Proposed changes to NestedText that are not backward compatible #23
Thanks for raising this issue for discussion - I'd be interested to hear other people's thoughts on this. Summarising my position that I think I've already made clear in comments of #21 and #22:
|
Questions:
|
Josh, a single line multi-line key is fine, so your example is perfectly valid. In fact, a multi-line key can be empty (this is the only way to get an empty key):
which becomes |
To share my 2 cents: I think I have a similar point of view to @LewisGaul. Allowing a wider range of keys to be used sounds attractive, but I'm not sure the extra "oddness" of multiline keys is worth it. They're easily machine parseable, but I think potentially confusing to the human reader. Using a different prefix to |
I'm late to this thread, but I thought it would be helpful to list some pros and cons (as I see them) of each proposed syntax, including one that @KenKundert and I have discussed but not mentioned yet:
I personally like the idea of adding a multiline key syntax, although I definitely appreciate the arguments against doing so. The |
Thanks for the summary, I'd agree with the pros/cons. I'm still -1 for multiline keys in general, looking at the options above. I struggle to visually parse any of these as mappings, since I think YAML/JSON always require a colon separating the key and the value, which is missing from all of the above. I think perhaps I'd be slightly less against it if the separating colon was added, e.g. (picking my preferred syntax from the options above):
However, this doesn't play nice with values that aren't a simple single-line string:
I think maybe I still prefer this to a lack of requiring the colon though. In fact, if the colon is still required, I guess in theory the keys could just be represented as regular multi-line strings (with |
Okay, things seem like they have settled down. I have implemented some of the suggestions and checked them in. More work to be done, but just on completing these changes. No additional features are being considered. The new version:
Details are in the documentation. Example:
becomes
My expectation is that these new features will not be heavily used, but will be very helpful on occasion and help to complete the language. With these changes, NestedText becomes capable of handling any hierarchical combination of lists, dictionaries and strings. Please try it out and give me your impressions. -Ken |
Thanks for this work! Although it does add some complexity to the language (albeit removing the complexity of quoted keys) I think this is a positive set of changes, as you say giving more completeness to the language. I'll have a go at implementing these changes in zig-nestedtext at some point (maybe next weekend) and provide any feedback I might have. One minor point - shouldn't the new version be 1.4.0 rather than 1.3.2, given the backwards incompatibility and size of the changes? |
My version numbers are interpreted as follows: |
I think the inline lists and dictionaries features add huge complexity to an already near-perfect format. I'm personally strongly against those features. They break expectations, which simply kills the format for me. If I wanted those features, I'd use YAML. Unquoted keys are a good idea. They vastly reduce the complexity. As for the multiline keys, I don't find them a bad idea. However, they solve what I consider to be non-problems. Currently, unquoted keys cannot contain colons, so multiline keys are invented to allow colons and much more. But we already constrain keys with whitespace-trimming rules. I think it would be much more sensible to then also disallow the ': ' and '\n' character sequences. With NestedText, there is already a precedent that validation and schemas are left to the developer. I think setting the expectation for developers to remove ': ' and '\n' character sequences from their keys is better than allowing them. It's probably not good to encourage those character sequences in the first place. It could be considered bad practice. If they are necessary, there is the meta-solution of replacing them with literals like '\n\r\t\x0a\u0123'. Note that ':' would still be allowed afaik, meaning you could use URLs as keys, like in JSON-LD. |
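The "meta-solution" mentioned in the comment above could look roughly like the following sketch. It is an illustration only, not part of NestedText or its Python implementation: the helper names `escape_key` and `unescape_key` are hypothetical, and the choice of Python's `unicode_escape` codec plus `\x20` for the space after a colon is an assumption made for this example.

```python
def escape_key(key):
    """Return a single-line key that contains neither ': ' nor a newline.

    Hypothetical helper: control characters, backslashes and non-ASCII
    characters become backslash escapes, and the space in ': ' is
    written as the literal text \\x20.
    """
    escaped = key.encode("unicode_escape").decode("ascii")
    return escaped.replace(": ", ":\\x20")


def unescape_key(escaped):
    """Reverse escape_key, turning the escape sequences back into characters."""
    return escaped.encode("ascii").decode("unicode_escape")


original = "path: with\nnewline"
safe = escape_key(original)      # the text  path:\x20with\nnewline  on one line
restored = unescape_key(safe)    # identical to original
```

The application, not the format, would own this convention, which matches the comment's point that validation and schemas are left to the developer.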
Would you mind elaborating on the complexity that inline lists and dicts would add? From my perspective, they add a moderate amount of complexity to the parser, but very little complexity for the end user. The syntax is intuitive (i.e. common to many programming languages) and maintains the property that the type of each line can be identified from its leading characters. Regarding YAML, I think its problem is not that it supports inline lists/dicts, but that its rules for quoting and type-casting are complicated and frequently break expectations (to use your phrase). But I don't see how the proposed changes to NT would break expectations in a comparable way. I do think that there are advantages and disadvantages to the inline syntax, I just think the former outweigh the latter (see #24 for more discussion): Advantages:
Disadvantages:
Ambiguous:
|
I've tried out the new multiline object key syntax, and have come up with some edge cases that make me question the re-use of As a user, what would you read the following as?
I think this could be interpreted in quite a range of ways, but as I understand it this is actually I'd still be in favour of using
Also, there seems to be a bug(?) in the Python implementation that disallows starting the file with a multiline key, giving a "content must start with key" error. Slight nitpick: the latest language reference has a section headed 'Inline Objects' that's talking about both inline objects and lists. I understand objects are being referred to as 'dictionaries' but this is quite a pythonism. E.g. my understanding is that JSON (JS Object Notation) refers to key-value pairs as 'objects'. |
I was going to respond to this on Github, but I cannot find it. Did you delete
your post?
There was not a lot of thought that went into picking '>' as the string tag. We
considered '|' briefly and decided to go with '>'. There was not a good reason
for choosing one versus the other. It seemed like '>' had more use as a quoted
string character. As you say, it was heavily used as such in email messages.
That led to syntax highlighting support in editors. So it seemed like the
better choice. But fundamentally the choice was largely arbitrary. I think
revisiting that choice would not be a good idea at this point.
-Ken
On Sat, May 01, 2021 at 02:04:55AM -0700, tototest99 wrote:
Hello and thank you very much for your work on NestedText as a human friendlier format.
I’m very sorry to discover this project this late and to ask the kind of question I’m about to, on something which has probably been settled for a long time, but is there an archive of the thought process behind the choice of `>` as the multiline string marker? Especially in comparison to `|`.
My reasoning was that `>` is very much used in emails or forums where it marks a _response_, with the symbol being oriented laterally, while something like `|` especially when chained:
```
address:
| 2586 Marigold Lane
| Topeka, Kansas 20682
```
seems IMHO a more natural barrier-forming symbol to mark a _simple block_, from a human cognition / reading point of view (and especially in a text editor with less vertical spacing between the `|`s).
This is a very nitpicky point and I will probably adopt NestedText for a project or two, but you know, just in case things were not engraved in stone, I sputtered the idea.
Thanks again for this project.
PS: not needing to quote keys is a good idea.
|
Yes, as it was too unimportant. Nevertheless, thank you for your response! |
Lewis, I think your point about using colon to identify two distinct situations being confusing in some cases is a good one that we did not consider. It is worth considering using a completely different character for multiline keys. |
Lewis,
is not allowed because all keys must be paired with values. In fact, this is the suggestion you made. My implementation was accepting multiline keys without value, which probably caused the confusion, but it is now fixed. |
Thanks for clarifying that, I was indeed misled by the Python implementation - I've fixed my Zig implementation now (thanks for the tests update). I've given some thoughts to a few alternative syntax options and included some notes below. I think requiring the value after a multiline key makes it a lot clearer though, so don't have particularly strong feelings on the options below - just thought it might be useful listing out some possible options :) On the question of the character to indicate key lines, maybe a fairer variant of the above to consider would be as follows (where all colons are syntax).
JSON: With question marks for object keys, for comparison:
Also comparing an alternative I suggested briefly a while back, reusing regular multiline strings (where there is always exactly one colon separating the key and the value):
I just tried converting the above example to YAML (see below), and it does involve a question mark, but looks pretty odd, so either way I think NestedText will be doing better here!
A:
B: ''
C: ''
D: ''
? 'E
F'
: ''
Finally, comparing the three alternatives above with a more normal example (taken from the Status quo:
JSON: Question marks for object keys:
Reusing multiline string syntax (showing the downside of requiring an extra level of indentation with just a colon sitting alone at an indent level!):
|
Thanks for illustrating the alternatives. Currently we are expecting to choose between the leading colon or question mark to introduce key items. We are not considering your third alternative of reusing multiline strings for keys. I think it is likely that we will stay with the leading colon for multiline keys. |
Quick question on inline lists/objects: is
but I would have read this as two items (separated by the comma). Allowing a trailing comma doesn't seem like a good fit when the inline structures must be on a single line. Is there really a need to allow specifying empty strings in inline structures given this confusion? |
I've now fully implemented inline lists and objects in zig-nestedtext, as well as multiline object keys (using a leading colon), which you can try out by downloading the The one deviation from the spec I currently have is that I disallow empty keys/values in inline objects/lists, and I'd like to see some discussion on this. My reasoning for this:
|
As you recognize, this is what allows us to distinguish between empty lists and those with a single empty string. Supporting empty lists and empty dictionaries is considered desirable because it provides a completeness to the language. With this it is now possible to represent any hierarchical combination of lists, dictionaries, and strings. Completeness increases ease of use because it eliminates exceptions that must be handled by user code that calls the dump functions. It is unusual to use terminal commas on single line lists, but they are only required to identify empty terminal values. Other than that, while they may be unusual, they have become common on multiline lists and there is really no reason to outlaw them. In my view, this behavior is a net positive. |
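The terminal-comma rule described in the comment above can be sketched as follows. This is a minimal illustration of the rule as it is worded in this thread, not the NestedText reference implementation: the function name `parse_inline_list` is hypothetical, nested structures are not handled, and treating a whitespace-only body as an empty list is an assumption made here.

```python
def parse_inline_list(text):
    """Parse a flat inline list such as '[a, b]' into a list of strings.

    Assumed rule (from the discussion): a terminal comma is a terminator,
    not a separator, so it is only needed when the last value is empty.
    """
    text = text.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("inline list must be wrapped in [ ]")
    body = text[1:-1]
    if body.strip() == "":
        return []                      # no values at all: the empty list
    items = [item.strip() for item in body.split(",")]
    if items[-1] == "":
        items.pop()                    # drop the slot after a terminal comma
    return items


empty = parse_inline_list("[]")               # empty list
one_empty_string = parse_inline_list("[,]")   # list holding one empty string
plain = parse_inline_list("[a, b]")
terminated = parse_inline_list("[a, b,]")     # same values as plain
empty_last = parse_inline_list("[a, ,]")      # 'a' followed by an empty string
```

Under this reading, `[,]` holds one value rather than two, which is exactly the point of disagreement in the surrounding comments.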
Completely agree.
This is where I disagree, from the perspective of ease of reading and understanding NestedText. A trailing comma seems completely unintuitive here to me, making it look like there's an empty string at the end after the comma (which there isn't). You say they've become common on multiline lists - I would agree, but strictly for multiline, not for single line (and in fact trailing commas are entirely disallowed in JSON). Some examples: In a way it's not as bad for objects since the colon is required as well as the comma, although it's then unclear whether a trailing comma should be needed if the last item contains an empty value... That I'm really not seeing the argument for allowing this though, seeing as it wouldn't be restricting the language at all to disallow. Inline object/list syntax is already one of the most complicated bits of the NestedText syntax, and allowing empty values just adds to potential mental overhead in trying to read NT files. I also just noticed that it doesn't seem to be possible to use inline syntax at the root level in your Python implementation, is that intentional? |
Once you allow terminal commas on multiline lists, it would be inconsistent to not allow them on single line lists. For example, Python allows lists, tuples, argument lists, etc. to have terminal commas in both cases. Specifically, [1,2,3,] is a valid alternative to [1,2,3]. I don't think terminal commas impose a significant mental load on the user. One always needs to become familiar with the ways that a language works. This detail is one that most users will never encounter, and when they do, it is not hard to grasp; nor is it outside the norm. The only thing about NestedText that is different from other languages that allow terminal commas is that values can be completely empty. So it is a little different from Python, but not in a way that is artificial or confusing. I was able to explain it in one sentence in the documentation, and you easily recognized the problem it was designed to address. When I said that it is increasingly common for languages to support terminal commas, I was extrapolating from the fact that Python has supported them for a very long time and the fact that a lot of people complain that JSON does not. So I assumed that they were common in other languages, but I don't actually know that to be true. Actually, Python has a very similar situation and rule for tuples (tuples are immutable lists). An empty tuple is represented with The Python implementation allows inline lists and dictionaries at the top level. Be sure to specify |
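For readers less familiar with the Python behaviour referred to in the comment above, here is a short illustration using plain Python literals. It shows only standard Python syntax and says nothing about how NestedText itself parses inline lists.

```python
# Terminal commas are accepted in single-line and multi-line literals alike.
single_line = [1, 2, 3,]
multi_line = [
    1,
    2,
    3,
]

# For tuples the comma carries meaning when there is exactly one item.
empty_tuple = ()
one_item_tuple = (1,)    # the comma makes this a tuple
just_an_int = (1)        # without it, the parentheses only group
```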
Sure, but NestedText doesn't allow multiline lists.
Thinking about it, I'm against empty values being allowed in the inline list/object syntax independent of this discussion about trailing commas (although if you remove empty values, you remove any need for allowing trailing commas). As you point out, there may be precedent for trailing commas (albeit as an extension of multiline structures), but I'm not aware of precedent for empty values in this kind of syntax - even YAML disallows this! Things like
It's not hard to explain, it just feels like a gotcha that needs explaining, rather than being obvious at first sight. The last thing I have to say about this is that it would be much easier to later add support for empty values if a need arises than to deprecate support for empty values if it turns out to be a disliked feature. I would very much like to hear other people's input on this, and have slight concern about the possible addition of unnecessary complexity/ambiguity when reading this data format. |
It's been almost three weeks since the last comment, so I think it is time to make the call. I have been living with the proposed changes as implemented in the current version on GitHub and am comfortable with that version. Compared to version 1.3 it:
Unless I hear any final comments, I will be re-releasing the current version as the next stable release, version 2.0, in the next few days. |
Version 2.0 has been released. |
We are considering deprecating quoted keys. This will be a change that is not backward compatible. You can see the discussion that triggered this decision here. To summarize, the feeling is that:
Eliminating quoted keys further limits the strings that can be used as keys. We are considering adding multi-line keys to replace quoted keys. It is felt that multi-line keys are more in keeping with the style of NestedText than quoted keys were, and they allow NestedText to accept any string as a key.
Multi-line keys are patterned after multi-line strings, except the string tag `>␣` is replaced by the dict tag `:␣` and a trailing indented value is required. For example: This would be interpreted as:
Multi-line keys are not expected to be commonly used, but they are being considered because they fit naturally in the language and they make NestedText completely general, meaning that with multi-line keys NestedText can handle any combination of lists, dictionaries, and strings, where the leaf values are all strings. We could not say that previously.
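The proposal above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the NestedText parser: the function name `read_multiline_key` is hypothetical, the key is assumed to sit at the top level with no indentation, and the sample input lines are invented for this example.

```python
def read_multiline_key(lines):
    """Collect leading ': ' lines into one key.

    Returns (key, remaining_lines). A bare ':' line contributes an empty
    line to the key, so a single ':' yields the empty key.
    """
    parts = []
    index = 0
    while index < len(lines) and (lines[index] == ":" or lines[index].startswith(": ")):
        parts.append(lines[index][2:])   # drop the dict tag ': '
        index += 1
    if not parts:
        raise ValueError("expected a multi-line key")
    rest = lines[index:]
    if not rest or not rest[0].startswith(" "):
        raise ValueError("a multi-line key must be followed by an indented value")
    return "\n".join(parts), rest


# Hypothetical input: a two-line key followed by an indented multi-line string.
sample = [": first line of key", ": second line of key", "    > the value"]
key, remaining = read_multiline_key(sample)

# A lone ':' gives the empty key.
empty_key, _ = read_multiline_key([":", "    > value for the empty key"])
```

The check on the following line reflects the requirement that every multi-line key be paired with an indented value.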
Comments?