number suffix type annotations #513

zkat · 2025-03-30T23:00:42Z

Fixes: #510

zkat · 2025-03-30T23:30:12Z

@tabatkins honestly the most heartbreaking thing about all this to me is that I can't have `123u32` and such :( Gotta do `123#u32`

draft-marchan-kdl2.md

zkat · 2025-04-01T20:14:53Z

@bgotink does this align with your understanding/experience/what you did in your lib?


zkat · 2025-04-02T21:23:32Z

alright, tests added. This is ready for final review.

clarfonthey · 2025-04-03T04:59:34Z

draft-marchan-kdl2.md

+annotation as a "suffix", instead of prepending it between `(` and `)`. This
+makes it possible to, for example, write `10px`, `10.5%`, `512GiB`, etc., which
+are equivalent to `(px)10`, `(%)10.5`, and `(GiB)512`, respectively.
+


Just for readability purposes, I think it's worth mentioning the `#` escape hatch here, with a reference to the "Explicit Suffix Type Annotation" section, at least when it comes to types like `u32`. For example, maybe:

To remove ambiguity, some suffixes must be prefixed with `#`: for example, `10.0u8` is invalid, but `10.0#u8` is. The full list of rules for invalid suffixes is clarified in the "Explicit Suffix Type Annotation" section.

I dropped a suggestion for this.

Ah, GitHub collapsed it as "outdated" because they're bad
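A minimal sketch of the suffix-to-prefix equivalence quoted in the diff above (`10px` is `(px)10`, and `123#u32` is `(u32)123`). It assumes a simplified decimal significand (optional sign, digits and underscores, optional fraction; no exponents or `0x`/`0o`/`0b` numbers), and `desugar_suffix` is a hypothetical helper, not part of the draft or of any KDL library. The "explicit suffix must not start with a digit" check is an assumption that mirrors the `123#123` bug mentioned later in this thread.

```python
import re

# Hypothetical, simplified significand: optional sign, a digit, then digits or
# underscores, with an optional fraction. Exponents and non-decimal radixes are
# deliberately left out of this sketch.
SIGNIFICAND = re.compile(r"[+-]?[0-9][0-9_]*(?:\.[0-9][0-9_]*)?")


def desugar_suffix(literal: str) -> str:
    """Rewrite a suffixed number as its prefix-annotated form, e.g. 10px -> (px)10."""
    if "#" in literal:
        # Explicit suffix: everything after '#' is the type annotation.
        number, _, suffix = literal.partition("#")
        if not SIGNIFICAND.fullmatch(number):
            raise ValueError(f"not a number: {number!r}")
        # Assumed check: the suffix must be non-empty and must not start with a
        # digit, so 123#123 is rejected rather than read as ("123")123.
        if not suffix or suffix[0] in "0123456789":
            raise ValueError(f"invalid explicit suffix: {literal!r}")
        return f"({suffix}){number}"
    m = SIGNIFICAND.match(literal)
    if m is None:
        raise ValueError(f"not a number: {literal!r}")
    number, suffix = m.group(), literal[m.end():]
    # Bare suffix: whatever follows the significand (no bare-suffix restrictions modeled).
    return f"({suffix}){number}" if suffix else number
```

Note that this sketch happily accepts `10.0u8` as `(u8)10.0`; it does not model which bare suffixes are rejected, since those rules were still changing in this thread.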

tabatkins

r+ after review of suggested changes

draft-marchan-kdl2.md

tabatkins · 2025-04-03T13:49:25Z

draft-marchan-kdl2.md

+annotation as a "suffix", instead of prepending it between `(` and `)`. This
+makes it possible to, for example, write `10px`, `10.5%`, `512GiB`, etc., which
+are equivalent to `(px)10`, `(%)10.5`, and `(GiB)512`, respectively.
+


I dropped a suggestion for this.

bgotink

This matches what I implemented, apart from one test that appears to be wrong and one mistake on my end: the parser skips certain validations on `#` suffixes, which makes `123#123` equivalent to `("123")123`, which is wrong.

bgotink · 2025-04-03T16:06:13Z

tests/test_cases/input/suffix_type_bare_underscore_fail.kdl

An integer can end in an underscore, so this is actually valid and equivalent to `(abc)123_`.

Correct. This is why we can't start a bare suffix with `_`: the syntax would parse differently than intended.
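A minimal sketch of why a bare suffix can never start with `_` under a greedy reading, assuming the integer production `digit (digit | '_')*` quoted later in this thread. `split_bare_suffix` is a hypothetical helper for illustration only.

```python
DIGITS = "0123456789"


def split_bare_suffix(literal: str) -> tuple[str, str]:
    """Split an integer with a bare suffix into (integer, suffix).

    The integer part is scanned greedily and takes every digit and underscore
    before the suffix is considered, so the suffix can never begin with '_'.
    """
    if not literal or literal[0] not in DIGITS:
        raise ValueError(f"must start with a digit: {literal!r}")
    i = 1
    while i < len(literal) and (literal[i] in DIGITS or literal[i] == "_"):
        i += 1
    return literal[:i], literal[i:]
```

With this scanner, `split_bare_suffix("123_abc")` yields `("123_", "abc")`, which is the `(abc)123_` reading described above.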

bgotink · 2025-04-05T19:55:55Z

The following tests that previously failed now run successfully:

| test name | test | equivalent document |
|---|---|---|
| `bare_ident_numeric_fail.kdl` | `node 0n` | `node (n)0` |
| `bare_ident_numeric_sign_fail.kdl` | `node +0n` | `node (n)+0` |
| `illegal_char_in_binary_fail.kdl` | `node 0bx01` | `node (bx01)0` |
| `multiple_x_in_hex_fail.kdl` | `node 0xx10` | `node (xx10)0` |
| `no_digits_in_hex_fail.kdl` | `node 0x` | `node (x)0` |

zkat · 2025-04-07T17:19:47Z

Uggghhhhh. That makes sense. Looks like we're gonna need to be more specific about what order things run in. I have an idea for the grammar.


zkat · 2025-04-17T18:56:27Z

@bgotink

> The following tests that previously failed now run successfully:

Looking at this...

These should be dropped because they're ok now:

| test name | test | equivalent document |
|---|---|---|
| `bare_ident_numeric_fail.kdl` | `node 0n` | `node (n)0` |
| `bare_ident_numeric_sign_fail.kdl` | `node +0n` | `node (n)+0` |

These tests should stay and continue to fail. Once you've hit a `0b`/`0o`/`0x` number prefix, you SHOULD only be able to parse their related number formats:

| test name | test | equivalent document |
|---|---|---|
| `illegal_char_in_binary_fail.kdl` | `node 0bx01` | `node (bx01)0` |
| `multiple_x_in_hex_fail.kdl` | `node 0xx10` | `node (xx10)0` |
| `no_digits_in_hex_fail.kdl` | `node 0x` | `node (x)0` |
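A minimal sketch of the "commit once you've seen `0b`/`0o`/`0x`" rule described above. The digit sets and the helper name `scan_number` are assumptions for illustration (signs, floats, and exponents are not modeled), not the draft's grammar. It shows the three tests that should keep failing raising an error, while `0n` still splits into a decimal `0` plus a bare suffix.

```python
# Hypothetical digit sets for each radix prefix (sign handling omitted).
RADIX_DIGITS = {
    "0b": set("01"),
    "0o": set("01234567"),
    "0x": set("0123456789abcdefABCDEF"),
}
DECIMAL_DIGITS = set("0123456789")


def scan_number(literal: str) -> tuple[str, str]:
    """Return (number, bare_suffix); raise ValueError on malformed input."""
    digits = RADIX_DIGITS.get(literal[:2])
    if digits is not None:
        # Committed: after 0b/0o/0x only that radix's digits (and '_') may follow,
        # and no bare suffix is allowed.
        body = literal[2:]
        if not body or body[0] not in digits:
            raise ValueError(f"expected a digit after {literal[:2]!r}: {literal!r}")
        if any(c not in digits and c != "_" for c in body):
            raise ValueError(f"invalid character in {literal!r}")
        return literal, ""
    # Decimal: greedily take digits/underscores; the rest is a bare suffix.
    if not literal or literal[0] not in DECIMAL_DIGITS:
        raise ValueError(f"not a number: {literal!r}")
    i = 1
    while i < len(literal) and (literal[i] in DECIMAL_DIGITS or literal[i] == "_"):
        i += 1
    return literal[:i], literal[i:]
```

A side effect of committing this early is that something like `0bytes` becomes a syntax error rather than `(bytes)0`, which matches the "potential syntax errors on certain zero values" trade-off discussed below.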

zkat · 2025-04-17T20:36:01Z

uggghhhh I understand why those last 3 tests pass now. I want to do something about it, though. I don't like that. I wonder if there's a good way around it.

zkat · 2025-04-17T21:07:19Z

@bgotink @tabatkins I've... kind of changed the rules a bit. They're simpler now. And most importantly: we have `0u64` available now!

I checked #510 and didn't see any discussion of us tackling these rules: we were so focused on what the rules for the suffix should be that we didn't take a step back and think about these numbers as a whole, and about how the number syntax in general could be addressed.

Please lmk what you think. I think with these changes, we'll get the results I expected in #513 (comment)

zkat · 2025-04-18T05:16:50Z

Some key changes:

- I reorganized things a bit in general.
- The grammar itself now blocks simultaneous suffix and prefix annotations.
- The restriction on `_` as the initial character was removed: a bare suffix CAN'T start with one, because the integer will slurp it up first.
- The complex rules meant to disambiguate from non-decimals were removed. The fact that we only allow bare suffixes on decimals is sufficient to protect us here, imo. I think these complex rules formed because things were moving fast and we didn't take a step back and really rethink the implications of only doing decimals. This has been clarified in the spec. With the new rules, there is no dangerous ambiguity, just potential syntax errors on certain zero values. Notably, that is NOT an unexpected parsing success, which I think is the dangerous bit.
- The rules around exponential-likes have been changed a bit to guard against small typos where you miss the digit part (so `1e+` is illegal).
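A minimal sketch of one possible reading of the `1e+` guard in the last bullet: what follows a decimal significand is either a complete exponent, a bare suffix, or, if it starts with `e+`/`e-` but isn't a complete exponent, an error. The exponent shape and the helper `classify_tail` are assumptions for illustration; the draft's actual exponent-like rules may be stricter.

```python
import re

# Assumed shape of a complete exponent: e/E, optional sign, at least one digit.
EXPONENT = re.compile(r"[eE][+-]?[0-9][0-9_]*")


def classify_tail(tail: str) -> str:
    """Classify what follows a decimal significand: 'none', 'exponent', or 'suffix'."""
    if not tail:
        return "none"
    if EXPONENT.fullmatch(tail):
        return "exponent"
    if tail[0] in "eE" and tail[1:2] in ("+", "-"):
        # e+ / e- without a complete exponent: most likely a missing digit, so reject.
        raise ValueError(f"incomplete exponent: {tail!r}")
    return "suffix"
```

Under this reading `1em` keeps `em` as a bare suffix, `1e10` is an exponent, and `1e+` is rejected instead of silently becoming `(e+)1`.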

zkat · 2025-04-18T06:58:55Z

I'm also wondering: could we just drop the exponent restriction (but keep the `e+` protection, in case someone fails to write that digit and ends up with an unfortunate parse)?

mwh · 2025-04-19T00:12:00Z

I don't think the specification prose as currently written does require or even allow a trailing `_` to be consumed by the number ("digits ... may be separated by `_`"), but the authoritative grammar does include arbitrary underscores in place of digits anywhere except the beginning.

I think either the description of the grammatical language needs tightening, or the grammar may now allow some undesirable constructions with underscores and suffixes. Specifically, I'm not sure whether `*` is expected to commit a parser to whatever it finds the first time, or whether it can backtrack to allow later productions to match. Here, it would be backtracking to an underscore after the input didn't match when the underscore was consumed as part of a number.

Consider `12_3,x`. The overall match would fail when `(digit | '_')*` from the `integer` production consumes `_3`; does it then backtrack and try consuming less of the input? If it stops short after `12` and leaves `_3,x`, the whole input matches `suffixed-decimal` successfully, with `significand` accepting `12` and `bare-type-suffix` accepting `_3,x`, so the result would be equivalent to `(_3,x)12`, though I think it should be an error. `12_3.4` and `12_3.4.5` differ at the same point, or consider `1_234,567`. All of these can be produced from the grammar, at least.

I'm not sure whether I am misreading the description of the grammar language. `*` consumes "as many instances as possible without failing the match": is that failing the match of the whole input, or does it just mean as many instances as are present at this point in the input, and failure isn't really part of it? The comparison to standard regex semantics and the existence of cut points make me think it can shorten if needed. If it does commit early to the longest sequence it finds, this issue doesn't come up. Otherwise, for `integer` to definitively slurp the underscore up and make this an error, I think there would need to be a cut point: `suffixed-decimal := significand ¶ (bare-type-suffix | (exponent? explicit-type-suffix))`.
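A minimal sketch contrasting the two readings of `*` described above. The productions are simplified, hypothetical stand-ins (integers only, and a bare suffix assumed to start with a letter or `_`), not the draft's exact grammar. Python's `re` engine backtracks, so it reproduces the `(_3,x)12` reading; the hand-written scanner never gives characters back, which models the proposed cut point.

```python
import re
from typing import Optional, Tuple

DIGITS = "0123456789"
# Simplified, hypothetical stand-ins for the draft's productions:
#   integer          := digit (digit | '_')*
#   bare-type-suffix := (letter | '_') (any char except whitespace or '#')*
SUFFIX = r"[A-Za-z_][^\s#]*"
# Python's `re` backtracks: if the suffix fails, the integer group gives back characters.
BACKTRACKING = re.compile(r"([0-9][0-9_]*)(" + SUFFIX + ")")


def parse_backtracking(text: str) -> Optional[Tuple[str, str]]:
    m = BACKTRACKING.fullmatch(text)
    return (m.group(1), m.group(2)) if m else None


def parse_committed(text: str) -> Optional[Tuple[str, str]]:
    """Take the longest integer first and never give it back (a cut point)."""
    if not text or text[0] not in DIGITS:
        return None
    i = 1
    while i < len(text) and (text[i] in DIGITS or text[i] == "_"):
        i += 1
    number, suffix = text[:i], text[i:]
    return (number, suffix) if re.fullmatch(SUFFIX, suffix) else None
```

Both readings agree on ordinary input like `10px`, but on `12_3,x` the backtracking version returns `("12", "_3,x")` while the committed version rejects it.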

If there is an issue, either banning `_` as the initial character of a suffix again or solidifying the grammatical handling of integers could address it. I am not personally a fan of baking parsing and backtracking rules into a grammar, and would probably just block `_`, but there are reasons to have that kind of grammar too, particularly to constrain where an error is detected.

Something shaped like `1_234,567` seems like the primary case where this is realistic and actually matters, or someone trying to write a list `1_234, 5_678, ...`. I do think the grammar should rule these out, but if I were implementing a parser directly off this grammar right now, I would end up with this backtracking and unknowingly accept these cases, because nothing commits the parse to the path that produces an error. If I built the parser with a lexer in front, I think the lexer would probably consume the whole number as expected, and then I wouldn't encounter the issue. Clearly one of those is wrong, but it's not good for compatibility if both seem reasonable to make.

If nothing else, test cases including both underscores within the number and invalid suffixes will be good to ensure that the incorrect readings are detected.

I also wonder in a similar way about `0xaz`. The prose does rule this out. This time it's the grammar semantics of `-` in `significand-initial` I'm not certain about: "any digit except something that matches the literal `'0x'`" seems like it'd be the same as just `digit` alone, because `"0x"` is not a digit and is not matched by `digit`. The intention here is clear; the formalism just may not match up.
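A minimal sketch of the `significand-initial` point above. Read literally as a set difference over single characters, "digit except `'0x'`" removes nothing, because no one-character string equals the two-character `"0x"`. A negative lookahead over the input is one possible way to formalize the intent; this is an assumption for illustration, not the draft's grammar notation.

```python
import re

DIGIT_CHARS = set("0123456789")

# Literal set-difference reading: single-character digits minus any string equal
# to "0x". A one-character string can never equal "0x", so nothing is removed.
difference_reading = {c for c in DIGIT_CHARS if c != "0x"}

# Lookahead reading (one possible formalization of the intent): a digit that does
# not start a "0x" prefix. `(?!...)` inspects the input without consuming it.
SIGNIFICAND_INITIAL = re.compile(r"(?!0x)[0-9]")
PLAIN_DIGIT = re.compile(r"[0-9]")
```

With the plain `digit` reading, `0xaz` can start a decimal `0` followed by the bare suffix `xaz`; with the lookahead reading, `SIGNIFICAND_INITIAL.match("0xaz")` is `None`, while `0n` and `7px` still start a decimal.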

zkat force-pushed the zkat/suffixes branch from 58f3d2a to c02936c (March 30, 2025 23:01)

zkat mentioned this pull request on Mar 30, 2025 in "Idea: number suffixes as annotations" #510 (open)

tabatkins reviewed Mar 31, 2025

draft-marchan-kdl2.md (outdated, resolved)

number suffix type annotations

07788d1

Fixes: #510

zkat force-pushed the zkat/suffixes branch from c96bc4e to 07788d1 (April 2, 2025 21:22)

zkat requested a review from tabatkins April 3, 2025 03:47

clarfonthey reviewed Apr 3, 2025

tabatkins requested changes Apr 3, 2025

bgotink suggested changes Apr 3, 2025

zkat and others added 5 commits April 17, 2025 11:51

Update draft-marchan-kdl2.md

3b94363

Co-authored-by: Tab Atkins Jr.

Update draft-marchan-kdl2.md

a4b1053

Co-authored-by: Tab Atkins Jr.

Update draft-marchan-kdl2.md

2f21bd3

Co-authored-by: Tab Atkins Jr.

Update draft-marchan-kdl2.md

c4613f6

Co-authored-by: Tab Atkins Jr.

Update draft-marchan-kdl2.md

662917c

Co-authored-by: Tab Atkins Jr.

new approach to these

1362646

more cleanup and more tests

d5d4f46

zkat force-pushed the zkat/suffixes branch from b5e590c to d5d4f46 (April 18, 2025 04:07)

zkat force-pushed the zkat/suffixes branch from c6133ef to fa14d74 (April 18, 2025 06:06)

more refinement

06a6423

zkat force-pushed the zkat/suffixes branch from fa14d74 to 06a6423 (April 18, 2025 06:31)
