Editorial: Use grammar for VLQ and Mappings and decode via SDO #180

Draft
wants to merge 11 commits into
base: main
from

Conversation

szuend
Collaborator

@szuend szuend commented Mar 10, 2025

Draft that changes VLQ and mappings to use grammar plus "syntax-directed operations" for decoding rather than a pure algorithmic approach.

Only uploaded so we have a preview for discussion.

Preview: https://szuend.github.io/source-map/branch/mappings-grammar/#sec-mappings

@nicolo-ribaudo nicolo-ribaudo self-requested a review March 10, 2025 12:40
@nicolo-ribaudo
Member

nicolo-ribaudo commented Mar 10, 2025

I attempted to re-express the DecodeBase64VLQ operation without using an accumulator, since the way fields are mutated in the accumulator is not super easy to keep track of (recursion + mutable state). Do you think this version would work, or am I missing something about shifting the values?

And then you use it as "let value be the VLQSignedValue of Vlq".

VLQSignedValue ( )

Vlq :: VlqDigitList

  1. Let unsigned be the VLQUnsignedValue of VlqDigitList.
  2. If unsigned modulo 2 is 1, let sign be -1.
  3. Else, let sign be 1.
  4. Return sign × floor(unsigned / 2).

VLQUnsignedValue ( )

VlqDigitList :: DigitWithoutContinuationBit

  1. Return DecodeBase64Digit(DigitWithoutContinuationBit).

VlqDigitList :: DigitWithContinuationBit VlqDigitList

  1. Let left be DecodeBase64Digit(DigitWithContinuationBit).
  2. Let right be the VLQUnsignedValue of VlqDigitList.
  3. Return (right - 32) × 2 ** 5 + left.

@szuend
Collaborator Author

szuend commented Mar 10, 2025

Very nice! That looks much simpler. Although I think there is a bug: we need to slice the continuation bit off _left_ and not off _right_:

3. Return _right_ × 2 ** 5 + (_left_ - 32).

Alternatively we could define DecodeBase64Digit as an SDO that handles the continuation bit.
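
As a rough illustration, the recursion with the corrected step 3 could look like the following Python sketch. The function names (`decode_base64_digit`, `vlq_unsigned_value`, `vlq_signed_value`) are made up for this example and only mirror the SDO names above; the error handling is an assumption, and nothing beyond the steps quoted in this thread is covered.

```python
BASE64_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def decode_base64_digit(ch: str) -> int:
    """Map one base64 character to its 6-bit value (0..63)."""
    value = BASE64_ALPHABET.find(ch)
    if value < 0:
        raise ValueError(f"not a base64 digit: {ch!r}")
    return value


def vlq_unsigned_value(digits: str) -> int:
    """Recursive counterpart of the VLQUnsignedValue steps (no accumulator)."""
    if not digits:
        raise ValueError("a VLQ needs at least one digit")
    left = decode_base64_digit(digits[0])
    if len(digits) == 1:
        # VlqDigitList :: DigitWithoutContinuationBit
        if left >= 32:
            raise ValueError("last digit must not have the continuation bit")
        return left
    # VlqDigitList :: DigitWithContinuationBit VlqDigitList
    if left < 32:
        raise ValueError("non-final digit must have the continuation bit")
    right = vlq_unsigned_value(digits[1:])
    # The continuation bit (value 32) is removed from left, not from right.
    return right * 2**5 + (left - 32)


def vlq_signed_value(digits: str) -> int:
    """Counterpart of VLQSignedValue: the lowest bit carries the sign."""
    unsigned = vlq_unsigned_value(digits)
    sign = -1 if unsigned % 2 == 1 else 1
    return sign * (unsigned // 2)
```

For example, `vlq_signed_value("gB")` evaluates to 16: `g` decodes to 32 (continuation bit set, payload 0) and `B` to 1, so the unsigned value is 1 × 32 + 0 = 32, which is even and halves to 16.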

@nicolo-ribaudo
Member

nicolo-ribaudo commented Mar 10, 2025

You are right, the - 32 is in the wrong place 👍

I think I have a slight preference for the SDO approach, but only with a name that implies "this isn't actually the base64-decoded value, but the base64-decoded value after trimming the VLQ continuation bit" :)

Or, we remove DecodeBase64Digit, and we just add cases for the two digit types directly in the VLQUnsignedValue SDO.

@szuend
Collaborator Author

szuend commented Mar 12, 2025

With tc39/ecmarkup#637 still in-flight, I applied Nicolò's idea but via an additional nonterminal indirection. Should be good enough to discuss this in tonight's meeting.

Member

@nicolo-ribaudo nicolo-ribaudo left a comment

Second review pass. The VLQs look good now, and I finally reviewed the mappings definitions.

I would prefer this slightly different approach, based on the "informal" understanding that mappings contains one or more segments, one per line, separated by semicolons. A segment contains zero or more mappings, separated by commas. This avoids having to think about the "what if there are consecutive semicolons?" case, since they are just consecutive segments.

Mappings :
  Segment
  Segment `;` Mappings

Segment :
  MappingList?

MappingList :
  Mapping
  Mapping `,` MappingList

Note that the above definitions make "mappings": "" valid, which matches the current spec.
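
To make the shape of this grammar concrete, here is a minimal Python sketch that splits a "mappings" string the same way. The helper name `split_mappings` is hypothetical, and the individual mappings are left as undecoded strings.

```python
def split_mappings(mappings: str) -> list[list[str]]:
    """Split a "mappings" string into segments, each a list of raw mappings."""
    result = []
    # Mappings : Segment | Segment `;` Mappings
    for segment in mappings.split(";"):
        if segment == "":
            # Segment : MappingList? with the MappingList absent
            result.append([])
            continue
        # MappingList : Mapping | Mapping `,` MappingList
        raw = segment.split(",")
        if "" in raw:
            # The grammar has no empty Mapping, so "A,,B" or "A," is rejected.
            raise ValueError("empty mapping between commas")
        result.append(raw)
    return result
```

Consecutive semicolons need no special case here: `split_mappings("AAAA;;CAAC,EAAE")` yields three segments, the middle one empty, and `split_mappings("")` yields a single empty segment.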

DecodeMappingsSdo would then become
DecodeMappingsSdo ( ... )

Mappings : Segment ; Mappings
MappingList : Mapping , MappingList

  1. For each child node child of this Parse Node, do ... (same as now)

Segment : MappingList?

  1. Set state.[[GeneratedLine]] to state.[[GeneratedLine]] + 1.
  2. Set state.[[GeneratedColumn]] to 0.
  3. If MappingList is present, perform DecodeMappingsSdo of MappingList with arguments ...

Mapping : GeneratedColumn (same as now)
Mapping : GeneratedColumn OriginalSource OriginalLine OriginalColumn Name? (same as now)

@nicolo-ribaudo
Member

nicolo-ribaudo commented Mar 12, 2025

Maybe we can also rename Mappings to SegmentList, or MappingsSegmentList? It's a bit weird that "mappings" is a list of segments, and "segment" is a list of mappings, even though that's how we always referred to it 😅

@szuend
Collaborator Author

szuend commented Mar 12, 2025

Thanks for the new grammar, I love it!

Maybe we can also rename Mappings to SegmentList, or MappingsSegmentList? It's a bit weird that "mappings" is a list of segments, and "segment" is a list of mappings, even though that's how we always referred to it 😅

IMO it would be nice to have the goal symbols named consistently with how the field is named in the source map JSON. What about renaming Mappings to SegmentList but then add a goal symbol MappingsField : SegmentList?

@nicolo-ribaudo
Member

What about renaming Mappings to SegmentList but then add a goal symbol MappingsField : SegmentList?

Sounds good 👍

@szuend
Collaborator Author

szuend commented Mar 12, 2025

Changed the grammar as per your suggestion. Also renamed the SDO to DecodeMappingsField, which makes more sense. The "Sdo" suffix was only there because I couldn't think of anything better at the time.

spec.emu Outdated
1. Perform DecodeMappingsField of |OriginalSource| with arguments _state_, _mappings_, _names_ and _sources_.
1. Perform DecodeMappingsField of |OriginalLine| with arguments _state_, _mappings_, _names_ and _sources_.
1. Perform DecodeMappingsField of |OriginalColumn| with arguments _state_, _mappings_, _names_ and _sources_.
1. Let _source_ be _sources_[_state_.[[SourceIndex]]].
Collaborator

It's only in an error case, but I think there's a small semantic difference between the old version and this one. If the source index is invalid, the old algorithm kept the source as null, while this one does an out-of-bounds lookup (which returns undefined in JS). (Original line/column are also kept null when they are negative.)

Collaborator Author

Yeah, there are some bounds checks missing. Same for names.

I'm not sure about undefined though. We are using the List specification type here, which does not actually spell out what is returned for out-of-bounds accesses, so it's even worse since it's undefined behavior.

I'll add some bounds checks.

Collaborator Author

I rewrote the algorithm here to do the actual validation and set the fields explicitly to null. PTAL.
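
The rewritten steps themselves are not quoted in this thread, so the following Python sketch only illustrates the general idea of validating first and falling back to null (here `None`). Both helper names are hypothetical and do not come from the spec.

```python
from typing import Optional, Sequence


def lookup_or_none(items: Sequence[str], index: int) -> Optional[str]:
    """Return items[index], or None when index is out of bounds."""
    # Explicit check: Python would otherwise treat -1 as "last element".
    if 0 <= index < len(items):
        return items[index]
    return None


def non_negative_or_none(value: int) -> Optional[int]:
    """Keep a decoded line/column only when it is not negative."""
    return value if value >= 0 else None
```

With `sources = ["a.js"]`, `lookup_or_none(sources, 1)` and `lookup_or_none(sources, -1)` both give `None` instead of relying on whatever an out-of-bounds List access would do.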

Collaborator Author

I also filed #184. It seems some implementations (e.g. Chrome) behave as I wrote the spec initially but others (e.g. Firefox) behave as the existing spec says.

Comment on lines +865 to +866
1. For each child node _child_ of this Parse Node, do
1. If _child_ is an instance of a nonterminal, then
Member

I have no idea what this means.

Collaborator Author

I adapted this from the ECMAScript spec that has a couple instances of this (e.g. Contains).

Since we only have two productions, we could also spell them out explicitly:

<emu-grammar>
  SegmentList :
    Segment `;` SegmentList
</emu-grammar>
<emu-alg>
  1. Perform DecodeMappingsField of |Segment| ...
  1. Perform DecodeMappingsField of |SegmentList| ...
</emu-alg>

And same for the MappingList production.

MappingList?
</emu-grammar>
<emu-alg>
1. Set _state_.[[GeneratedLine]] to _state_.[[GeneratedLine]] + 1.
Member

This means the GeneratedLine starts at 1 instead of 0? But originalLine starts at 0. And [[GeneratedLine]] is "a non-negative integer", implying 0 is valid and 1 is actually the second line.

Collaborator Author

You are right.

I think it might be sufficient to switch the statements around: perform DecodeMappingsField first, then increment the line.

Alternatively we could also move the increment to the SegmentList production. I'd find that even clearer:

<emu-grammar>
  SegmentList :
    Segment `;` SegmentList
</emu-grammar>
<emu-alg>
  1. Perform DecodeMappingsField of |Segment| ...
  1. Set _state_.[[GeneratedLine]] to _state_.[[GeneratedLine]] + 1.
  1. Set _state_.[[GeneratedColumn]] to 0.
  1. Perform DecodeMappingsField of |SegmentList| ...
</emu-alg>
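
A small Python sketch of this second option, to check the line numbering. `State`, `decode_segment_list` and `lines_visited` are illustrative names, and decoding of an individual segment is stubbed out with a callback.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class State:
    generated_line: int = 0
    generated_column: int = 0


def decode_segment_list(
    segments: List[str],
    state: State,
    decode_segment: Callable[[str, State], None],
) -> None:
    # Both productions start by decoding the leading Segment.
    decode_segment(segments[0], state)
    if len(segments) > 1:
        # SegmentList : Segment `;` SegmentList
        # The line advances only after a `;`, so the first segment is line 0.
        state.generated_line += 1
        state.generated_column = 0
        decode_segment_list(segments[1:], state, decode_segment)


def lines_visited(mappings: str) -> List[Tuple[int, str]]:
    """Record which generated line each segment is decoded on."""
    seen: List[Tuple[int, str]] = []
    decode_segment_list(
        mappings.split(";"),
        State(),
        lambda segment, state: seen.append((state.generated_line, segment)),
    )
    return seen
```

`lines_visited("AAAA;;CAAC")` returns `[(0, "AAAA"), (1, ""), (2, "CAAC")]`, so the first segment lands on line 0 and an empty segment still advances the line.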

@nicolo-ribaudo any preference?

4 participants