Generic parselets #10

Closed
phorward opened this issue Nov 30, 2021 · 3 comments
Labels: feature (New feature or request), syntax

Comments

@phorward
Member

phorward commented Nov 30, 2021

Parselets could be defined as generics by marking constant consumables as replaceable at the parselet's point of use. Each use then duplicates the parselet, specialized for the given consumable.

Draft

The idea for generic parselets is to allow for replaceable constants defined with a special <...>-notation.
The constants are held and substituted at compile time, and are turned into specific VM instructions on demand, when they are used.

Definition

Generic : @<P, Q: _, r: 0> { ... }

Generic is a generic parselet with the replaceable constants P, Q and r, where P and Q are consumables. Q defaults to the whitespace parselet _, and r defaults to the constant 0.

Usage

These are example calls.

Generic<Integer>
Generic<Integer, Q: ' '>
Generic<Q: '...', P: Integer, r: 10>
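
To make the calling convention concrete, here is a minimal Python sketch of how the constants of one generic instance might be resolved. It is not Tokay's implementation. It assumes that positional arguments fill the constants in declaration order, that named arguments may appear in any order (as in the third call above), and that missing constants fall back to their defaults. The names bind_generic_args, REQUIRED and GENERIC are hypothetical.

```python
REQUIRED = object()  # hypothetical marker: a generic constant without a default


def bind_generic_args(signature, positional=(), named=None):
    """Resolve the constants of one generic instance.

    signature is a list of (name, default) pairs in declaration order.
    """
    names = [name for name, _ in signature]
    if len(positional) > len(names):
        raise TypeError("too many generic arguments")

    # Positional arguments fill the constants in declaration order (assumption).
    bound = dict(zip(names, positional))

    # Named arguments may come in any order, but must be known and unique.
    for name, value in (named or {}).items():
        if name not in names:
            raise TypeError(f"unknown generic constant {name!r}")
        if name in bound:
            raise TypeError(f"generic constant {name!r} given twice")
        bound[name] = value

    # Everything still unbound falls back to its default, if it has one.
    for name, default in signature:
        if name not in bound:
            if default is REQUIRED:
                raise TypeError(f"missing generic constant {name!r}")
            bound[name] = default
    return bound


# Generic : @<P, Q: _, r: 0>, with consumables represented as plain strings
GENERIC = [("P", REQUIRED), ("Q", "_"), ("r", 0)]

print(bind_generic_args(GENERIC, ["Integer"]))
print(bind_generic_args(GENERIC, ["Integer"], {"Q": "' '"}))
print(bind_generic_args(GENERIC, [], {"Q": "'...'", "P": "Integer", "r": 10}))
```

Each distinct binding would then correspond to one specialized copy of the parselet.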

Builtin generics

Some generics should be available as built-ins and could replace the current implementations of, e.g., the following constructs.

Until : @<P, Escape: Void>

The Until-parselet could be a generic built-in parselet to parse data until a specific token or parselet occurs.

  • '"' Until<'"', Escape: '\\'> '"' parses strings like "Hello World", or strings with escape sequences like "Hello\nWorld"
  • Implement a String parselet as a shortcut for the above, like String: @<Start, End: Void, Escape: Void>
  • Until<Not<Char<A-Za-z_>>> parses anything not consisting of Char<A-Za-z_>
  • Until<EOF> reads everything until EOF
  • Line built-in #38 refers to a Line parselet for matching input lines
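
The intended behaviour of Until with an Escape can be illustrated with a small Python sketch. It is only an approximation under stated assumptions: the stop token and the escape are plain strings rather than arbitrary parselets, the stop token itself is not consumed (the string example above matches the closing quote separately), and an escape protects exactly one following character. The function name until is hypothetical.

```python
def until(text, pos, stop, escape=None):
    """Consume text from pos up to, but not including, the first unescaped stop.

    Returns (consumed, new_pos), or None if stop never occurs.
    """
    out = []
    i = pos
    while i < len(text):
        if escape and text.startswith(escape, i) and i + len(escape) < len(text):
            # Keep the escape and the escaped character verbatim.
            out.append(text[i:i + len(escape) + 1])
            i += len(escape) + 1
        elif text.startswith(stop, i):
            return "".join(out), i
        else:
            out.append(text[i])
            i += 1
    return None  # reached the end of input without finding stop


source = '"Hello\\"World" rest'  # the input contains an escaped quote

# Roughly '"' Until<'"', Escape: '\\'> '"', starting after the opening quote
print(until(source, 1, '"', escape="\\"))

# Without an escape, the first quote ends the match
print(until(source, 1, '"'))
```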

Repeat : @<P> min=1, max=0

This is a simple programmatic sequential repetition. For several reasons, repetitions can also be expressed on a specialized token-level or by the grammar itself using left- and right-recursive structures, resulting in left- or right-leaning parse trees.

Used by the optional, positive and Kleene modifiers.

Replacement for current repeat-construct.

Not : @<P>

This parselet runs its sub-parser P and returns the negated result, so that an accept becomes a reject and vice versa.

Replacement for current not-construct.

Peek : @<P>

This parselet runs P and returns its result, but resets the reading context afterwards. It can be used to look ahead at parsing constructs while leaving the parser at its original position, so that the rest of the parser can decide how to proceed.

Due to Tokay's memoizing features, the parsing is done only once and its result is remembered.

Replacement for current peek-construct.
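
As a rough illustration of Not and Peek, here is a Python sketch using plain parser functions. It assumes a parser is a function taking (text, pos) and returning (value, new_pos) on accept or None on reject, and that Not consumes no input when it accepts. Neither assumption comes from Tokay's implementation, and memoization is not modelled. The names literal, not_ and peek are hypothetical.

```python
def literal(s):
    """Parser that accepts exactly the string s."""
    def parse(text, pos):
        if text.startswith(s, pos):
            return s, pos + len(s)
        return None
    return parse


def not_(p):
    """Accept, consuming nothing, exactly when p rejects."""
    def parse(text, pos):
        return (None, pos) if p(text, pos) is None else None
    return parse


def peek(p):
    """Run p and keep its value, but restore the original position."""
    def parse(text, pos):
        result = p(text, pos)
        if result is None:
            return None
        value, _ = result
        return value, pos
    return parse


print(peek(literal("if"))("if x", 0))   # accepts, position stays at 0
print(not_(literal("if"))("if x", 0))   # rejects, because "if" matches
print(not_(literal("if"))("else", 0))   # accepts, because "if" does not match
```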

Expect : @<P> msg=void

This construct expects P to be accepted. On failure, an error message is raised as Reject::Error.

Replacement for current expect-construct.

List : @<P, Separator: ',', empty: true>

Parses a list of P items separated by Separator.

List : @<P, Separator: ',', empty: true> {
    Self Separator P
    if empty Self Separator   # allows for "a,b,c,"
    P
}
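
For comparison, the same list behaviour can be sketched in Python. This sketch swaps the left-recursive formulation of the draft for a simple loop, so it is not a translation of how Tokay would execute the parselet. It assumes that empty only controls whether a trailing separator is accepted, as the "a,b,c," comment suggests. The names parse_list and integer are hypothetical.

```python
def integer(text, pos):
    """Item parser: one or more decimal digits."""
    end = pos
    while end < len(text) and text[end].isdigit():
        end += 1
    return (int(text[pos:end]), end) if end > pos else None


def parse_list(text, pos, item, separator=",", empty=True):
    """Parse item (separator item)*, optionally followed by a trailing separator."""
    first = item(text, pos)
    if first is None:
        return None
    value, pos = first
    items = [value]
    while text.startswith(separator, pos):
        following = item(text, pos + len(separator))
        if following is None:
            if empty:
                pos += len(separator)  # accept the trailing separator
            break
        value, pos = following
        items.append(value)
    return items, pos


print(parse_list("1,2,3,", 0, integer))               # trailing "," is consumed
print(parse_list("1,2,3,", 0, integer, empty=False))  # trailing "," is left over
```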

Keyword : @<P>

Parses a keyword, which is a consumable not followed by any alphabetic letter.

Definition

Keyword : @<P> {
    P Peek<Not<Alphabetic>>
}

Example

Keyword<'if'>  # matches "if" in "if a...", "if(x)", "if." but not in "ifx"
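
The matching rule can be checked against the four inputs from the comment above with a short Python sketch. It assumes that the keyword is a plain string and that Alphabetic corresponds to Python's str.isalpha, which may differ from Tokay's character classes. The name keyword is hypothetical.

```python
def keyword(word):
    """Accept word only if it is not directly followed by an alphabetic letter."""
    def parse(text, pos):
        end = pos + len(word)
        if not text.startswith(word, pos):
            return None
        # Corresponds to Peek<Not<Alphabetic>>: look at the next character only.
        if end < len(text) and text[end].isalpha():
            return None
        return word, end
    return parse


kw_if = keyword("if")
for sample in ["if a...", "if(x)", "if.", "ifx"]:
    print(repr(sample), "->", kw_if(sample, 0))
```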

User-defined generics

An example use case is this grammar draft for Tokay's own REPL. It implements a generic Switch parselet that can be instantiated for different switches, allowing features to be enabled or disabled with commands like #debug, #debug on or #debug off.

# Grammar for REPL commands

_ : @{
    ' '
    '\t'
    '\n'
}

Switch : @<Ident> emit=void {
    if !emit {
        emit = str(Ident)
    }

    Ident _ 'on'            ast(emit, true)
    Ident _ 'off'           ast(emit, false)
    Ident                   ast(emit, true)
}


'#' {
    Switch<'debug'>
    Switch<'verbose'>
    Switch<'compiler-debug'>
    'run' _ Name            ast("run")
}
@nivpgir

nivpgir commented Mar 10, 2022

Sounds like a cool feature, but what benefits does this have over passing parselets to a function?

I mean specifically for Tokay's use case: I thought of it more as a scripting language, so having things work dynamically makes more sense to me.

@phorward
Member Author

Sounds like a cool feature, but what benefits does this have over passing parselets to a function?

Tokay programs are executed in two stages:

  1. First, constants are substituted wherever they are used. For consumable constants (tokens, or parselets that use tokens in some way; by language design, this is anything whose identifier starts with an upper-case letter or an underscore), Tokay performs a closure algorithm to detect left-recursion within these structures.
    This is the case in expr.tok, for example. If you wrote this in e.g. Python, you would get an endless recursion, because Expr calls Expr itself. To avoid this, the first stage detects all such structures. A generic parselet might result in a left-recursive structure, depending on the compile-time constants fed to it.
  2. At runtime, you have the dynamic approach. In this case, left-recursion can't be detected. You can certainly express left-recursive structures with variables, but they will fail, just as in other languages, because they run into an endless recursion.
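
To illustrate why this detection needs the whole grammar up front, here is a minimal Python sketch of a closure over leftmost references. It is not Tokay's actual algorithm. It assumes a grammar given as a dictionary of alternatives, treats every symbol that is not a rule name as a token, and ignores rules that can match the empty input. The grammar below is a hypothetical stand-in, not the contents of expr.tok, and the name left_recursive is hypothetical as well.

```python
def left_recursive(grammar):
    """Return the rule names that can reach themselves at the leftmost position."""
    # Direct leftmost references: the first symbol of every alternative,
    # if that symbol is itself a rule.
    reach = {
        name: {alt[0] for alt in alts if alt and alt[0] in grammar}
        for name, alts in grammar.items()
    }

    # Closure: repeatedly add whatever the already reachable rules can reach,
    # until nothing changes any more.
    changed = True
    while changed:
        changed = False
        for name, seen in reach.items():
            extra = set().union(*(reach[r] for r in seen)) - seen
            if extra:
                seen |= extra
                changed = True

    return {name for name, seen in reach.items() if name in seen}


# Hypothetical expression grammar; lower-case and punctuation symbols are tokens.
GRAMMAR = {
    "Expr": [["Expr", "+", "Term"], ["Term"]],
    "Term": [["Term", "*", "Factor"], ["Factor"]],
    "Factor": [["Int"], ["(", "Expr", ")"]],
    "Int": [["digits"]],
}

print(sorted(left_recursive(GRAMMAR)))
```

Because the closure needs every rule and its alternatives before any input is parsed, it fits the compile-time stage and cannot be applied to parselets that are only assembled from variables at runtime.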

I hope this clarifies things a bit.

@phorward
Member Author

Commit fffeeda already integrates the complete syntax for generic parselets and their instances. The generics feature itself is still pending, as further internal revisions are required.
