Statically typed BNF with semantic actions; safe parser generator applicable to every programming language.

thautwarm/Typed-BNF

Typed BNF

Backend-agnostic BNF grammar with type inference and semantic actions.

For IDE support, we provide a VSCode extension with the following features:

  • Semantic-based syntax highlighting
  • Go to definition/go to references/find all references

Documentation

Overview

So far, we support 3 different architectures, which demonstrate Typed BNF's backend-agnostic code generation.

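| architecture | backend (PGEN + PL)                | lexer impl     | parser capability | ADT encoding  |
|--------------|------------------------------------|----------------|-------------------|---------------|
| antlr        | antlr4+csharp                      | antlr          | ALL(*)            | case classes  |
| antlr        | antlr4+typescript (default)        | antlr          | ALL(*)            | tagged unions |
| antlr        | antlr4+typescript (-ae case-class) | antlr          | ALL(*)            | case classes  |
| *pure bnf    | pure bnf                           | antlr notation | CFG               |               |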

(PL = programming language; PGEN = parser generator; "pure bnf" emits plain BNF as a readable syntax specification.)

See the following test scripts for detailed usage:

  • test-scripts/test_typescript_lua_tu.sh: Lua parser in TypeScript. (Algebraic) Data types are encoded using tagged unions.
  • test-scripts/test_typescript_lua.sh: Lua parser in TypeScript. (Algebraic) Data types are encoded using case classes.
  • test-scripts/test-csharp-lua.sh: Lua parser in CSharp.
  • test-scripts/test-csharp-json.sh: JSON parser in CSharp.

Note that a Lua (or JSON) parser generated for different programming languages comes from the same grammar.
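
The two ADT encodings differ in how a constructor such as JInt or JList is represented in the target language. The following is a rough JavaScript sketch of the idea, not the code TBNF actually emits: the `$type` tag field, the `value` field name, and the `countInts*` helpers are assumptions made for illustration.

```javascript
// Tagged-union style: each value is a plain object carrying a discriminant.
// The field name "$type" is an assumption, not necessarily what TBNF emits.
const TU = {
  JInt: (value) => ({ $type: "JInt", value }),
  JNull: () => ({ $type: "JNull" }),
  JList: (elements) => ({ $type: "JList", elements }),
};

// Case-class style: one class per constructor, sharing a common base class.
class Json {}
class JInt extends Json {
  constructor(value) { super(); this.value = value; }
}
class JNull extends Json {}
class JList extends Json {
  constructor(elements) { super(); this.elements = elements; }
}

// Consumers dispatch on the tag in one style and on the class in the other.
function countIntsTagged(json) {
  switch (json.$type) {
    case "JInt": return 1;
    case "JList": return json.elements.reduce((n, e) => n + countIntsTagged(e), 0);
    default: return 0;
  }
}

function countIntsClass(json) {
  if (json instanceof JInt) return 1;
  if (json instanceof JList) return json.elements.reduce((n, e) => n + countIntsClass(e), 0);
  return 0;
}

// The JSON text [1, null, [2]] in both encodings.
const tagged = TU.JList([TU.JInt(1), TU.JNull(), TU.JList([TU.JInt(2)])]);
const classy = new JList([new JInt(1), new JNull(), new JList([new JInt(2)])]);
```

Tagged unions keep values as plain data, which suits structural typing in TypeScript; case classes give each constructor its own nominal type, which is the natural fit for C#.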

Support for Python Lark and OCaml Menhir has been deprecated since v0.4; see v0.3 for more.

Usage

Download the single executable file tbnf-VERSION-TARGET (e.g., tbnf-0.4.0-win-x64.exe, tbnf-0.4.0-osx-arm64) from the release page.

Usage: tbnf [options] <source-grammar-file>
Version: 0.4.2
Options:
  --version                 Show version and exit
  -h, --help                Show this help message and exit
  -o, --outDir DIR          Specify output directory (default: same as source file)
  -be, --backend TYPE       Backend to use
     Possible TYPE values:
       csharp-antlr         C# backend using ANTLR
       typescript-antlr     TypeScript backend using ANTLR
       pure-bnf             PureBNF backend

  -ae, --adt-encoding TYPE  ADT encoding
     Possible TYPE values:
       tagged-union         ADT encoding via tagged unions (default for TypeScript)
       case-class           ADT encoding via case classes (default for C#)
  -lang, --language NAME    Language name to generate (default: "mylang")
  -conf, --config PATH      Path to the 'tbnf.config.js' file (default: <outDir>/tbnf.config.js)

Examples:
  tbnf -lang mylanguage mygrammar.tbnf -be typescript-antlr -ae tagged-union
  tbnf -lang mylanguage mygrammar.tbnf -be csharp-antlr -conf tbnf.config.js

See also the Typed BNF documentation.

For TypeScript backends, you will also need the antlr-ng compiler and the antlr4ng runtime.

For non-TypeScript backends, you will also need the antlr4 command-line tool; install it from https://github.com/antlr/antlr4-tools.
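
If you drive tbnf from a Node build script, the options above can be assembled programmatically. This is a minimal sketch that uses only the flags listed in the help text; the helper names `tbnfArgs` and `runTbnf`, the file name `json.tbnf`, and the output directories are hypothetical, and it assumes the downloaded executable is reachable as `tbnf`.

```javascript
const { spawnSync } = require("node:child_process");

// Build the argument list for one tbnf run, using only flags from the help text.
function tbnfArgs({ grammar, backend, adtEncoding, language, outDir, config }) {
  const args = [];
  if (language) args.push("-lang", language);
  if (outDir) args.push("-o", outDir);
  if (backend) args.push("-be", backend);
  if (adtEncoding) args.push("-ae", adtEncoding);
  if (config) args.push("-conf", config);
  args.push(grammar); // the source grammar file goes last, after the options
  return args;
}

// Run tbnf. "tbnf" is assumed to be on PATH; otherwise pass the full path
// of the downloaded executable as `exe`.
function runTbnf(options, exe = "tbnf") {
  const result = spawnSync(exe, tbnfArgs(options), { stdio: "inherit" });
  if (result.error) throw result.error;
  return result.status;
}

// One grammar, two targets (file and directory names are hypothetical).
const targets = [
  { backend: "typescript-antlr", adtEncoding: "tagged-union", outDir: "out/ts" },
  { backend: "csharp-antlr", adtEncoding: "case-class", outDir: "out/cs" },
];
const commands = targets.map((t) =>
  tbnfArgs({ grammar: "json.tbnf", language: "json", ...t })
);
```

Calling `runTbnf({ grammar: "json.tbnf", language: "json", ...targets[0] })` would then invoke the tool once for the TypeScript target.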

A basic example: JSON

The following grammar compiles and runs for every programming language and parser architecture supported by TBNF.

extern var parseInt : str -> int
extern var parseFlt : str -> float
extern var getStr : token -> str
extern var unesc : str -> str
extern var appendList : <a> (list<a>, a) -> list<a>

type Json
type JsonPair(name: str, value: Json)

case JInt : int -> Json
case JFlt : float -> Json
case JStr : str -> Json
case JNull : () -> Json
case JList : (elements: list<Json>) -> Json
case JDict : list<JsonPair> -> Json
case JBool : bool -> Json

ignore space

digit = [0-9] ;

start : json { $1 }

int = digit+ ;
float = digit* "." int ;
str = "\"" ( "\\" _ | ! "\"" )* "\"" ;
space = ("\t" | "\n" | "\r" | " ")+;

seplist(sep, elt) : elt { [$1] }
                  | seplist(sep, elt) sep elt
                    { appendList($1, $3) }

jsonpair : <str> ":" json { JsonPair(unesc(getStr($1)), $3) }

/* CPP comments */

json : <int> { JInt(parseInt(getStr($1))) }
      | <float> { JFlt(parseFlt(getStr($1))) }
      | "null"  { JNull() }
      | <str>   { JStr(unesc(getStr($1))) }
      | "[" "]" { JList([]) }
      | "{" "}" { JDict([]) }
      | "true"  { JBool(true) }
      | "false"  { JBool(false) }
      | "[" seplist(",", json) "]" { JList($2) }
      | "{" seplist(",", jsonpair) "}" { JDict($2) }
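
The grammar declares five `extern` functions, so each target language has to supply them. A minimal sketch of what they could look like for a JavaScript/TypeScript target follows. How the generated parser imports them is not shown here, so the `externs` object is hypothetical; `getStr` assumes an ANTLR-style token exposing a `text` field, and `unesc` assumes the string uses only escapes that JSON defines.

```javascript
// Hypothetical host-side implementations of the grammar's extern declarations.
const externs = {
  // extern var parseInt : str -> int
  parseInt: (s) => Number.parseInt(s, 10),
  // extern var parseFlt : str -> float
  parseFlt: (s) => Number.parseFloat(s),
  // extern var getStr : token -> str  (assumes a token with a `text` field)
  getStr: (token) => token.text ?? "",
  // extern var unesc : str -> str
  // The <str> token still includes its surrounding quotes; JSON.parse strips
  // them and decodes escapes, but throws on escapes that JSON does not define.
  unesc: (s) => JSON.parse(s),
  // extern var appendList : <a> (list<a>, a) -> list<a>
  appendList: (xs, x) => { xs.push(x); return xs; },
};

// What the action `JInt(parseInt(getStr($1)))` computes for a token "42",
// minus the constructor call.
const fakeIntToken = { text: "42" };
const intValue = externs.parseInt(externs.getStr(fakeIntToken));

// seplist builds its result left to right: [$1] first, then
// appendList($1, $3) once per further element.
let items = ["a"];
items = externs.appendList(items, "b");
items = externs.appendList(items, "c");

// A quoted token containing the two characters backslash and "n".
const unescaped = externs.unesc('"line\\nbreak"');
```

Because `appendList` mutates and returns the same array, the left-recursive `seplist` rule accumulates elements without copying the list at every step; a backend that needs immutable lists would return a new list instead.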

Customizing name mapping

You can put a tbnf.config.js in the output directory to define how the names of variables/types/fields/constructors map from Typed BNF to the backend language.

For instance, this is what we did for the CSharp-Antlr4 JSON example: link.

"use strict";

// Map a Typed BNF type name to its name in the generated code.
function rename_type(x) {
  if (x === "list") return "Array";
  if (x === "int") return "number";
  if (x === "float") return "number";
  if (x === "str") return "string";
  if (x === "bool") return "boolean";
  if (x === "token") return "antlr.Token";
  // Any other type name gets a "_t" suffix.
  return x + "_t";
}


module.exports = {
  rename_type
};

Key points:

  1. Typed BNF has 7 built-in types: token, tuple, list, int, float, str and bool.
  2. Typed BNF ships with no built-in functions, which makes it suitable for writing portable grammars without ruling out semantic actions.

How to write new backends

Check out Backends.*.fs

Build from source

Prerequisites

  • .NET 8.0 SDK
  • Deno
  • Antlr4 (for non-TypeScript backends)
  • Antlr4NG & Antlr-NG (for TypeScript backends)

Build Distributions (win-x64/osx-x64/osx-arm64/linux-x64/linux-arm64)

deno run -A build.ts build

All distributions are built into the dist folder.

> ls -lhp dist | grep -v /$ | awk '{print $5 "\t" $9}'
70M     TBNF.CLI.exe
44K     TBNF.CLI.pdb
132K    TBNF.Core.pdb
75M     tbnf-0.4.0-linux-arm64
69M     tbnf-0.4.0-linux-x64
75M     tbnf-0.4.0-osx-arm64
69M     tbnf-0.4.0-osx-x64
70M     tbnf-0.4.0-win-x64.exe

Bootstrap

The grammar for Typed BNF is also implemented using Typed BNF.

deno run -A build.ts bootstrap-once
