Backend-agnostic BNF grammar with type inference and semantic actions.
For IDE support, we provide a VSCode extension with the following features:
- Semantic-based syntax highlighting
- Go to definition/go to references/find all references
So far, we support 3 different architectures, which demonstrate Typed BNF's backend-agnostic code generation.
architecture | backend (PGEN + PL) | lexer impl | parser capability | ADT encoding |
---|---|---|---|---|
antlr | antlr4+csharp | antlr | ALL(*) | case classes |
antlr | antlr4+typescript (default) | antlr | ALL(*) | tagged unions |
antlr | antlr4+typescript (`-ae case-class`) | antlr | ALL(*) | case classes |
*pure bnf | pure bnf | antlr notation | CFG | |
(PL = programming language; PGEN = parser generator; "pure bnf" is a plain BNF output intended as a readable syntax specification.)
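
To make the "ADT encoding" column concrete, here is a minimal sketch of the two styles in TypeScript. It is not the code TBNF generates: the cut-down ADT, the `kind` tag, and all class and field names are assumptions chosen for illustration only.

```typescript
// Hypothetical ADT used only for illustration:
//   type Json
//   case JInt  : int -> Json
//   case JNull : () -> Json
//   case JList : list<Json> -> Json

// Tagged-union style: plain objects discriminated by a string tag.
// The tag name `kind` is an assumption, not TBNF's actual output.
type JsonTU =
  | { kind: "JInt"; value: number }
  | { kind: "JNull" }
  | { kind: "JList"; elements: JsonTU[] };

// Sum all integers in a tagged-union value by switching on the tag.
function sumTU(j: JsonTU): number {
  switch (j.kind) {
    case "JInt":
      return j.value;
    case "JNull":
      return 0;
    case "JList":
      return j.elements.reduce((acc, e) => acc + sumTU(e), 0);
  }
}

// Case-class style: one class per constructor, dispatched with instanceof.
class JInt {
  readonly value: number;
  constructor(value: number) {
    this.value = value;
  }
}
class JNull {}
class JList {
  readonly elements: JsonCC[];
  constructor(elements: JsonCC[]) {
    this.elements = elements;
  }
}
type JsonCC = JInt | JNull | JList;

// Same traversal, written against the class-based encoding.
function sumCC(j: JsonCC): number {
  if (j instanceof JInt) return j.value;
  if (j instanceof JList) return j.elements.reduce((acc, e) => acc + sumCC(e), 0);
  return 0; // JNull
}
```

Both encodings carry the same information. The tagged-union form works with plain object literals and `switch`, while the case-class form mirrors what a C# user would expect from a class hierarchy.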
See the following test scripts for detailed usage:
- `test-scripts/test_typescript_lua_tu.sh`: Lua parser in TypeScript. (Algebraic) data types are encoded using tagged unions.
- `test-scripts/test_typescript_lua.sh`: Lua parser in TypeScript. (Algebraic) data types are encoded using case classes.
- `test-scripts/test-csharp-lua.sh`: Lua parser in CSharp.
- `test-scripts/test-csharp-json.sh`: JSON parser in CSharp.
Note that a Lua (or JSON) parser generated for different programming languages comes from the same grammar.
Support for Python Lark & OCaml Menhir is deprecated since v0.4, see v0.3 for more.
Download the single executable file `tbnf-VERSION-TARGET` (e.g., `tbnf-0.4.0-win-x64.exe`, `tbnf-0.4.0-osx-arm64`) from the release page.
Usage: tbnf [options] <source-grammar-file>
Version: 0.4.2
Options:
--version Show version and exit
-h, --help Show this help message and exit
-o, --outDir DIR Specify output directory (default: same as source file)
-be, --backend TYPE Backend to use
Possible TYPE values:
csharp-antlr C# backend using ANTLR
typescript-antlr TypeScript backend using ANTLR
pure-bnf PureBNF backend
-ae, --adt-encoding TYPE ADT encoding
Possible TYPE values:
tagged-union ADT encoding via tagged unions (default for TypeScript)
case-class ADT encoding via case classes (default for C#)
-lang, --language NAME Language name to generate (default: "mylang")
-conf, --config PATH Path to the 'tbnf.config.js' file (default: <outDir>/tbnf.config.js)
Examples:
tbnf -lang mylanguage mygrammar.tbnf -be typescript-antlr -ae tagged-union
tbnf -lang mylanguage mygrammar.tbnf -be csharp-antlr -conf tbnf.config.js
See also the Typed BNF documentation.
For TypeScript backends, you will also need the antlr-ng compiler and the antlr4ng runtime.
For non-TypeScript backends, you will also need the `antlr4` command line tool; install it from https://github.com/antlr/antlr4-tools.
The following grammar compiles and runs for the programming languages and parser architectures supported by TBNF.
extern var parseInt : str -> int
extern var parseFlt : str -> float
extern var getStr : token -> str
extern var unesc : str -> str
extern var appendList : <a> (list<a>, a) -> list<a>
type Json
type JsonPair(name: str, value: Json)
case JInt : int -> Json
case JFlt : float -> Json
case JStr : str -> Json
case JNull : () -> Json
case JList : (elements: list<Json>) -> Json
case JDict : list<JsonPair> -> Json
case JBool : bool -> Json
ignore space
digit = [0-9] ;
start : json { $1 }
int = digit+ ;
float = digit* "." int ;
str = "\"" ( "\\" _ | ! "\"" )* "\"" ;
space = ("\t" | "\n" | "\r" | " ")+;
seplist(sep, elt) : elt { [$1] }
| seplist(sep, elt) sep elt
{ appendList($1, $3) }
jsonpair : <str> ":" json { JsonPair(unesc(getStr($1)), $3) }
/* CPP comments */
json : <int> { JInt(parseInt(getStr($1))) }
| <float> { JFlt(parseFlt(getStr($1))) }
| "null" { JNull() }
| <str> { JStr(unesc(getStr($1))) }
| "[" "]" { JList([]) }
| "{" "}" { JDict([]) }
| "true" { JBool(true) }
| "false" { JBool(false) }
| "[" seplist(",", json) "]" { JList($2) }
| "{" seplist(",", jsonpair) "}" { JDict($2) }
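
The grammar above declares five `extern` functions and leaves their implementation to the target language. The following is a minimal sketch of what they could look like in TypeScript. The `externs` object, the `TokenLike` interface, and the use of `JSON.parse` for unescaping are assumptions made for illustration; how the generated parser actually locates these functions is not covered here.

```typescript
// Minimal stand-in for an ANTLR token; only the `text` property is used.
// (Assumption: the real token type exposes the matched text this way.)
interface TokenLike {
  text?: string | null;
}

// Hypothetical container for the externs declared in the grammar.
const externs = {
  // extern var parseInt : str -> int
  parseInt(s: string): number {
    return Number.parseInt(s, 10);
  },
  // extern var parseFlt : str -> float
  parseFlt(s: string): number {
    return Number.parseFloat(s);
  },
  // extern var getStr : token -> str
  getStr(t: TokenLike): string {
    return t.text ?? "";
  },
  // extern var unesc : str -> str
  // The <str> token still includes its surrounding quotes, so JSON.parse
  // both strips them and resolves escapes. It throws on escapes that are
  // not valid JSON, which the lexer rule above would still accept.
  unesc(s: string): string {
    return JSON.parse(s) as string;
  },
  // extern var appendList : <a> (list<a>, a) -> list<a>
  // Mutates and returns the same array, keeping seplist linear in time.
  appendList<A>(xs: A[], x: A): A[] {
    xs.push(x);
    return xs;
  },
};

// What the seplist actions do for the input `1 , 2 , 3`:
// `[$1]` for the first element, then appendList($1, $3) for each repetition.
const items = externs.appendList(externs.appendList([1], 2), 3);
console.log(items); // [ 1, 2, 3 ]
```

Because Typed BNF has no built-in functions, this small layer is the only backend-specific part a grammar author has to supply for each target language.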
You can put a `tbnf.config.js` in the output directory to define how the names of variables/types/fields/constructors map from Typed BNF to the backend language.
For instance, this is what we did for the CSharp-Antlr4 JSON example: link.
"use strict";
function rename_type(x) {
    // Map Typed BNF built-in type names to target-language type names.
    if (x === "list") return "Array";
    if (x === "int") return "number";
    if (x === "float") return "number";
    if (x === "str") return "string";
    if (x === "bool") return "boolean";
    if (x === "token") return "antlr.Token";
    // Every other type name gets a "_t" suffix.
    return x + "_t";
}

module.exports = {
    rename_type
};
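
To see the effect of such a mapping, the sketch below restates the same logic as a typed TypeScript function and applies it to some type names from the JSON grammar. The name `renameType` and the `Map`-based lookup are illustrative choices, and whether TBNF invokes the hook for each of these names is an assumption.

```typescript
// Typed restatement of the rename_type logic from the config.
const builtinNames = new Map<string, string>([
  ["list", "Array"],
  ["int", "number"],
  ["float", "number"],
  ["str", "string"],
  ["bool", "boolean"],
  ["token", "antlr.Token"],
]);

function renameType(x: string): string {
  // Known built-ins map to target-language types; anything else gets "_t".
  return builtinNames.get(x) ?? x + "_t";
}

// Type names taken from the JSON grammar.
const renamed = ["Json", "JsonPair", "list", "str"].map((n) => renameType(n));
console.log(renamed); // [ 'Json_t', 'JsonPair_t', 'Array', 'string' ]
```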
Key points:
- Typed BNF has 7 built-in types: `token`, `tuple`, `list`, `int`, `float`, `str` and `bool`.
- Typed BNF ships with no built-in functions, which makes it suitable for writing portable grammars without ruling out semantic actions.
Check out Backends.*.fs
- .NET 8.0 SDK
- Deno
- Antlr4 (for non-TypeScript backends)
- Antlr4NG & Antlr-NG (for TypeScript backends)
deno run -A build.ts build
All distributions are built into the `dist` folder.
> ls -lhp dist | grep -v /$ | awk '{print $5 "\t" $9}'
70M TBNF.CLI.exe
44K TBNF.CLI.pdb
132K TBNF.Core.pdb
75M tbnf-0.4.0-linux-arm64
69M tbnf-0.4.0-linux-x64
75M tbnf-0.4.0-osx-arm64
69M tbnf-0.4.0-osx-x64
70M tbnf-0.4.0-win-x64.exe
The grammar for Typed BNF is also implemented using Typed BNF.
deno run -A build.ts bootstrap-once