copyright |
---|
Copyright (c) Runtime Verification, Inc. All Rights Reserved. |
Defined below is a series of modules used to parse inner syntax in K (ie, the contents of rules, configuration declarations, and contexts).
Much of this file exists in tight correspondence with the K implementation, and K will not work correctly if it is altered without corresponding changes to the source code of the K tools.
Users should only import a few modules from this file. In particular, this
includes SORT-K
, BASIC-K
, ML-SYNTAX
, DEFAULT-LAYOUT
,
DEFAULT-CONFIGURATION
, and K-AMBIGUITIES
. The remaining modules should not
be imported by the user; they are used implicitly by the implementation of K.
The SORT-K
module declares the K
sort, and nothing else.
module SORT-K
syntax K [hook(K.K)]
endmodule
The BASIC-K
module declares the K
, KItem
, and KConfigVar
sorts, and
imports the syntax of matching logic.
module BASIC-K
imports ML-SYNTAX
imports SORT-BOOL
syntax KItem [hook(K.KItem)]
syntax K ::= KItem
syntax KConfigVar [token]
syntax KItem ::= KConfigVar
endmodule
Below is defined the abstract syntax of concrete terms in K, the KAST
syntax.
Users should rarely if ever have to refer to this syntax; in general, it
suffices to use concrete syntax in rules, configuration declarations, contexts,
etc.
This syntax is used directly by the K implementation, and exists here as a
reference for the syntax of KAST
, but it should not be imported directly by
the user.
module KSTRING
syntax KString ::= r"[\\\"](([^\\\"\\n\\r\\\\])|([\\\\][nrtf\\\"\\\\])|([\\\\][x][0-9a-fA-F]{2})|([\\\\][u][0-9a-fA-F]{4})|([\\\\][U][0-9a-fA-F]{8}))*[\\\"]" [token]
// optionally qualified strings, like in Scala "abc", i"abc", r"a*bc", etc.
endmodule
module BUILTIN-ID-TOKENS
syntax #LowerId ::= r"[a-z][a-zA-Z0-9]*" [prec(2), token]
syntax #UpperId ::= r"[A-Z][a-zA-Z0-9]*" [prec(2), token]
endmodule
module SORT-KBOTT
imports SORT-K
syntax KBott
endmodule
module KAST
imports BASIC-K
imports SORT-KBOTT
imports KSTRING
imports BUILTIN-ID-TOKENS
syntax KBott ::= "#token" "(" KString "," KString ")" [symbol(#KToken)]
| "#klabel" "(" KLabel ")" [symbol(#WrappedKLabel)]
| KLabel "(" KList ")" [symbol(#KApply)]
syntax KItem ::= KBott
syntax KLabel ::= r"`(\\\\`|\\\\\\\\|[^`\\\\\\n\\r])+`" [token]
| #LowerId [token]
| r"[#a-z][a-zA-Z0-9]*" [token, prec(1)]
syntax KList ::= K
| ".KList" [symbol(#EmptyKList)]
| KList "," KList [symbol(#KList), left, assoc, unit(#EmptyKList), prefer]
endmodule
// To be used when parsing/pretty-printing ground configurations
module KSEQ
imports KAST
imports K-TOP-SORT
syntax K ::= ".K" [symbol(#EmptyK)]
| "." [symbol(#EmptyK), deprecated, unparseAvoid]
syntax K ::= K "~>" K [symbol(#KSequence), left, assoc, unit(#EmptyK)]
syntax left #KSequence
syntax {Sort} Sort ::= "(" Sort ")" [bracket, group(defaultBracket), applyPriority(1)]
endmodule
K provides direct access to the symbols of Matching Logic, while giving them
their own concrete syntax distinct from the syntax of the KORE
intermediate
representation. These symbols are primarily used during symbolic execution.
The LLVM Backend has relatively little understanding of Matching Logic directly
and use of these symbols directly in rules is likely to cause it to crash.
However, these symbols are necessary when providing lemmas and other types of
logical assistance to proofs and symbolic execution in the Haskell Backend.
The correspondance between K symbols and KORE symbols is as follows:
#Top
-\top
#Bottom
-\bottom
#Not
-\not
#Ceil
-\ceil
#Floor
-\floor
#Equals
-\equals
#And
-\and
#Or
-\or
#Implies
-\implies
#Exists
-\exists
#Forall
-\forall
#AG
-allPathGlobally
#wEF
-weakExistsFinally
#wAF
-weakAlwaysFinally
module ML-SYNTAX [not-lr1]
imports SORT-K
syntax {Sort} Sort ::= "#Top" [symbol(#Top), group(mlUnary)]
| "#Bottom" [symbol(#Bottom), group(mlUnary)]
| "#Not" "(" Sort ")" [symbol(#Not), mlOp, group(mlUnary, mlOp)]
syntax {Sort1, Sort2} Sort2 ::= "#Ceil" "(" Sort1 ")" [symbol(#Ceil), mlOp, group(mlUnary, mlOp)]
| "#Floor" "(" Sort1 ")" [symbol(#Floor), mlOp, group(mlUnary, mlOp)]
| "{" Sort1 "#Equals" Sort1 "}" [symbol(#Equals), mlOp, group(mlEquals, mlOp), comm, format(%1%i%n%2%d%n%3%i%n%4%d%n%5)]
syntax priority mlUnary > mlEquals > mlAnd
syntax {Sort} Sort ::= Sort "#And" Sort [symbol(#And), assoc, left, comm, unit(#Top), mlOp, group(mlAnd, mlOp), format(%i%1%d%n%2%n%i%3%d)]
> Sort "#Or" Sort [symbol(#Or), assoc, left, comm, unit(#Bottom), mlOp, group(mlOp), format(%i%1%d%n%2%n%i%3%d)]
> Sort "#Implies" Sort [symbol(#Implies), mlOp, group(mlImplies, mlOp), format(%i%1%d%n%2%n%i%3%d)]
syntax priority mlImplies > mlQuantifier
syntax {Sort1, Sort2} Sort2 ::= "#Exists" Sort1 "." Sort2 [symbol(#Exists), mlOp, mlBinder, group(mlQuantifier, mlOp)]
| "#Forall" Sort1 "." Sort2 [symbol(#Forall), mlOp, mlBinder, group(mlQuantifier, mlOp)]
syntax {Sort} Sort ::= "#AG" "(" Sort ")" [symbol(#AG), mlOp, group(mlOp)]
| "#wEF" "(" Sort ")" [symbol(weakExistsFinally), mlOp, group(mlOp)]
| "#wAF" "(" Sort ")" [symbol(weakAlwaysFinally), mlOp, group(mlOp)]
endmodule
Provided below is the syntax of variables in K. There are four types of variables in K:
- Regular variables. These are denoted by variables that begin with an underscore or a capital letter. These variables match exactly one value and can be used to refer to it on the right-hand-side.
- Fresh constants. These are denoted by variables that begin with an
!
. This is a convenience syntax which can be used on the right-hand-side only, and refer to a unique value of the specified sort which is distinct from any other value that has been generated or will be generated by the!X
syntax. Note that this may not be distinct from values produced via other means. - Existential variables. This refers to variables that are existentially
quantified and begin with a
?
. They are not required to appear on the left-hand-side prior to appearing on the right-hand-side, and generally refer to symbolic quantities that are introduced during rewriting. Refer to K's documentation for more details. - Set variables. These are denoted by variables that begin with a
@
. These variables refer to a set of values and are generally used when writing simplification rules in the Haskell Backend. For more information, refer to K's documentation.
There is also a fifth type of "variable", although it is not technically a
variable. This refers to configuration variables, which are used to insert
values into the initial configuration that come from outside the semantics.
The most common of these is the $PGM
variable, which conventionally contains
the program being executed and is placed in the <k>
cell in the configuration
declaration. These "variables" begin with a $
and their values are populated
by the frontend prior to symbolic or concrete execution of a program.
module KVARIABLE-SYNTAX
syntax #KVariable
endmodule
// To be used when parsing/pretty-printing symbolic configurations
module KSEQ-SYMBOLIC
imports KSEQ
imports ML-SYNTAX
imports KVARIABLE-SYNTAX
syntax #KVariable ::= r"(\\!|\\?|@)?([A-Z][A-Za-z0-9'_]*|_|_[A-Z][A-Za-z0-9'_]*)" [token, prec(1)]
| #UpperId [token]
syntax KConfigVar ::= r"(\\$)([A-Z][A-Za-z0-9'_]*)" [token]
syntax KBott ::= #KVariable
syntax KBott ::= KConfigVar
endmodule
While the backend treats cells as regular productions like any other, the frontend provides a significant amount of convenience notation for dealing with groups of cells, in order to make writing modular definitions easier. As a result, we need a syntax for groups of cells and for referring to cells within rules, configuration declarations, and functions.
For historical reasons, the Bag
sort is used to refer to groups of cells.
This may change in a future release. Users can combine cells in any order
by concatenating them together, and can refer to the absence of any cells with
the .Bag
symbol. You can also refer to cells within a function by placing
the cell context symbol, [[ K ]]
at the top of a rule, placing a function
symbol inside, and referring to cells afterwards. This implicitly inserts
a reference to the configuration at the time prior to the currently-applied
rule being applied which can be matched on within the function. Functions with
such context cannot be referred to in the initial configuration, because the
prior configuration does not yet exist.
module KCELLS
imports KAST
syntax Cell
syntax Bag ::= Bag Bag [left, assoc, symbol(#cells), unit(#cells)]
| ".Bag" [symbol(#cells)]
| ".::Bag" [symbol(#cells)]
| Cell
syntax Bag ::= "(" Bag ")" [bracket]
syntax KItem ::= Bag
syntax #RuleBody ::= "[" "[" K "]" "]" Bag [symbol(#withConfig), avoid]
syntax non-assoc #withConfig
syntax Bag ::= KBott
endmodule
Users can also refer to cells in rules. When doing so, an optional ...
can
be placed immediately after the start of the cell or immediately before the
end. In a cell whose contents are commutative, these are equivalent to one
another and are also equivalent to placing ...
in both places. This means
that what is placed in the cell will be combined with the cell contents'
concatenation operator with an unnamed variable. In other words, you match on
some number of elements in the collection and do not care about the rest of
the collection.
In a cell whose contents are not commutative, the ...
operators correspond
to a variable on the respective side of the contents of the cell that the
...
appears. For example, <foo>... L </foo>
, if L
is a list, means
some number of elements followed by L. Note that not all combinations are
supported. Cells whose contents are sort K
can only have ...
appear at the
tail of the cell, and cells whose contents are sort List
can only have ...
appear on at most one side in a single rule.
module RULE-CELLS
imports KCELLS
imports RULE-LISTS
// if this module is imported, the parser automatically
// generates, for all productions that have the attribute 'cell' or 'maincell',
// a production like below:
//syntax Cell ::= "<top>" #OptionalDots K #OptionalDots "</top>" [symbol(<top>)]
syntax #OptionalDots ::= "..." [symbol(#dots)]
| "" [symbol(#noDots)]
syntax Int
// this production will be added by the compiler to help handle bang variables,
// however, it is valuable to put it here because without this production, it
// will not exist at the point in time when rules and claims are parsed, and
// as a result it makes it very difficult to write proof claims over fragments
// of code that exercise rules containing bang variables. We put it here because
// this production will "vanish" after parsing finishes and not be picked up
// by the compiler, which is the behavior we want in this case since an actual
// production will be generated by the compiler later on.
syntax GeneratedCounterCell ::= "<generatedCounter>" Int "</generatedCounter>" [cell, symbol(<generatedCounter>), internal]
endmodule
Users can also declare cells in a configuration declaration. This generates a specific set of productions that is used internally to implement the cell. The most important of these is the cell itself, and attributes on this production can be specified in an xml-attribute-like syntax.
You can also use an xml-short-tag-like syntax to compose configuration cells together which were defined in different modules. However, it is a requirement that any K definition have at most one fully-composed configuration; thus, all other configuration declarations must appear composed within another configuration declaration.
module CONFIG-CELLS
imports KCELLS
imports RULE-LISTS
syntax #CellName ::= r"[a-zA-Z][a-zA-Z0-9\\-]*" [token, prec(1)]
| #LowerId [token]
| #UpperId [token]
syntax Cell ::= "<" #CellName #CellProperties ">" K "</" #CellName ">" [symbol(#configCell)]
syntax Cell ::= "<" #CellName "/>" [symbol(#externalCell)]
syntax #CellProperties ::= #CellProperty #CellProperties [symbol(#cellPropertyList)]
| "" [symbol(#cellPropertyListTerminator)]
syntax #CellProperty ::= #CellName "=" KString [symbol(#cellProperty)]
endmodule
Rules can have an optional requires clause or an ensures clause. For backwards-
compatibility, you can refer to the requires clause with both the requires
and when
keywords; The latter, however, is deprecated and may be removed in
a future release.
The requires clause specifies the preconditions that must be true in order
for the rule to apply. The ensures clause specifies the information which
becomes true after the rule has applied. It is a requirement that information
present in the ensures
clause refer to existential variables only.
When doing concrete execution, you can think of the requires
clause as a
side-condition. In other words, even if the rule matches, it will not apply
unless the requires
clause, which must be of sort Bool
, evaluates to
true
.
module REQUIRES-ENSURES
imports BASIC-K
syntax #RuleBody ::= K
syntax #RuleContent ::= #RuleBody [symbol("#ruleNoConditions")]
| #RuleBody "requires" Bool [symbol("#ruleRequires")]
| #RuleBody "ensures" Bool [symbol("#ruleEnsures")]
| #RuleBody "requires" Bool "ensures" Bool [symbol("#ruleRequiresEnsures")]
endmodule
The below modules are used in various ways as indicators to the implementation that certain automatically generated syntax should be created by the parser. These modules should not be imported directly by the user.
module K-TOP-SORT
imports SORT-KBOTT
syntax KItem ::= KBott
syntax {Sort} KItem ::= Sort
endmodule
module K-BOTTOM-SORT
imports SORT-KBOTT
syntax KItem ::= KBott
syntax {Sort} Sort ::= KBott
endmodule
module K-SORT-LATTICE
imports K-TOP-SORT
imports K-BOTTOM-SORT
endmodule
module AUTO-CASTS
// if this module is imported, the parser automatically
// generates, for all sorts, productions of the form:
// Sort ::= Sort ":Sort" // semantic cast - force the inner term to be `Sort` or a subsort
// Sort ::= Sort "::Sort" // strict cast - force the inner term to be exactly `Sort`. Useful for disambiguation
// Sort ::= "{" Sort "}" "::Sort" // synonym for strict cast
// Sort ::= "{" K "}" ":>Sort" // projection cast. Allows any term to be placed in a context that expects `Sort`
// this is part of the mechanism that allows concrete user syntax in K
endmodule
module AUTO-FOLLOW
// if this module is imported, the parser automatically
// generates a follow restriction for every terminal which is a prefix
// of another terminal. This is useful to prevent ambiguities such as:
// syntax K ::= "a"
// syntax K ::= "b"
// syntax K ::= "ab"
// syntax K ::= K K
// #parse("ab", "K")
// In the above example, the terminal "a" is not allowed to be followed by a "b"
// because it would turn the terminal into the terminal "ab".
endmodule
module PROGRAM-LISTS
imports SORT-K
// if this module is imported, the parser automatically
// replaces the default productions for lists:
// Es ::= E "," Es [userList("*"), symbol('_,_)]
// | ".Es" [userList("*"), symbol('.Es)]
// into a series of productions more suitable for programs:
// Es#Terminator ::= "" [symbol('.Es)]
// Ne#Es ::= E "," Ne#Es [symbol('_,_)]
// | E Es#Terminator [symbol('_,_)]
// Es ::= Ne#Es
// | Es#Terminator // if the list is *
endmodule
module RULE-LISTS
// if this module is imported, the parser automatically
// adds the subsort production to the parsing module only:
// Es ::= E [userList("*")]
endmodule
module RECORD-PRODUCTIONS
// if this module is imported, prefix productions of the form
// syntax Sort ::= name(Args)
// will be able to be parsed with don't-care variables according
// to their nonterminal's names
endmodule
module SORT-PREDICATES
// if this module is imported, the Bool sort will be annotated with
// syntax Bool ::= isSort(K) [function]
// and all sorts will be annotated with
// syntax Sort ::= project:Sort(K) [function]
endmodule
Certain additional features are available when parsing the contents of rules and contexts. For more information on each of these, refer to K's documentation.
module KREWRITE
syntax {Sort} Sort ::= Sort "=>" Sort [symbol(#KRewrite)]
syntax non-assoc #KRewrite
syntax priority #KRewrite > #withConfig
endmodule
// To be used to parse semantic rules
module K
imports KSEQ-SYMBOLIC
imports REQUIRES-ENSURES
imports RECORD-PRODUCTIONS
imports SORT-PREDICATES
imports K-SORT-LATTICE
imports AUTO-CASTS
imports AUTO-FOLLOW
imports KREWRITE
syntax {Sort} Sort ::= Sort "#as" Sort [symbol(#KAs)]
// functions that preserve sorts and can therefore have inner rewrites
syntax {Sort} Sort ::= "#fun" "(" Sort ")" "(" Sort ")" [symbol(#fun2), prefer]
// functions that do not preserve sort and therefore cannot have inner rewrites
syntax {Sort1, Sort2} Sort1 ::= "#fun" "(" Sort2 "=>" Sort1 ")" "(" Sort2 ")" [symbol(#fun3)]
syntax {Sort1, Sort2} Sort1 ::= "#let" Sort2 "=" Sort2 "#in" Sort1 [symbol(#let)]
/*@ Set membership over terms. In addition to equality over
concrete patterns, K also supports computing equality
between a concrete pattern and a symbolic pattern.
This is compiled efficiently down to pattern matching,
and can be used by putting a term with unbound variables
in the left child of :=K or =/=K. Note that this does not
bind variables used on the lhs however (although this may
change in the future).*/
syntax Bool ::= left:
K ":=K" K [function, total, symbol(_:=K_), group(equalEqualK)]
| K ":/=K" K [function, total, symbol(_:/=K_), group(notEqualEqualK)]
endmodule
// To be used to parse terms in full K
module K-TERM
imports KSEQ-SYMBOLIC
imports RECORD-PRODUCTIONS
imports SORT-PREDICATES
imports K-SORT-LATTICE
imports AUTO-CASTS
imports AUTO-FOLLOW
imports KREWRITE
endmodule
When constructing a scanner for use during parsing, often you wish to ignore
certain types of text, such as whitespace and comments. However, the specific
syntax which each language must ignore is a little different from language
to language, and thus you wish to specify it manually. You can do this by
defining productions of the #Layout
sort. For more information, refer to
K's documentation. However, this module will be implicitly imported if no
productions are declared of sort #Layout
. This module will also be used
for the purposes of parsing K rules. If you wish to declare a language with
no layout productions, simply create a sort declaration for the #Layout
sort
in your code (e.g. syntax #Layout
).
module DEFAULT-LAYOUT
syntax #Layout ::= r"(\\/\\*([^\\*]|(\\*+([^\\*\\/])))*\\*+\\/)" // C-style multi-line comments
| r"(\\/\\/[^\\n\\r]*)" // C-style single-line comments
| r"([\\ \\n\\r\\t])" // Whitespace
endmodule
If the user has no configuration declaration in their seamantics, the below configuration declaration will be implicitly imported.
module DEFAULT-CONFIGURATION
imports BASIC-K
configuration <k> $PGM:K </k>
endmodule
On occasion, it may be desirable to parse a language with an ambiguous grammar
when parsing a program, and perform additional semantic analysis at a later
time in order to resolve the ambiguities. A good example of this is as a
substitute for the lexer hack in parsers of the C
programming language.
The following module contains a declaration for ambiguities in K. Usually,
an ambiguous parse is an error. However, when you use the --gen-glr-parser
flag to kast
, or the --gen-glr-bison-parser
flag to kompile
, ambiguities
instead become instances of the below parametric production, which you can use
regular K rules to disambiguate as necessary.
module K-AMBIGUITIES
syntax {Sort} Sort ::= amb(Sort, Sort) [symbol(amb)]
endmodule
Another feature of K's Bison parser is the ability to annotate terms parsed with location information about the file and line where they occurred. For more information about how to use this, refer to K's documentation. However, the below module exists to provide a user syntax for the annotations that are generated by the parser.
module K-LOCATIONS
imports STRING-SYNTAX
imports INT-SYNTAX
// filename, startLine, startCol, endLine, endCol
syntax {Sort} Sort ::= #location(Sort, String, Int, Int, Int, Int) [symbol(#location), format(%3)]
endmodule