From 8f150fecdea35c1215871b66548fdff2c8907fd6 Mon Sep 17 00:00:00 2001 From: rina Date: Mon, 2 Sep 2024 17:35:08 +1000 Subject: [PATCH 1/5] add docs --- docs/offline.md | 1 + docs/online.md | 40 ++++++++++++++++++++++++++++++++++++++++ docs/readme.md | 14 ++++++++++++++ docs/testing.md | 7 +++++++ 4 files changed, 62 insertions(+) create mode 100644 docs/offline.md create mode 100644 docs/online.md create mode 100644 docs/readme.md create mode 100644 docs/testing.md diff --git a/docs/offline.md b/docs/offline.md new file mode 100644 index 00000000..e4f60caa --- /dev/null +++ b/docs/offline.md @@ -0,0 +1 @@ +# Offline partial evaluation diff --git a/docs/online.md b/docs/online.md new file mode 100644 index 00000000..314a24f9 --- /dev/null +++ b/docs/online.md @@ -0,0 +1,40 @@ +# Online partial evaluation + +Online partial evaluation traverses the specification, +closely mirroring concrete evaluation, +and specialises expressions as it encounters them. +It does this "online" by recording, +for each variable, +whether that variable is a concrete, symbolic, or unknown value +(in order of simplification potential). +Concrete values are simple literals which can almost always be immediately evaluated +in expressions. +Pure symbolic expressions are pure transformations of a combination +of concrete and symbolic/unknown values. +For example, this includes "(x + 1) - 1" and this can be used to simplify +using algebraic properties of integers and bitvectors. +The last type, unknown, is any computation which is opaque at partial-evaluation time. +Most often, this is register and memory accesses. +When used in an expression, this will emit code to perform the computation and store +it into a read-only variable, transforming it into a pure symbolic expression. + +The overall flow within ASLp is: +- symbolic +- dis +- transforms + +In the implementation, this begins at the [symbolic.ml](/libASL/symbolic.ml) +file which decribes expression-level symbolic operations. +Here, the type of `sym` is defined simply as either +a concrete value or a pure symbolic expression +(the concept of "unknown" values lives outside this file). +Essentially, this file implements symbolic analogues +of concrete value computations ([value.ml][/libASL/value.ml]) +and primitive operations ([primops.ml][/libASL/primops.ml]). +In doing so, it performs the expression-level simplification whenever +the structure of the pure expression allows. Some notable instances are: +- identities of bitwise "and" and "or" and integer "mul", "add", and "sub", +- slices of slices, +- (sign/zero) extension of extensions. + + diff --git a/docs/readme.md b/docs/readme.md new file mode 100644 index 00000000..2a051c1c --- /dev/null +++ b/docs/readme.md @@ -0,0 +1,14 @@ +# ASLp documentation + +This folder will provide documentation for ASLp, +primarily of interest to developers of ASLp and +people seeking to integrate ASLp. + +ASLp is a partial evaluator for the ARM specification language. +The most established functionality is in the [online partial evaluation](./online.md). +The new [offline partial evaluation](./offline.md), building on the online features, +is available for easier integration with projects. + +The project is validated by a number of [tests](./testing.md) which are run +on pushes and merge requests to the repository. + diff --git a/docs/testing.md b/docs/testing.md new file mode 100644 index 00000000..4dcdc515 --- /dev/null +++ b/docs/testing.md @@ -0,0 +1,7 @@ +# Testing infrastructure + +test files are stored within the [tests/](/tests) directory. + +- unit testing (very limited) +- differential tests within coverage folder (bespoke infrastructure to implement pipeline, then dune diff) +- cram expect tests ending in .t (golden tests, also dune based) From 0d8898a6807e65f3d382734a5ad44b91dc6a3fa3 Mon Sep 17 00:00:00 2001 From: rina Date: Thu, 5 Sep 2024 15:42:02 +1000 Subject: [PATCH 2/5] dis docs --- docs/online.md | 50 ++++++++++++++++++++++++++++++++++++++++++---- docs/vectoriser.md | 0 2 files changed, 46 insertions(+), 4 deletions(-) create mode 100644 docs/vectoriser.md diff --git a/docs/online.md b/docs/online.md index 314a24f9..4f253b28 100644 --- a/docs/online.md +++ b/docs/online.md @@ -1,5 +1,7 @@ # Online partial evaluation +## Background + Online partial evaluation traverses the specification, closely mirroring concrete evaluation, and specialises expressions as it encounters them. @@ -18,10 +20,15 @@ Most often, this is register and memory accesses. When used in an expression, this will emit code to perform the computation and store it into a read-only variable, transforming it into a pure symbolic expression. -The overall flow within ASLp is: -- symbolic -- dis -- transforms +## Implementation + +Within ASLp, the partial evaluation features are implemented by three files +with different responsibilities. +These are: symbolic.ml, dis.ml, and transforms.ml. +This is also the order from lower-level concepts (expressions) +to higher-level ones (statements and programs). + +### Symbolic In the implementation, this begins at the [symbolic.ml](/libASL/symbolic.ml) file which decribes expression-level symbolic operations. @@ -37,4 +44,39 @@ the structure of the pure expression allows. Some notable instances are: - slices of slices, - (sign/zero) extension of extensions. +### Dis + +[dis.ml](/libASL/dis.ml) performs the recursive traversal of the ASL AST, +calling functions from symbolic.ml as needed. +This file is responsible for inter-statement and inter-procedural effects, +including the collection of the residual program. + +The structure in this file is designed to mirror that of the interpreter's +evaluation methods ([eval.ml](/libASL/eval.ml)). +For instance, there are corresponding functions for expressions: +```ocaml +val dis_expr : AST.l -> AST.expr -> sym rws +val eval_expr : AST.l -> Env.t -> AST.expr -> value +``` +Here, the _'a rws_ is reader-writer-state monad: +``` +type 'a rws = Eval.Env.t -> LocalEnv.t -> ('a, LocalEnv.t, stmt list) +``` +which implements an abstraction for: +- a read-only view of globals and constants ("eval env"), +- a mutable state with a mapping of variables to their concrete value or pure symbolic expression ("local env"), and +- a write-only list of statements making up the residual program. +This mechanism is used in the code inside the `let@` and `let+` syntaxes. + +### Transforms + + +## Grammar + +## Supporting + +- asl_ast (auto-generated) +- monads +- asl\_utils +- utils diff --git a/docs/vectoriser.md b/docs/vectoriser.md new file mode 100644 index 00000000..e69de29b From 01bf590572cc16e3167215839e458dba906b18d6 Mon Sep 17 00:00:00 2001 From: rina Date: Thu, 5 Sep 2024 17:41:56 +1000 Subject: [PATCH 3/5] docs transform --- docs/online.md | 57 +++++++++++++++++++++++++++++++++++++++----------- 1 file changed, 45 insertions(+), 12 deletions(-) diff --git a/docs/online.md b/docs/online.md index 4f253b28..05bd43c0 100644 --- a/docs/online.md +++ b/docs/online.md @@ -28,25 +28,31 @@ These are: symbolic.ml, dis.ml, and transforms.ml. This is also the order from lower-level concepts (expressions) to higher-level ones (statements and programs). -### Symbolic +### symbolic.ml In the implementation, this begins at the [symbolic.ml](/libASL/symbolic.ml) file which decribes expression-level symbolic operations. Here, the type of `sym` is defined simply as either a concrete value or a pure symbolic expression (the concept of "unknown" values lives outside this file). -Essentially, this file implements symbolic analogues -of concrete value computations ([value.ml][/libASL/value.ml]) -and primitive operations ([primops.ml][/libASL/primops.ml]). -In doing so, it performs the expression-level simplification whenever -the structure of the pure expression allows. Some notable instances are: + +Moreover, this file implements symbolic analogues +of existing primitive operations on concrete values +([value.ml](/libASL/value.ml) and [primops.ml](/libASL/primops.ml])). +Essentially, these functions perform a concrete evaluation +if all (or sufficiently many) operands are concretely known, and +return a symbolic expression otherwise. +The expression may be simplified through the +expression-level simplification whenever +its structure allows. +Some notable instances are: - identities of bitwise "and" and "or" and integer "mul", "add", and "sub", - slices of slices, - (sign/zero) extension of extensions. -### Dis +### dis.ml -[dis.ml](/libASL/dis.ml) performs the recursive traversal of the ASL AST, +[dis.ml](/libASL/dis.ml) performs a recursive traversal of the ASL AST, calling functions from symbolic.ml as needed. This file is responsible for inter-statement and inter-procedural effects, including the collection of the residual program. @@ -58,17 +64,44 @@ For instance, there are corresponding functions for expressions: val dis_expr : AST.l -> AST.expr -> sym rws val eval_expr : AST.l -> Env.t -> AST.expr -> value ``` -Here, the _'a rws_ is reader-writer-state monad: +Here, the _'a rws_ return value is a computation inside the reader-writer-state monad: ``` type 'a rws = Eval.Env.t -> LocalEnv.t -> ('a, LocalEnv.t, stmt list) ``` -which implements an abstraction for: +This implements an abstraction for: - a read-only view of globals and constants ("eval env"), - a mutable state with a mapping of variables to their concrete value or pure symbolic expression ("local env"), and - a write-only list of statements making up the residual program. -This mechanism is used in the code inside the `let@` and `let+` syntaxes. -### Transforms +This mechanism is invoked in the `let@` and `let+` syntaxes +which composes RWS computations in a sensible way (explained more below, if desired). + +The entry-point to the entire partial evaluator is the _dis\_core_ method. +This invokes the appropriate "dis" functions then performs a number of post-processing +transformations, described next. + +### transforms.ml + +[transforms.ml](/libASL/transforms.ml) implements transforms which are executed after the +main partial evaluation phase. +Some of these are conventional compiler transformations +(e.g. removing unused variables, common subexpression factoring, and copy-propagation). +The majority, however, are designed to address deficiencies +(verbosity or unsupported constructs) in the residual program. +We discuss a few transformations which are of particular interest. + +## StatefulIntToBits / IntToBits + +This transformation converts expressions involving arbitrary-precision integers +into expressions of fixed-length bitvectors +by way of an interval analysis. + +The StatefulIntToBits modules implements this through an abstract interpretation +with an abstract domain of $\text{Width}\times \text{IsSigned} \times \mathbb Z^2$ + + + + ## Grammar From 5dbf47aa78c27c0bcde9f02b52cb6315a34f2709 Mon Sep 17 00:00:00 2001 From: rina Date: Thu, 10 Oct 2024 15:14:58 +1000 Subject: [PATCH 4/5] finish online docs? --- docs/online.md | 99 ++++++++++++++++++++++++++++++++-------------- docs/vectoriser.md | 3 ++ 2 files changed, 73 insertions(+), 29 deletions(-) diff --git a/docs/online.md b/docs/online.md index 05bd43c0..fc49d436 100644 --- a/docs/online.md +++ b/docs/online.md @@ -64,21 +64,52 @@ For instance, there are corresponding functions for expressions: val dis_expr : AST.l -> AST.expr -> sym rws val eval_expr : AST.l -> Env.t -> AST.expr -> value ``` -Here, the _'a rws_ return value is a computation inside the reader-writer-state monad: -``` +Here, the _'a rws_ return value is a computation inside the reader-writer-state monad (discussed below). + +The entry-point to the entire partial evaluator is the _dis\_core_ method. +This invokes the appropriate "dis" functions then performs a number of post-processing +transformations, described next. + +#### LocalEnv / DisEnv + +The LocalEnv structure stores and manipulates location-sensitive information during the +partial evaluation. +Most importantly, this includes the stack +mappings of variables to their concrete value or pure symbolic expression. +To facilitate easier debugging, this also manually records a stack of the +AST nodes currently being evaluated +(the ordinary stack trace is not useful in the presence of the monad abstraction). + +The DisEnv contains the _'a rws_ type — a reader-writer-state monad: +```ocaml type 'a rws = Eval.Env.t -> LocalEnv.t -> ('a, LocalEnv.t, stmt list) ``` This implements an abstraction for: - a read-only view of globals and constants ("eval env"), -- a mutable state with a mapping of variables to their concrete value or pure symbolic expression ("local env"), and +- a mutable LocalEnv state ("local env"), and - a write-only list of statements making up the residual program. -This mechanism is invoked in the `let@` and `let+` syntaxes -which composes RWS computations in a sensible way (explained more below, if desired). - -The entry-point to the entire partial evaluator is the _dis\_core_ method. -This invokes the appropriate "dis" functions then performs a number of post-processing -transformations, described next. +This mechanism is invoked by the `let@` and `let+` syntaxes +which composes RWS computations in a sensible way. +As an example, with many type annotations, consider +```ocaml +val int_is_zero : int rws -> bool rws = + fun (x : int rws) -> + let@ y : int = x in + DisEnv.pure (y = 0) +``` +Essentially, the `let@` notation has the function of "unwrapping" a rws computation +and placing it within another rws computation. +`let+` can be used where the final result (at the bottom of the let expression) +is a `DisEnv.pure` computation. +By using `let+`, the `DisEnv.pure` can be omitted. +Other functions exist to manipulate particular rws computations in particular ways +(e.g. a list of rws can become a rws of list). + +The DisEnv struct also contains methods to retrieve variables +(searching the local bindings and then the global bindings), +all returning rws computations. +To emit a statement into the residual program, it provides a `write` function. ### transforms.ml @@ -88,28 +119,38 @@ Some of these are conventional compiler transformations (e.g. removing unused variables, common subexpression factoring, and copy-propagation). The majority, however, are designed to address deficiencies (verbosity or unsupported constructs) in the residual program. -We discuss a few transformations which are of particular interest. - -## StatefulIntToBits / IntToBits - -This transformation converts expressions involving arbitrary-precision integers -into expressions of fixed-length bitvectors -by way of an interval analysis. - -The StatefulIntToBits modules implements this through an abstract interpretation -with an abstract domain of $\text{Width}\times \text{IsSigned} \times \mathbb Z^2$ - - - - - +We list a few transformations which are of particular interest. + +- StatefulIntToBits / IntToBits: + This transformation converts expressions involving arbitrary-precision integers + into expressions of fixed-length bitvectors + by way of an interval analysis. + + The StatefulIntToBits modules implements this through abstract interpretation + with a domain of $\text{Width}\times \text{IsSigned} \times \mathbb Z^2$, + recording bit-width, signedness, and lower and upper bounds. + +- RedundantSlice: Eliminates slice operations which do not reduce the bit-width. + Crucially, this is also responsible for converting slices with non-static indices + into shift/truncate operations. +- bits_coerce_narrow: Pushes slices into sub-expressions where possible. +- CaseSimp: Removes unreachable assertions placed after exhaustive case checks. +- RemoveRegisters: Replaces expressions of ASL's register type with ordinary bit-vectors. +- CopyProp +- CommonSubExprElim +- RemoveUnused ## Grammar -## Supporting +The ASL grammar is defined in [asl.ott](/libASL/asl.ott). +The ott program translates this into a Menhir grammar +which then generates a parser and an associated Ocaml AST. +The generated code modules include Asl_ast, Asl_parser, and Asl_parser_pp. + -- asl_ast (auto-generated) -- monads -- asl\_utils -- utils +For printing AST nodes and other structures, functions beginning with `pp_` are available +in [asl_utils.ml](/libASL/asl_utils.ml) and [utils.ml](/libASL/utils.ml). +These will produce an indented human-readable string. +To emit AST nodes in the structured "aslt" format, +use `Utils.to_string (Asl_parser_pp.pp_raw_stmt stmt)` or similar. diff --git a/docs/vectoriser.md b/docs/vectoriser.md index e69de29b..11af4184 100644 --- a/docs/vectoriser.md +++ b/docs/vectoriser.md @@ -0,0 +1,3 @@ +# vectoriser + +todo From b77c2334245965288081a72ef4c2c403334a4a9a Mon Sep 17 00:00:00 2001 From: rina Date: Thu, 10 Oct 2024 17:10:26 +1000 Subject: [PATCH 5/5] offline.md --- docs/offline.md | 85 +++++++++++++++++++++++++++++++++++++++++++++++++ docs/online.md | 23 ++++++------- 2 files changed, 97 insertions(+), 11 deletions(-) diff --git a/docs/offline.md b/docs/offline.md index e4f60caa..00da5f55 100644 --- a/docs/offline.md +++ b/docs/offline.md @@ -1 +1,86 @@ # Offline partial evaluation + +## Motivation + +The offline partial evaluation is motivated by improving the experience +for users seeking to integrate ASLp within other tools. +With the original online method, +only one opcode could be processed at a time, and +each new opcode would require a complete traversal and simplification +of the ASL specification. +This is obviously inefficient +and necessitates a tight coupling between ASLp +and programs looking to use it. + +## Introduction + +Offline partial evaluation aims to improve this +by performing much of the partial evaluation _ahead-of-time_ +instead of once per-instruction. +In the offline method, +partial evaluation operates with only the knowledge +that the opcode is constant, +but without knowledge of the _value_ of the opcode. + +Further, +we desire a separation of the "decoding" phase +(where opcode bits are examined to determine _which_ instruction is represented), +and the "execution" phase (where opcodes are executed and their actions performed). +This delineation is implemented by hand-written lifters, +where the language of the semantics IR is separate from the language used to implement the lifter itself. + +ASL itself has no such separation, but we can compute it with a _binding-time analysis_. +In this analysis, +constants and the instruction opcode are marked as *lift-time*, then the analysis +traverses the AST in this way: +- if an expression uses only lift-time values, it is also marked as *lift-time*, otherwise +- the expression is marked as *run-time*. +Lift-time values are simply emitted into the lifter's language, and +will not be visible within the final semantics. +Run-time-marked values are translated to a `gen_` prefixed function, +indicating that this should be emitted into the semantics and deferred until run-time. +This gives us an AST for a program which takes an opcode and then constructs a residual program +representing the semantics of the instruction. + +In particular, this representation +within an AST +enables efficient translation of the lifter to arbitrary lifter languages and semantics languages. +This is desirable, entirely eliminating the need to (de)serialise +the semantics and communicate across language boundaries. + +## Overview + +The entry-point for the offline transformation process is the `run` function, +near the end of [symbolic_lifter.ml](/libASL/symbolic_lifter.ml). +Here, the process is divided into stages: +- **Stage 1** - mock decoder & instruction encoding definitions: + Converts the ASL decoder tree into separate functions making up a proper ASL program. + Makes use of [decoder_program.ml](/libASL/decoder_program.ml). +- **Stage 2** - call graph construction: + Identifies reachable functions, using [call_graph.ml](/libASL/call_graph.ml). +- **Stage 3** - simplification: + Minor cleanup of temporary dynamic-length bit-vectors and unsupported structures. +- **Stage 4** - specialisation: + Performs requirement analysis to ensure bit-vector lengths + and loop iterations are statically known. + Inserts splits to branch on conditions determining these quantities. [req_analysis.ml](/libASL/req_analysis.ml). +- **Stage 5** - disassembly: + Invokes the online partial evaluation to reduce and inline structures, then + simplifies with BDDs (an opcode-sensitive value analysis to, e.g., eliminate always-true assertions), +- **Stage 6** - cleanup: + Remove unused variables within each instruction function. +- **Stage 7** - offline transform: + Performs binding-time analysis and transformations, + then BDD-based copy-propagation transform and one final cleanup. [offline_opt.ml](/libASL/offline_opt.ml). + +After the lifter is generated, it is passed to a "backend" +which performs the (mostly syntactic) translation +to a targeted lifter language. + +## Backends + +TODO: mechanism of a backend and how to implement one. + +- given a set of instruction functions +- instruction-building interface +- required: mutable variables, bitvector, bool, and int representations diff --git a/docs/online.md b/docs/online.md index fc49d436..e03c401d 100644 --- a/docs/online.md +++ b/docs/online.md @@ -9,23 +9,24 @@ It does this "online" by recording, for each variable, whether that variable is a concrete, symbolic, or unknown value (in order of simplification potential). -Concrete values are simple literals which can almost always be immediately evaluated -in expressions. -Pure symbolic expressions are pure transformations of a combination -of concrete and symbolic/unknown values. -For example, this includes "(x + 1) - 1" and this can be used to simplify -using algebraic properties of integers and bitvectors. -The last type, unknown, is any computation which is opaque at partial-evaluation time. -Most often, this is register and memory accesses. -When used in an expression, this will emit code to perform the computation and store -it into a read-only variable, transforming it into a pure symbolic expression. + +- Concrete values are simple literals which can almost always be immediately evaluated + in expressions. +- Pure symbolic expressions are pure transformations of a combination + of concrete and symbolic/unknown values. + For example, this includes "(x + 1) - 1" and this can be used to simplify + using algebraic properties of integers and bitvectors. +- The last type, unknown, is any computation which is opaque at partial-evaluation time. + Most often, this is register and memory accesses. + When encountered in an expression, this will emit code to perform the computation and store + it into a read-only variable, transforming it into a pure symbolic expression. ## Implementation Within ASLp, the partial evaluation features are implemented by three files with different responsibilities. These are: symbolic.ml, dis.ml, and transforms.ml. -This is also the order from lower-level concepts (expressions) +This corresponds to the order from lower-level concepts (expressions) to higher-level ones (statements and programs). ### symbolic.ml