-
Notifications
You must be signed in to change notification settings - Fork 485
Add array built-in functions to the spec #7054
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -11,9 +11,39 @@ | |
\subsection{Batch 6} | ||
\label{sec:default-builtins-6} | ||
|
||
\subsubsection{Built-in type operators} | ||
\label{sec:built-in-type-operatos-6} | ||
The sixth batch adds type operators defined in Table~\ref{table:built-in-type-operators-6}. | ||
|
||
\begin{table}[H] | ||
\centering | ||
\begin{tabular}{|l|p{14mm}|l|l|} | ||
\hline | ||
Operator $\op$ & $\left|\op\right|$ & Denotation & Concrete Syntax\\ | ||
\hline | ||
\texttt{array} | ||
& 1 | ||
& $\denote{\arrayOf{t}} = \denote{t}^*$ | ||
& See below\\ | ||
\hline | ||
\end{tabular} | ||
\caption{Type operators, Batch 6} | ||
\label{table:built-in-type-operators-6} | ||
\end{table} | ||
|
||
\paragraph{Concrete syntax for arrays.} | ||
An array of type $\texttt{array}(t)$ is written as a syntactic list | ||
\texttt{[$c_1, \ldots, c_n$]} where each $c_i$ lies in $\bitc_t$. | ||
Some valid constant expressions are thus: | ||
\begin{verbatim} | ||
(con (array bool) [True, False, True]) | ||
(con (array integer) []) | ||
\end{verbatim} | ||
|
||
\subsubsection{Built-in functions} | ||
\label{sec:built-in-functions-6} | ||
The sixth batch of builtin operations is defined in Table~\ref{table:built-in-functions-6}. | ||
The sixth batch of built-in types is defined in Table~\ref{table:built-in-types-6}. | ||
Operations are defined in Table~\ref{table:built-in-functions-6}. | ||
|
||
\setlength{\LTleft}{-12mm} % Shift the table left a bit to centre it on the page | ||
\renewcommand*{\arraystretch}{1.25} %% Stretch the space between the rows by a factor of 25% | ||
|
@@ -63,6 +93,36 @@ \subsubsection{Built-in functions} | |
[x_{k+1}, \ldots, x_n] & \text{if $1 \leq k \leq n-1$} \\ \relax | ||
[] &\text{if $k \geq n$}\\ | ||
\end{array}\right.$} & & \\ | ||
\TT{lengthOfArray} | ||
& $[\forall a_\#, \arrayOf{a_\#}] \to \ty{integer}$ | ||
& $[] \mapsto 0$ | ||
\newline | ||
$[x_1] \mapsto 1$ | ||
\newline | ||
$[x_1,\ldots,x_n] \mapsto n$ | ||
& | ||
& \\ | ||
\TT{listToArray} | ||
& $[\forall a_\#, \listOf{a_\#}] \to \arrayOf{a_\#}$ | ||
& $[] \mapsto []$ | ||
\newline | ||
$[x_1] \mapsto [x_1]$ | ||
\newline | ||
$[x_1,\ldots,x_n] \mapsto [x_1,\ldots,x_n]$ | ||
Comment on lines
+107
to
+111
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't know a better denotation. Maybe we should use a different syntax to denotate arrays? @effectfully please suggest, what would you do? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This looks good, although I think you could miss out the two Also,
might look a bit weird,but on page 33 we've already got There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yeah, I wasn't sure if |
||
& | ||
& \\ | ||
\TT{indexArray} | ||
& $[\forall a_\#, \arrayOf{a_\#}, \ty{integer}]$ \text{$\;\;\; \to \ty{a_\#}$} | ||
& $([x_1,\ldots,x_n], k)$ | ||
\smallskip | ||
\newline | ||
\text{$\;\;\mapsto \left\{ \begin{array}{ll} | ||
\errorX & \text{if $k < 0$} \\ \relax % | ||
x_{k+1} & \text{if $0 \leq k \leq n-1$} \\ \relax | ||
\errorX & \text{if $k > n-1$}\\ | ||
\end{array}\right.$} | ||
& Yes | ||
& \\ | ||
\hline | ||
\end{longtable} | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -26,6 +26,7 @@ \chapter{Serialising Plutus Core Terms and Programs Using the \texttt{flat} Form | |
|
||
|
||
\section{Encoding and decoding} | ||
\label{sec:encoding-and-decoding} | ||
Firstly recall some notation from Section~\ref{sec:notation}. The set of all | ||
finite sequences of bits is denoted by $\Bits = \{\bits{0},\bits{1}\}^*$. For | ||
brevity we write a sequence of bits in the form $b_{n-1} \cdots b_0$ instead of | ||
|
@@ -79,7 +80,7 @@ \section{Encoding and decoding} | |
\subsection{Padding} | ||
The encoding functions mentioned above produce sequences of \textit{bits}, but | ||
we sometimes need sequences of \textit{bytes}. To this end we introduce a | ||
functions $\pad: \Bits \rightarrow \Bits$ which adds a sequence of $\bits{0}$s | ||
function $\pad: \Bits \rightarrow \Bits$ which adds a sequence of $\bits{0}$s | ||
followed by a $\bits{1}$ to a sequence $s$ to get a sequence whose length is a | ||
multiple of 8; if $s$ is a sequence such that $\length(s)$ is already a multiple of 8 | ||
then $\pad$ still adds an extra byte of padding; $\pad$ is used both for | ||
|
@@ -155,6 +156,44 @@ \subsection{Lists} | |
\Dlist_X(\bits{1} \cdot s) &= (s'', x \cdot l) \quad \text{if $D_X(s) = (s', x)$ and $\Dlist_X(s') = (s'', l).$} | ||
\end{align*} | ||
|
||
\subsection{Arrays} | ||
Arrays are encoded as a sequence of blocks of up to 255 elements. | ||
Each block is preceded by a single unsigned byte containing the count of elements in the block, | ||
and the end of the encoding is marked by a zero-length block. | ||
|
||
Let $X$ be the set of elements in the array for which we have defined an encoder $\E_X$ and a | ||
decoder $\D_X$. We define an encoder $\Earray_X$ which encodes arrays of elements of $X$: | ||
|
||
$$ | ||
\Earray_X(s, [x_1, \ldots, x_n]) = | ||
\begin{cases} | ||
s \cdot \E_8(0) & \text{if } n = 0 \\ | ||
\text{let } k = \min(n, 255) \text{ in } \\ | ||
\Earray_X(s \cdot \E_8(k) \cdot \E_X(x_1) \cdots \E_X(x_k), [x_{k+1}, \ldots, x_n]) | ||
& \text{if } n > 0 | ||
\end{cases} | ||
$$ | ||
|
||
The corresponding decoder is given by: | ||
|
||
$$ | ||
\Darray_X(s) = | ||
\begin{cases} | ||
(s', []) & \text{if } \D_8(s) = (s', 0) \\ | ||
\text{let } (s', k) = \D_8(s) \text{ in } \\ | ||
\text{let } (s'', x_1) = \D_X(s') \text{ and } \ldots \text{ and } (s^{(k)}, x_k) = \D_X(s^{(k-1)}) \text{ in } \\ | ||
\text{let } (s''', l) = \Darray_X(s^{(k)}) \text{ in } \\ | ||
(s''', [x_1, \ldots, x_k] \cdot l) & \text{if } 1 \leq k \leq 255 | ||
\end{cases} | ||
$$ | ||
|
||
The decoder reads the initial byte to determine the length of the subsequent block. | ||
If the length $k$ is greater than 0 and no more bytes are available to decode $k$ | ||
elements of type $X$, the decoder would fail as per the general behaviour of decoders | ||
described in Section~\ref{sec:encoding-and-decoding}. | ||
The concatenation of the decoded block $[x_1, \ldots, x_k]$ with the recursively | ||
decoded list $l$ (from the remainder $s^{(k)}$) forms the final result. | ||
|
||
\subsection{Natural numbers} | ||
We encode natural numbers by splitting their binary representations into | ||
sequences of 7-bit blocks, then emitting these as a list with the \textbf{least | ||
|
@@ -443,6 +482,7 @@ \subsection{Built-in types} | |
$\ty{bls12\_381\_G1\_element}$ & $\bits{1001}$ & 9 \\ | ||
$\ty{bls12\_381\_G2\_element}$ & $\bits{1010}$ & 10 \\ | ||
$\ty{bls12\_381\_MlResult}$ & $\bits{1011}$ & 11 \\ | ||
$\ty{array}$ & $\bits{1100}$ & 12 \\ | ||
\hline | ||
\end{tabular} | ||
\caption{Type tags} | ||
|
@@ -465,6 +505,7 @@ \subsection{Built-in types} | |
&\etype(\ty{unit}) &&= [3] \\ | ||
&\etype(\ty{bool}) &&= [4] \\ | ||
&\etype(\listOf{t}) &&= [7,5] \cdot \etype(t) \\ | ||
&\etype(\arrayOf{t}) &&= [7,12] \cdot \etype(t) \\ | ||
&\etype(\pairOf{t_1}{t_2}) &&= [7,7,6] \cdot \etype(t_1) \cdot \etype(t_2)\\ | ||
&\etype(\ty{data}) &&= [8]. | ||
\end{alignat*} | ||
|
@@ -476,6 +517,7 @@ \subsection{Built-in types} | |
&\dtype(3 \cdot l) &&= (l, \ty{unit}) \\ | ||
&\dtype(4 \cdot l) &&= (l, \ty{bool}) \\ | ||
&\dtype([7,5] \cdot l) &&= (l', \listOf{t}) &&\quad \text{if $\dtype(l) = (l', t)$}\\ | ||
&\dtype([7,12] \cdot l) &&= (l', \arrayOf{t}) &&\quad \text{if $\dtype(l) = (l', t)$}\\ | ||
&\dtype([7,7,6] \cdot l) &&= (l'', \pairOf{t_1}{t_2}) | ||
&&\ \begin{cases} | ||
\text{if} & \dtype(l) = (l', t_1)\\ | ||
|
@@ -528,7 +570,8 @@ \subsection{Constants} | |
& \Econstant{\ty{bool}}(s, \texttt{False}) &&= s \cdot \bits{0}\\ | ||
& \Econstant{\ty{bool}}(s, \texttt{True}) &&= s \cdot \bits{1}\\ | ||
& \Econstant{\listOf{\tn}}(s,l) &&= \Elist^{\tn}_{\mathsf{constant}}(s, l) \\ | ||
& \Econstant{\pairOf{\tn_1}{\tn_2}}(s,(c_1,c_2)) &&= \Econstant{\tn_2}(\Econstant{\tn_1}(s, c_1), c_2)\\ | ||
& \Econstant{\arrayOf{\tn}}(s,l) &&= \Earray^{\tn}_{\mathsf{constant}}(s, l) \\ | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I had a look but I couldn''t work out how the flat encoding for arrays is determined. I checked again and arrays are definitely encoded differently from lists. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Right, this is how vectors/arrays are "flat"-coded: There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The haddock for the
but that seems to be very misleading. The Haskell Array type is more like some kind of association list and you can see that in the examples for the
and in the the file that defines `encodeSequence it says
This looks much more like what I observed, and my flat debugger fails when I give it an array with more than 255 elements because I assumed that there was only one block. I think (but haven't checked in detail) that it's encoding an array as a sequence of blocks, each one consisting of a size (eight bits long?) followed by that many elements, and then the 00000000 at the end is an empty block that tells you thhat you've finished. Presumably that means that you can serialise an array in many different ways depending on how you break it into blocks. I guess we''ll have work out the details and put it all into the specification. I think something similar happens in CBOR and that's described in our specification, so maybe we could steal that, or at least a modified version. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You're right, my experiments show that #!/usr/bin/env runhaskell
import Data.Bits (testBit)
import Data.ByteString (ByteString, unpack)
import Data.Vector.Strict (Vector)
import Data.Vector.Strict qualified as StrictVector
import Data.Word (Word8)
import Flat (DecodeException, Flat (..), flat, unflat)
import Flat.Instances.Test (tstBits)
import Flat.Instances.Vector ()
main :: IO ()
main = do
let
words :: Vector Word8
words = StrictVector.fromList [1, 2, 3, 4, 5]
bytes :: ByteString
bytes = flat words
print $ tstBits words
-- (True,56,"00000101 00000001 00000010 00000011 00000100 00000101 00000000")
putStrLn $ renderBits bytes
-- 00000101 00000001 00000010 00000011 00000100 00000101 00000000 00000001
let
words' :: Either DecodeException (Vector Word8)
words' = unflat bytes
print words' -- Right [1,2,3,4,5]
renderBits :: ByteString -> String
renderBits bs = unwords [byteToBits w | w <- unpack bs]
where
byteToBits :: Word8 -> String
byteToBits byte = [if testBit byte i then '1' else '0' | i <- [7, 6 .. 0]]
-- from module Data.Vector.Orphans () in the Plutus repo
instance (Flat a) => Flat (Vector a) where
size = size . StrictVector.toLazy -- Strict to Lazy is O(1)
encode = encode . StrictVector.toLazy
decode = StrictVector.fromLazy <$> decode -- Strict from Lazy is O(1) I interpret the encoded bits as:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @Unisay Maybe the binary serialisation/deserialisation section of https://cips.cardano.org/cip/CIP-0138 helps. It describes how the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Also, if it turns out that what I wrote in the CIP is wrong, we should amend it. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
It says "The decoding of arrays is, of course, the inverse of the encoding function". I think that may not be exactly correct. For encoding it specifies that you should produce a bunch of initial 255-byte blocks possibly followed by a smaller final one. I suspect (but haven't checked) that the decoder will accept a sequence of blocks of random nonzero lengths (up to 255 bytes). There's a similar thing for bytestrings in flat (see p62 of the specification) and certain items in |
||
& \Econstant{\pairOf{\tn_1}{\tn_2}}(s,(c_1,c_2)) &&= \Econstant{\tn_2}(\Econstant{\tn_1}(s, c_1), c_2)\\ | ||
& \Econstant{\ty{data}}(s,d) &&= \E_{\B^*}(s, \eData(d)) | ||
\end{alignat*} | ||
|
||
|
@@ -540,6 +583,7 @@ \subsection{Constants} | |
&\dConstant{\ty{bool}}(\bits{0} \cdot s) &&= (s, \texttt{False}) \\ | ||
&\dConstant{\ty{bool}}(\bits{1} \cdot s) &&= (s, \texttt{True}) \\ | ||
&\dConstant{\listOf{\tn}}(s) &&= \Dlist^{\tn}_{\mathsf{constant}}(s, l) \\ | ||
&\dConstant{\arrayOf{\tn}}(s) &&= \Darray^{\tn}_{\mathsf{constant}}(s, l) \\ | ||
&\dConstant{\pairOf{\tn_1}{\tn_2}}(s) &&= (s'', (c_1, c_2)) | ||
&& \begin{cases} | ||
\text{if} & \dConstant{\tn_1}(s) = (s', c_1) \\ | ||
|
@@ -699,6 +743,9 @@ \subsection{Built-in functions} | |
\TT{caseList} & $\bits{1011000}$ & 88 \\ | ||
\TT{caseData} & $\bits{1011001}$ & 89 \\ | ||
\TT{dropList} & $\bits{1011010}$ & 90 \\ | ||
\TT{lengthOfArray} & $\bits{1011011}$ & 91 \\ | ||
\TT{listToArray} & $\bits{1011100}$ & 92 \\ | ||
\TT{indexArray} & $\bits{1011101}$ & 93 \\ | ||
\hline | ||
\end{tabular} | ||
\caption{Tags for built-in functions (Batch 6)} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should probably be a paragraph at the start of the section saying that we've added a new type
array(t)
and saying what its denotation is: see Tables 4.7 and 4.8 on page 29. The question is what should⟦array(t)⟧
be? Making it⟦t⟧*
, the set of all tuples of elements of the denotation oft
is perfectly sensible even though that's the same as the denotation oflist(t)
, since the only observable difference is in the operations we provide and their complexity. We could introduce some extra notation to distinguish tuples representing arrays from those representing lists, but I don't know if that's really necessary. Maybe an explanatory sentence when introducing thearray
type operator would be just as good.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a Batch 6 sub-sub-section for the array type operator and its concrete syntax.
