-
Notifications
You must be signed in to change notification settings - Fork 485
Add array built-in functions to the spec #7054
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
& $[] \mapsto []$ | ||
\newline | ||
$[x_1] \mapsto [x_1]$ | ||
\newline | ||
$[x_1,\ldots,x_n] \mapsto [x_1,\ldots,x_n]$ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know a better denotation. Maybe we should use a different syntax to denotate arrays? @effectfully please suggest, what would you do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good, although I think you could miss out the two [x₁] ↦ ...
cases since they're covered by [x₁, ..., xₙ]
with n=1 (and arguably the []
case is covered by n=0, but that might be a bit confusing).
Also,
[x₁, ..., xₙ] ↦ [x₁, ..., xₙ]
might look a bit weird,but on page 33 we've already got (x,y) ↦ (x,y)
for the denotation of mkPairData
, so its maybe not such a big deal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you could miss out the two [x₁] ↦ ... cases since they're covered by [x₁, ..., xₙ] with n=1 (and arguably the [] case is covered by n=0, but that might be a bit confusing).
Yeah, I wasn't sure if [x₁, ..., xₙ] ↦ [x₁, ..., xₙ]
subsumes [x₁] ↦ [x₁]
or even [] ↦ []
so I went for more explicit notation. Happy to simplify it if you think that there is an ambiguity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well done for getting to grips with the TeX! This looks pretty good, although I think the flat encoding may be wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should probably be a paragraph at the start of the section saying that we've added a new type array(t)
and saying what its denotation is: see Tables 4.7 and 4.8 on page 29. The question is what should ⟦array(t)⟧
be? Making it ⟦t⟧*
, the set of all tuples of elements of the denotation of t
is perfectly sensible even though that's the same as the denotation of list(t)
, since the only observable difference is in the operations we provide and their complexity. We could introduce some extra notation to distinguish tuples representing arrays from those representing lists, but I don't know if that's really necessary. Maybe an explanatory sentence when introducing the array
type operator would be just as good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -528,7 +531,8 @@ \subsection{Constants} | |||
& \Econstant{\ty{bool}}(s, \texttt{False}) &&= s \cdot \bits{0}\\ | |||
& \Econstant{\ty{bool}}(s, \texttt{True}) &&= s \cdot \bits{1}\\ | |||
& \Econstant{\listOf{\tn}}(s,l) &&= \Elist^{\tn}_{\mathsf{constant}}(s, l) \\ | |||
& \Econstant{\pairOf{\tn_1}{\tn_2}}(s,(c_1,c_2)) &&= \Econstant{\tn_2}(\Econstant{\tn_1}(s, c_1), c_2)\\ | |||
& \Econstant{\arrayOf{\tn}}(s,l) &&= \Earray^{\tn}_{\mathsf{constant}}(s, l) \\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that flat
encodes arrays in a different way from lists. I have a program here that'll take a flat
file and decode it, describing what the bits of the encoding mean. I wrote that to help me understand the flat encoding so that I could write a proper specifcation of the parts that we use. I don't quite remember why, but I added the array operations a few weeks ago and had to write some code to decode arrays. After a bit of trial and error, it seemed that arrays are encoded by emitting a natural number that says how many elements it has, then emitting the encodings of the individual elements, and then for some reason emitting 8 zero bits afterwards. I didn't actually check the implementation though, so maybe you should have a look and see if that's correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a look but I couldn''t work out how the flat encoding for arrays is determined. I checked again and arrays are definitely encoded differently from lists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, this is how vectors/arrays are "flat"-coded:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The haddock for the Vector
instance says
Vectors are encoded as arrays.
but that seems to be very misleading. The Haskell Array type is more like some kind of association list and you can see that in the examples for the Array
instance: it's encoded as a sequence of (index,element) pairs. That's definitely not what it's doing with UPLC arrays. It seems to be doing something much more like what you'd do with standard arrays. In the source for the flat
library it has
instance Flat a => Flat (V.Vector a) where
size = sizeSequence
encode = encodeSequence
decode = decodeSequence
and in the the file that defines `encodeSequence it says
In practice, this means that the sequence is encoded as a sequence of blocks of up to 255 elements, with every block preceded by the count of the elements in the block and a final 0-length block.
This looks much more like what I observed, and my flat debugger fails when I give it an array with more than 255 elements because I assumed that there was only one block. I think (but haven't checked in detail) that it's encoding an array as a sequence of blocks, each one consisting of a size (eight bits long?) followed by that many elements, and then the 00000000 at the end is an empty block that tells you thhat you've finished. Presumably that means that you can serialise an array in many different ways depending on how you break it into blocks. I guess we''ll have work out the details and put it all into the specification. I think something similar happens in CBOR and that's described in our specification, so maybe we could steal that, or at least a modified version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, my experiments show that Strict.Vector
s are encoded as sequences:
#!/usr/bin/env runhaskell
import Data.Bits (testBit)
import Data.ByteString (ByteString, unpack)
import Data.Vector.Strict (Vector)
import Data.Vector.Strict qualified as StrictVector
import Data.Word (Word8)
import Flat (DecodeException, Flat (..), flat, unflat)
import Flat.Instances.Test (tstBits)
import Flat.Instances.Vector ()
main :: IO ()
main = do
let
words :: Vector Word8
words = StrictVector.fromList [1, 2, 3, 4, 5]
bytes :: ByteString
bytes = flat words
print $ tstBits words
-- (True,56,"00000101 00000001 00000010 00000011 00000100 00000101 00000000")
putStrLn $ renderBits bytes
-- 00000101 00000001 00000010 00000011 00000100 00000101 00000000 00000001
let
words' :: Either DecodeException (Vector Word8)
words' = unflat bytes
print words' -- Right [1,2,3,4,5]
renderBits :: ByteString -> String
renderBits bs = unwords [byteToBits w | w <- unpack bs]
where
byteToBits :: Word8 -> String
byteToBits byte = [if testBit byte i then '1' else '0' | i <- [7, 6 .. 0]]
-- from module Data.Vector.Orphans () in the Plutus repo
instance (Flat a) => Flat (Vector a) where
size = size . StrictVector.toLazy -- Strict to Lazy is O(1)
encode = encode . StrictVector.toLazy
decode = StrictVector.fromLazy <$> decode -- Strict from Lazy is O(1)
I interpret the encoded bits as:
00000101: size = 5
00000001: 1
00000010: 2
00000011: 3
00000100: 4
00000101: 5
00000000: end of the vector
00000001: padding
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a52b9f8
to
b9499e4
Compare
No description provided.