Skip to content

Add array built-in functions to the spec #7054

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

Unisay
Copy link
Contributor

@Unisay Unisay commented Apr 22, 2025

No description provided.

@Unisay Unisay self-assigned this Apr 22, 2025
@Unisay Unisay requested a review from effectfully April 22, 2025 15:25
Comment on lines +77 to +111
& $[] \mapsto []$
\newline
$[x_1] \mapsto [x_1]$
\newline
$[x_1,\ldots,x_n] \mapsto [x_1,\ldots,x_n]$
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know a better denotation. Maybe we should use a different syntax to denotate arrays? @effectfully please suggest, what would you do?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is how the whole table looks when rendered:
image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, although I think you could miss out the two [x₁] ↦ ... cases since they're covered by [x₁, ..., xₙ] with n=1 (and arguably the [] case is covered by n=0, but that might be a bit confusing).

Also,

[x₁, ..., xₙ] ↦ [x₁, ..., xₙ]

might look a bit weird,but on page 33 we've already got (x,y) ↦ (x,y) for the denotation of mkPairData, so its maybe not such a big deal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could miss out the two [x₁] ↦ ... cases since they're covered by [x₁, ..., xₙ] with n=1 (and arguably the [] case is covered by n=0, but that might be a bit confusing).

Yeah, I wasn't sure if [x₁, ..., xₙ] ↦ [x₁, ..., xₙ] subsumes [x₁] ↦ [x₁] or even [] ↦ [] so I went for more explicit notation. Happy to simplify it if you think that there is an ambiguity.

Copy link
Contributor

@kwxm kwxm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done for getting to grips with the TeX! This looks pretty good, although I think the flat encoding may be wrong.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should probably be a paragraph at the start of the section saying that we've added a new type array(t) and saying what its denotation is: see Tables 4.7 and 4.8 on page 29. The question is what should ⟦array(t)⟧ be? Making it ⟦t⟧*, the set of all tuples of elements of the denotation of t is perfectly sensible even though that's the same as the denotation of list(t), since the only observable difference is in the operations we provide and their complexity. We could introduce some extra notation to distinguish tuples representing arrays from those representing lists, but I don't know if that's really necessary. Maybe an explanatory sentence when introducing the array type operator would be just as good.

Copy link
Contributor Author

@Unisay Unisay Apr 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a Batch 6 sub-sub-section for the array type operator and its concrete syntax.
image

@@ -528,7 +531,8 @@ \subsection{Constants}
& \Econstant{\ty{bool}}(s, \texttt{False}) &&= s \cdot \bits{0}\\
& \Econstant{\ty{bool}}(s, \texttt{True}) &&= s \cdot \bits{1}\\
& \Econstant{\listOf{\tn}}(s,l) &&= \Elist^{\tn}_{\mathsf{constant}}(s, l) \\
& \Econstant{\pairOf{\tn_1}{\tn_2}}(s,(c_1,c_2)) &&= \Econstant{\tn_2}(\Econstant{\tn_1}(s, c_1), c_2)\\
& \Econstant{\arrayOf{\tn}}(s,l) &&= \Earray^{\tn}_{\mathsf{constant}}(s, l) \\
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that flat encodes arrays in a different way from lists. I have a program here that'll take a flat file and decode it, describing what the bits of the encoding mean. I wrote that to help me understand the flat encoding so that I could write a proper specifcation of the parts that we use. I don't quite remember why, but I added the array operations a few weeks ago and had to write some code to decode arrays. After a bit of trial and error, it seemed that arrays are encoded by emitting a natural number that says how many elements it has, then emitting the encodings of the individual elements, and then for some reason emitting 8 zero bits afterwards. I didn't actually check the implementation though, so maybe you should have a look and see if that's correct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a look but I couldn''t work out how the flat encoding for arrays is determined. I checked again and arrays are definitely encoded differently from lists.

Copy link
Contributor Author

@Unisay Unisay Apr 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The haddock for the Vector instance says

Vectors are encoded as arrays.

but that seems to be very misleading. The Haskell Array type is more like some kind of association list and you can see that in the examples for the Array instance: it's encoded as a sequence of (index,element) pairs. That's definitely not what it's doing with UPLC arrays. It seems to be doing something much more like what you'd do with standard arrays. In the source for the flat library it has

instance Flat a => Flat (V.Vector a) where
  size   = sizeSequence
  encode = encodeSequence
  decode = decodeSequence

and in the the file that defines `encodeSequence it says

In practice, this means that the sequence is encoded as a sequence of blocks of up to 255 elements, with every block preceded by the count of the elements in the block and a final 0-length block.

This looks much more like what I observed, and my flat debugger fails when I give it an array with more than 255 elements because I assumed that there was only one block. I think (but haven't checked in detail) that it's encoding an array as a sequence of blocks, each one consisting of a size (eight bits long?) followed by that many elements, and then the 00000000 at the end is an empty block that tells you thhat you've finished. Presumably that means that you can serialise an array in many different ways depending on how you break it into blocks. I guess we''ll have work out the details and put it all into the specification. I think something similar happens in CBOR and that's described in our specification, so maybe we could steal that, or at least a modified version.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, my experiments show that Strict.Vectors are encoded as sequences:

#!/usr/bin/env runhaskell

import Data.Bits (testBit)
import Data.ByteString (ByteString, unpack)
import Data.Vector.Strict (Vector)
import Data.Vector.Strict qualified as StrictVector
import Data.Word (Word8)
import Flat (DecodeException, Flat (..), flat, unflat)
import Flat.Instances.Test (tstBits)
import Flat.Instances.Vector ()

main :: IO ()
main = do
  let
    words :: Vector Word8
    words = StrictVector.fromList [1, 2, 3, 4, 5]

    bytes :: ByteString
    bytes = flat words

  print $ tstBits words
  -- (True,56,"00000101 00000001 00000010 00000011 00000100 00000101 00000000")

  putStrLn $ renderBits bytes
  -- 00000101 00000001 00000010 00000011 00000100 00000101 00000000 00000001

  let
    words' :: Either DecodeException (Vector Word8)
    words' = unflat bytes

  print words' -- Right [1,2,3,4,5]

renderBits :: ByteString -> String
renderBits bs = unwords [byteToBits w | w <- unpack bs]
 where
  byteToBits :: Word8 -> String
  byteToBits byte = [if testBit byte i then '1' else '0' | i <- [7, 6 .. 0]]

-- from module Data.Vector.Orphans () in the Plutus repo
instance (Flat a) => Flat (Vector a) where
  size = size . StrictVector.toLazy -- Strict to Lazy is O(1)
  encode = encode . StrictVector.toLazy
  decode = StrictVector.fromLazy <$> decode -- Strict from Lazy is O(1)

I interpret the encoded bits as:

00000101: size = 5 
00000001: 1
00000010: 2
00000011: 3
00000100: 4
00000101: 5
00000000: end of the vector
00000001: padding

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

@Unisay Unisay force-pushed the yura/plc-spec-builtin-array branch from a52b9f8 to b9499e4 Compare April 23, 2025 14:54
@Unisay Unisay added the No Changelog Required Add this to skip the Changelog Check label Apr 23, 2025
@Unisay Unisay requested a review from kwxm April 25, 2025 09:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
No Changelog Required Add this to skip the Changelog Check
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants