Literal "byte arrays" and "true arrays of integers" #371

Closed
GreyCat opened this issue Mar 8, 2018 · 10 comments

@GreyCat
Member

GreyCat commented Mar 8, 2018

Currently, we have a single syntax for specifying both "byte arrays" and "true arrays". The compiler distinguishes between them with some heuristics:

  • [1, 2, 3] would be byte array (because all elements are integer literals 0..255)
  • [1, 2, 3, 500] would be true array (because 500 > 255)
  • [0 + 1, 2, 3] would be true array (because 0 + 1 is not a literal)
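The three cases above can be reproduced with a small sketch. This is not the compiler's actual code: it borrows Python's `ast` module as a stand-in for the expression parser, and the function name `classify_array_literal` is hypothetical. The handling of an empty literal is not covered by the description above and is an assumption here.

```python
import ast


def classify_array_literal(expr: str) -> str:
    """Return "bytes" or "array" for a bracketed literal, following the
    heuristic described above (hypothetical helper, not compiler code)."""
    node = ast.parse(expr, mode="eval").body
    if not isinstance(node, ast.List):
        raise ValueError("expected an array literal such as [1, 2, 3]")
    for elt in node.elts:
        # An element counts as an integer literal only if it is a bare
        # constant; anything computed (e.g. 0 + 1) is a different AST node.
        is_int_literal = (
            isinstance(elt, ast.Constant)
            and isinstance(elt.value, int)
            and not isinstance(elt.value, bool)
        )
        if not is_int_literal or not 0 <= elt.value <= 255:
            return "array"
    # Assumption: an empty literal falls through to "bytes".
    return "bytes"


for literal in ("[1, 2, 3]", "[1, 2, 3, 500]", "[0 + 1, 2, 3]"):
    print(literal, "->", classify_array_literal(literal))
```

The sketch makes the problem visible: the result depends on how each element is spelled, not only on its value.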

I propose to get rid of these heuristics and introduce a uniform and easy-to-understand way to specify both.

This is (kind of) a prerequisite for #253. Technically we can (and will) use the hacky trick with 0 + 1, but it's much better to have clear specs.

@arekbulski
Member

How about (u4)[1,2,3] as example of integer array? [1,2,3] would implicitly be byte-array.

@KOLANICH

KOLANICH commented Mar 9, 2018

I guess we don't need any heuristics, the type is determined by context.

In content and terminator it is meant to be a byte array.
I don't know of other contexts where typing is needed.

If we need content and terminator to hold the byte representation of some struct (for example, an array of integers), we need serialization implemented first, and then a serializing syntax. That is, we first create a type describing the structure of the content, then use some syntax for inline serialization of that type, and get the padding.

Something like

seq:
  - id: a
    type: int_s(_index)
    repeat: eos
types:
  s:
    seq:
      - id: a
        type: u4
      - id: b
        type: f4
  int_s:
    params:
      - id: n
        type: u4
    seq:
      - id: a
        content:
          # only a single child from a restricted set of keys is allowed if it is an object
          serialize: # in this case there is no need to really serialize: you can parse and compare; the compiler should catch this and generate the appropriate code
            # only a single child is allowed: the key is the name of the type to serialize, the value is an object with property names as keys and the values to assign as values
            s:
              a: n
              b: n.to_f4
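To make the "serialize, then compare" idea concrete, here is a minimal Python sketch of what checking one `int_s(n)` element against its expected content could look like. The helper names are hypothetical, little-endian byte order is an assumption (the snippet above does not state an endianness), and `n.to_f4` is taken to mean a plain integer-to-float conversion.

```python
import struct


def expected_int_s(n: int) -> bytes:
    """Serialize type `s` with a = n (u4) and b = n as f4.

    Assumption: little-endian layout, 4 + 4 bytes.
    """
    return struct.pack("<If", n, float(n))


def matches_int_s(chunk: bytes, n: int) -> bool:
    """Check that `chunk` equals the serialized form expected at index n."""
    return chunk == expected_int_s(n)


print(expected_int_s(1).hex())
print(matches_int_s(expected_int_s(2), 2))
```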

@arekbulski
Member

In Construct, a Const value is not necessarily of bytes type. It can be something like Const(0, Int32ub).

@GreyCat You asked for an example of a use case for thin-seq: what if the user wanted something like
Const([1,2,3], Array(3, Byte))? The content-serialize would refer to a user-defined type, but those can only be structs. Consider an alternative syntax:

- id: a
  content: [1,2,3]
  type: u1
  repeat: expr
  repeat-expr: 3

@koczkatamas
Member

koczkatamas commented Mar 9, 2018

I'd prefer a solution which still uses YAML schema. So while [1,2,3] is a valid YAML array and parsed out-of-box by YAML parsers, (u4)[1,2,3] will be parsed as string, which is a drawback IMHO.

So I'd introduce a new property which contains the type (e.g. in the form <original-property-name>-type, so it would be content-type for content).

Another option may be using YAML tags, but to be frank I am not sure that it's easy to add custom types to various YAML parsers, and it would give the schema users a less out-of-box experience, so of the two I'd still prefer the <prop>-type style solution.

It's a harder question for expressions, but I presume we could use the current cast operator? Or introduce something new?

@arekbulski
Member

I agree. Did you mean content-type as the type of element of an array? Like in this example:

- id: a
  content: [1,2,3]
  content-type: u4

@GreyCat
Member Author

GreyCat commented Mar 9, 2018

@KOLANICH @koczkatamas Guys, this question is about literals, and I'm talking about expression language. So, there is no YAML, no KSY tags, nothing, just a string. For practical purposes, you can think of it as a string that we embed into

instances:
  foo:
    value: '[1, 2, 3]' # <= here

@arekbulski

How about (u4)[1,2,3] as example of integer array? [1,2,3] would implicitly be byte-array.

Yeah, good idea. We do casting as postfix operator .as<foo>, though, so probably we could do something like:

  • [1, 2, 3] stays the same (and maintains compatibility)
  • [0 + 1, 2, 3].as<bytes> is byte array
  • [1, 2, 3].as<u1[]> for true array of u1
  • [1, 2, 3].as<u4[]> for true array of u4
  • [1, 2, 3].as<f8[]> for true array of f8, i.e. [1.0, 2.0, 3.0]

The same operator could actually be used to specify sized literals, if we ever need them, i.e. 123.as<u2>, 123.as<u8>, etc. Not much worse than C++'s yet unaffirmed 123Ui16 and 123Ui64.
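As a rough illustration of the proposed casts, the following Python sketch maps each `.as<...>` target from the list above to a result value. The function `cast_array` and the `INT_RANGES` table are hypothetical, and the range checks are an assumption: the proposal only names the target types and does not say what happens to out-of-range values.

```python
# Hypothetical value ranges for the integer element types named above.
INT_RANGES = {"u1": (0, 0xFF), "u4": (0, 0xFFFFFFFF)}


def cast_array(values, target):
    """Mimic [..].as<target> for the targets listed in the proposal."""
    if target == "bytes":
        # bytes() raises ValueError if any value is outside 0..255.
        return bytes(values)
    if not target.endswith("[]"):
        raise ValueError(f"unsupported cast target: {target}")
    elem = target[:-2]
    if elem == "f8":
        return [float(v) for v in values]
    if elem in INT_RANGES:
        lo, hi = INT_RANGES[elem]
        for v in values:
            if not lo <= v <= hi:
                raise ValueError(f"{v} does not fit in {elem}")
        return list(values)
    raise ValueError(f"unsupported element type: {elem}")


print(cast_array([0 + 1, 2, 3], "bytes"))
print(cast_array([1, 2, 3], "u4[]"))
print(cast_array([1, 2, 3], "f8[]"))
```

With an explicit target, the result no longer depends on how the elements are written: a computed element such as 0 + 1 can still end up in a byte array.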

@kaitai-io kaitai-io deleted a comment from GreyCat Mar 9, 2018
@KOLANICH

KOLANICH commented Mar 9, 2018

Guys, this question is about literals, and I'm talking about expression language.

I wonder if we really need this kind of literal. Usually we deal with arrays of structures, not arrays of numbers. Literal array syntax doesn't suit the use case with structs well. But if we had a syntax for structs, there should be no need for a separate syntax for numbers, since we use it very seldom.

@GreyCat
Member Author

GreyCat commented Mar 9, 2018

I wonder if we really need such literals.

Yes, we do — at least for #253, and it's also widely used for stuff like "construct byte array from expressions, then convert it to string".
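For readers unfamiliar with that pattern, here is the same idea in Python rather than in the expression language: a byte array is built from computed elements and then decoded to a string. The variable names and the ASCII encoding are illustrative assumptions.

```python
# Each element is an expression rather than a plain literal.
offset = 1
raw = bytes([0x40 + offset, 0x40 + offset + 1, 0x40 + offset + 2])

# Convert the resulting byte array to a string.
text = raw.decode("ascii")
print(text)
```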

@KOLANICH

KOLANICH commented Mar 9, 2018

Yes, we do — at least for #253

- actual: docs[0].indicator #the type is defined by the type of this field (if there is no selection of type based on something, but if there is, you can test the type of the switch-on variable or maybe introduce some property to get a type), there is no need to specify it explicitly
  expected: '[0x49, 0x49]'

and it's also widely used for stuff like "construct byte array from expressions, then convert it to string"

My proposal fits here (assuming serialization, chains and assertions are implemented). Of course, it is very rough and needs some improvement before implementation.

@GreyCat GreyCat self-assigned this Apr 8, 2018
@GreyCat GreyCat added this to the v0.9 milestone Apr 8, 2018
@GreyCat
Member Author

GreyCat commented Apr 8, 2018

Seems to be working now, relevant tests added to TranslatorSpec, usable for KST purposes.

@GreyCat GreyCat closed this as completed Apr 8, 2018