Literal "byte arrays" and "true arrays of integers" #371

GreyCat · 2018-03-08T23:48:31Z

Currently, we have a single syntax to specify both "byte arrays" and "true arrays". The distinction is made by the compiler using some heuristics:

[1, 2, 3] would be a byte array (because all elements are integer literals in the 0..255 range)
[1, 2, 3, 500] would be a true array (because 500 > 255)
[0 + 1, 2, 3] would be a true array (because 0 + 1 is not a literal)
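The heuristic described above can be sketched in Python. This is not the Kaitai Struct compiler's implementation; it borrows Python's own `ast` module as a stand-in parser, which only works because these particular literals happen to be valid Python too. The function name `classify_array_literal` is hypothetical.

```python
import ast


def classify_array_literal(src: str) -> str:
    """Classify an array literal the way the current heuristic does.

    Returns "bytes" if every element is a plain integer literal in 0..255,
    otherwise "true array". Python's parser is used as a stand-in for the
    KS expression parser (an assumption made for this sketch).
    """
    tree = ast.parse(src, mode="eval")
    if not isinstance(tree.body, ast.List):
        raise ValueError(f"not an array literal: {src!r}")
    for elt in tree.body.elts:
        # Only bare integer constants count; bool is a subclass of int, so exclude it.
        is_int_literal = (
            isinstance(elt, ast.Constant)
            and isinstance(elt.value, int)
            and not isinstance(elt.value, bool)
        )
        # Any non-literal element (e.g. `0 + 1`) or out-of-range value flips the type.
        if not is_int_literal or not 0 <= elt.value <= 255:
            return "true array"
    return "bytes"


for src in ["[1, 2, 3]", "[1, 2, 3, 500]", "[0 + 1, 2, 3]"]:
    print(src, "->", classify_array_literal(src))
```

The third case shows why the `0 + 1` trick works: the value is the same, but the element is no longer a literal, so the whole array becomes a true array.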

I propose to get rid of these heuristics and introduce a uniform, easy-to-understand way to specify both.

This is (kind of) a prerequisite for #253. Technically, we can (and will) use the hacky 0 + 1 trick, but it's much better to have clear specs.


arekbulski · 2018-03-09T06:37:40Z

How about (u4)[1,2,3] as an example of an integer array? [1,2,3] would implicitly be a byte array.

KOLANICH · 2018-03-09T07:45:25Z

I guess we don't need any heuristics: the type is determined by context.

In content and terminator, it is meant to be a byte array.
I don't know of other contexts where typing is needed.

If we need content and terminator filled with the byte representation of some struct (for example, an array of bytes), we need serialization implemented first, and then a serialization syntax. I mean that if we need such content, we first create a type describing the structure of the content (for example, an array of integers), then we need some syntax for inline serialization; we use that syntax and get the padding.

Something like

seq:
  - id: a
    type: int_s(_index)
    repeat: eos
types:
  s:
    seq:
      - id: a
        type: u4
      - id: b
        type: f4
  int_s:
    params:
      - id: n
        type: u4
    seq:
      - id: a
        content:
          # if content is an object, only a single child from a restricted set of keys is allowed
          serialize: # here there is no need to actually serialize: one can parse and compare instead; the compiler should detect this and generate the appropriate code
            # only a single child is allowed: the key is the name of the type to serialize, the value is a map with property names as keys and the values to assign as values
            s:
              a: n
              b: n.to_f4
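A rough Python sketch of what a generated check for `int_s` might do, showing both options from the YAML comments: serialize `s` and compare bytes, or parse the bytes as `s` and compare field values. The little-endian layout and all function names (`to_f4`, `expected_content`, `check_by_serializing`, `check_by_parsing`) are assumptions for illustration; the proposal does not specify them.

```python
import struct

# Assumed layout of type `s` from the YAML above: a: u4, b: f4, little-endian.
S_FORMAT = "<If"


def to_f4(x: float) -> float:
    """Round a Python float to the nearest 32-bit float (stand-in for `.to_f4`)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def expected_content(n: int) -> bytes:
    """Serialize `s` with a = n and b = n converted to f4."""
    return struct.pack(S_FORMAT, n, float(n))


def check_by_serializing(data: bytes, n: int) -> bool:
    # Build the expected bytes, then compare them byte for byte.
    return data == expected_content(n)


def check_by_parsing(data: bytes, n: int) -> bool:
    # Parse the bytes as `s`, then compare field values instead.
    if len(data) != struct.calcsize(S_FORMAT):
        return False
    a, b = struct.unpack(S_FORMAT, data)
    return a == n and b == to_f4(float(n))


print(expected_content(1).hex())
```

Comparing `b` against the f4-rounded value keeps the two checks in agreement even for large `n`, where `float(n)` is not exactly representable as a 32-bit float.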

arekbulski · 2018-03-09T08:00:12Z

In Construct, a Const value is not necessarily of bytes type: for example, Const(0, Int32ub).
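To illustrate the Construct point without requiring that library, here is a rough stdlib `struct` approximation of how `Const(0, Int32ub)` behaves when parsing: the constant is an integer, and it is compared only after the field is decoded as an unsigned 32-bit big-endian integer. `check_const` is a hypothetical helper; in Construct itself you would simply write `Const(0, Int32ub)`.

```python
import struct


def check_const(data: bytes, value: int, fmt: str = ">I") -> int:
    """Parse `data` with `fmt` and require the result to equal `value`.

    Rough stand-in for Construct's Const(value, Int32ub): ">I" (unsigned
    32-bit big-endian) plays the role of Int32ub.
    """
    size = struct.calcsize(fmt)
    if len(data) < size:
        raise ValueError(f"need {size} bytes, got {len(data)}")
    (parsed,) = struct.unpack(fmt, data[:size])
    if parsed != value:
        raise ValueError(f"expected {value!r}, parsed {parsed!r}")
    return parsed


# The integer constant 0 becomes four zero bytes only once Int32ub is applied.
print(struct.pack(">I", 0))
```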

@GreyCat You asked for an example use case for thin-seq: what if the user wanted something like Const([1,2,3], Array(3, Byte))? The content/serialize approach would refer to a user-defined type, but those can only be structs. Consider an alternative syntax:

- id: a
  content: [1,2,3]
  type: u1
  repeat: expr
  repeat-expr: 3
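One plausible reading of this alternative syntax is that each repeated element is parsed with the given type and checked against the corresponding entry of `content`. A small Python sketch of that reading follows; `read_repeated_content` is hypothetical, and struct format codes stand in for KSY types (`"B"` for `u1`).

```python
import struct
from typing import Sequence


def read_repeated_content(data: bytes, expected: Sequence[int], fmt: str = "B") -> int:
    """Read len(expected) elements with struct format `fmt`, checking each one.

    Models `content: [1,2,3]` + `type: u1` + `repeat: expr` + `repeat-expr: 3`.
    Returns the number of bytes consumed. Raises ValueError on a mismatch
    (struct.error if `data` is too short).
    """
    item_size = struct.calcsize(fmt)
    pos = 0
    for i, want in enumerate(expected):
        (got,) = struct.unpack_from(fmt, data, pos)
        if got != want:
            raise ValueError(f"element {i}: expected {want}, got {got}")
        pos += item_size
    return pos


print(read_repeated_content(b"\x01\x02\x03rest", [1, 2, 3]))
```

Passing a different format such as `"<I"` shows how the same construct would cover non-byte element types.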

koczkatamas · 2018-03-09T08:23:39Z

I'd prefer a solution that still uses the YAML schema. While [1,2,3] is a valid YAML array and is parsed out of the box by YAML parsers, (u4)[1,2,3] will be parsed as a string, which is a drawback IMHO.

So I'd introduce a new property that contains the type (e.g. in the form <original-property-name>-type, so it would be content-type for content).

Another option may be using YAML tags, but to be frank, I am not sure it's easy to add custom types to the various YAML parsers, and it would give schema users a less out-of-the-box experience, so of the two I'd still prefer the <prop>-type style solution.

It's a harder question for expressions, but I presume we could use the current cast operator? Or introduce something new?

arekbulski · 2018-03-09T09:41:12Z

I agree. Did you mean content-type as the element type of the array, like in this example?

- id: a
  content: [1,2,3]
  content-type: u4
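With `content-type: u4`, the check could reduce to a plain byte comparison once the element encoding is fixed. The Python sketch below serializes the content list under that assumption; `STRUCT_CODES`, `content_bytes`, and the default little-endian byte order are all illustrative choices, since the thread does not say how endianness would be specified.

```python
import struct
from typing import Sequence

# Hypothetical mapping from KSY unsigned integer types to struct format codes.
STRUCT_CODES = {"u1": "B", "u2": "H", "u4": "I", "u8": "Q"}


def content_bytes(values: Sequence[int], content_type: str, byte_order: str = "<") -> bytes:
    """Serialize `content: values` as consecutive elements of `content_type`.

    byte_order is "<" (little-endian) or ">" (big-endian); the default is an
    assumption of this sketch.
    """
    code = STRUCT_CODES[content_type]
    return struct.pack(f"{byte_order}{len(values)}{code}", *values)


print(content_bytes([1, 2, 3], "u4").hex())
```

A parser could then compare the next 12 bytes of the stream against this value directly, exactly as it already does for byte-array `content`.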

GreyCat · 2018-03-09T10:11:43Z (edited by arekbulski)

@KOLANICH @koczkatamas Guys, this question is about literals, and I'm talking about the expression language. So there is no YAML, no KSY tags, nothing: just a string. For practical purposes, you can think of it as a string that we embed into a value instance:

instances:
  foo:
    value: '[1, 2, 3]' # <= here

@arekbulski

> How about (u4)[1,2,3] as an example of an integer array? [1,2,3] would implicitly be a byte array.

Yeah, good idea. We do casting with the postfix operator .as<foo>, though, so we could probably do something like:

[1, 2, 3] stays the same (and maintains compatibility)
[0 + 1, 2, 3].as<bytes> is a byte array
[1, 2, 3].as<u1[]> for a true array of u1
[1, 2, 3].as<u4[]> for a true array of u4
[1, 2, 3].as<f8[]> for a true array of f8, i.e. [1.0, 2.0, 3.0]
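To make the intended semantics of these casts concrete, here is a Python model of what each one could produce: `bytes` for a byte array, a list for a true array. `cast_array_literal` and the string type names are purely illustrative, not part of the compiler, and only unsigned integer element types plus `f8[]` are covered.

```python
from typing import List, Union

# Bit widths of the unsigned element types (an illustrative subset).
UNSIGNED_BITS = {"u1": 8, "u2": 16, "u4": 32, "u8": 64}


def cast_array_literal(values: List[int], target: str) -> Union[bytes, List[int], List[float]]:
    """Hypothetical model of `[...].as<target>`; elements are already evaluated."""
    if target == "bytes":
        return bytes(values)  # raises ValueError if any value is outside 0..255
    if target == "f8[]":
        return [float(v) for v in values]
    if target.endswith("[]") and target[:-2] in UNSIGNED_BITS:
        limit = 2 ** UNSIGNED_BITS[target[:-2]]
        for v in values:
            if not 0 <= v < limit:
                raise ValueError(f"{v} does not fit in {target[:-2]}")
        return list(values)
    raise ValueError(f"unsupported target type: {target}")


print(cast_array_literal([0 + 1, 2, 3], "bytes"))
print(cast_array_literal([1, 2, 3], "f8[]"))
```

The key design point is that the result type comes only from the explicit cast, never from whether the elements happen to be literals.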

The same operator could actually be used to specify sized literals, if we ever need them, e.g. 123.as<u2>, 123.as<u8>, etc. Not much worse than C++'s 123Ui16 and 123Ui64, which are not yet adopted.
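A sized literal such as `123.as<u2>` would mainly fix the width the compiler uses. Assuming it would also reject values that cannot fit (an assumption; the thread does not say so), the check is a simple range test, sketched below in Python with hypothetical helper names.

```python
def literal_range(type_name: str) -> range:
    """Values that fit a sized integer type such as "u2" or "s4".

    Assumes Kaitai-style names: "u"/"s" for unsigned/signed, followed by
    the size in bytes (1, 2, 4 or 8).
    """
    if len(type_name) != 2 or type_name[0] not in "us" or type_name[1] not in "1248":
        raise ValueError(f"unsupported type: {type_name}")
    bits = 8 * int(type_name[1])
    if type_name[0] == "s":
        return range(-(2 ** (bits - 1)), 2 ** (bits - 1))
    return range(2 ** bits)


def check_sized_literal(value: int, type_name: str) -> int:
    """Model of `value.as<type_name>`: reject a literal that cannot fit."""
    if value not in literal_range(type_name):
        raise ValueError(f"{value} does not fit in {type_name}")
    return value


print(check_sized_literal(123, "u2"), check_sized_literal(123, "u8"))
```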

KOLANICH · 2018-03-09T13:29:47Z

> Guys, this question is about literals, and I'm talking about expression language.

I wonder if we really need this kind of literal. Usually we deal with arrays of structures, not arrays of numbers, and literal array syntax doesn't suit the struct use case well. But if we had a syntax for structs, there would be no need for a separate syntax for numbers, since we use it very seldom.

GreyCat · 2018-03-09T13:59:15Z

> I wonder if we really need such literals.

Yes, we do: at least for #253, and it's also widely used for stuff like "construct a byte array from expressions, then convert it to a string".
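The "construct a byte array from expressions, then convert it to a string" pattern corresponds roughly to the Python below, where KS would use the byte-array method `to_s(encoding)`. Note that under the current heuristics a literal with a computed element, such as `[0x40 + 9, 0x49]`, would be typed as a true array rather than a byte array, which is exactly the ambiguity this issue is about. `bytes_to_text` is a hypothetical helper.

```python
from typing import Sequence


def bytes_to_text(elements: Sequence[int], encoding: str = "ascii") -> str:
    """Build a byte array from already-evaluated elements, then decode it."""
    return bytes(elements).decode(encoding)


base = 0x40
print(bytes_to_text([base + 9, 0x49]))  # 0x49 is "I" in ASCII
```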

KOLANICH · 2018-03-09T14:24:42Z

> Yes, we do: at least for #253

- actual: docs[0].indicator # the type is defined by the type of this field, so there is no need to specify it explicitly (if the type is selected based on something, you can test the type of the switch-on variable, or maybe introduce some property to get the type)
  expected: '[0x49, 0x49]'
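KOLANICH's suggestion, taking the literal's type from the field being tested, could be modelled roughly like this in Python. `matches_expected` is hypothetical and is not how the KST test generator actually works; it only shows the comparison switching on the actual value's type.

```python
from typing import Sequence, Union


def matches_expected(actual: Union[bytes, bytearray, Sequence[int]], expected: Sequence[int]) -> bool:
    """Compare a parsed value against an expected literal such as [0x49, 0x49].

    The literal is interpreted according to the actual value: a byte-array
    field is compared as bytes, anything else element by element.
    """
    if isinstance(actual, (bytes, bytearray)):
        if any(not 0 <= v <= 255 for v in expected):
            return False  # cannot be a byte array, so it cannot match
        return bytes(actual) == bytes(expected)
    return list(actual) == list(expected)


print(matches_expected(b"II", [0x49, 0x49]), matches_expected([73, 73], [0x49, 0x49]))
```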

> and it's also widely used for stuff like "construct byte array from expressions, then convert it to string"

My proposal fits here (assuming serialization, chains, and assertions are implemented). Of course, it is very rough and needs some improvement before implementation.

GreyCat · 2018-04-08T12:07:33Z

Seems to be working now; relevant tests have been added to TranslatorSpec, and the feature is usable for KST purposes.

GreyCat added the enhancement label Mar 8, 2018

GreyCat mentioned this issue on Mar 8, 2018 in Literal empty arrays #372 (closed)

kaitai-io deleted a comment from GreyCat Mar 9, 2018

GreyCat self-assigned this Apr 8, 2018

GreyCat added this to the v0.9 milestone Apr 8, 2018

GreyCat closed this as completed Apr 8, 2018
