sizeof and lea operations #84

KOLANICH · 2017-01-15T00:18:47Z

It would be nice to have sizeof operators for getting the amount of space occupied by a set of structures, and lea for getting the offset of a property, to enable relative addressing modes.
The specification is:
lea() returns the offset of the struct that forms the current scope, i.e. the stream position where parsing of a value of this type started.
lea(property_name) returns the offset of the named property. It is either the stream position where parsing of the property started, or a position predicted from the known sizes of already parsed fields.
sizeof() returns the size of the struct that forms the current scope, as known at that moment. If the size is unknown, it is a compile error. The size is known when any of the following is satisfied:

the type is fixed size
the property size is stored in some variable and can be accessed and modified (modification can have serialization implications, such as truncating arrays)
all the subproperties have known size

sizeof(property_name) returns the size of the named property.
sizeof(property_name1, property_name2) is equivalent to
lea(property_name2) - lea(property_name1) + sizeof(property_name2), and lea(property_name2) >= lea(property_name1) must hold.
The example is:

seq:
 - id: str
   type: strz
   encoding: ASCII
 - id: revision
   type: u1
 - id: len
   type: u1
 - id: struct_offset0
   type: u2
 - id: struct_offset1
   type: u2
 - id: data
   size: len - sizeof(str, struct_offset1)  #equivalently len - sizeof(str) - sizeof(revision) - sizeof(len) -sizeof(struct_offset0) - sizeof(struct_offset1)
instances:
  str0:
    type: superstruct
    pos: lea(data)+struct_offset0
  str1:
    type: superstruct
    pos: lea(data)+struct_offset1

It creates a struct of size len with a variable-length string at the beginning. Then it places two superstructs in data, at offsets relative to the beginning of data. The variable-length string is taken as an example of a case where sizes and offsets are not defined at compile time.
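
To make the arithmetic in the example concrete, here is a minimal Java sketch that walks the same layout for one hypothetical input. The class name LeaSizeofDemo, its method names, and the sample values are assumptions made for illustration; lea and sizeof are the proposed operators, not an existing API, and the struct is assumed to start at stream position 0.

```java
// Hypothetical illustration of the proposed lea()/sizeof() arithmetic.
// Field sizes follow the example layout: strz (variable), u1, u1, u2, u2.
class LeaSizeofDemo {
    // Offsets (lea) of each field, assuming the struct starts at position 0.
    // Index 0..4 = str, revision, len, struct_offset0, struct_offset1; index 5 = data.
    static int[] offsets(int strByteSize) {
        int[] sizes = {strByteSize, 1, 1, 2, 2};
        int[] lea = new int[sizes.length + 1];
        for (int i = 0; i < sizes.length; i++) {
            lea[i + 1] = lea[i] + sizes[i];
        }
        return lea;
    }

    // sizeof(first, last) = lea(last) - lea(first) + sizeof(last)
    static int sizeofRange(int[] lea, int first, int last, int sizeofLast) {
        if (lea[last] < lea[first]) {
            throw new IllegalArgumentException("lea(last) must be >= lea(first)");
        }
        return lea[last] - lea[first] + sizeofLast;
    }

    // size of data = len - sizeof(str, struct_offset1)
    static int dataSize(int len, int strByteSize) {
        int[] lea = offsets(strByteSize);
        return len - sizeofRange(lea, 0, 4, 2);
    }
}
```

For a 4-byte str (three characters plus the terminator) and len = 20, the fields before data occupy 10 bytes, so lea(data) is 10 and data gets the remaining 10 bytes.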


KOLANICH · 2017-08-20T17:57:40Z

@GreyCat, when can I expect this to land? I'm implementing a ksy for a format where a payload sits in the middle of fixed-size fields, and the size of that payload is defined by the container. Surely we cannot use "size: eos" here.

koczkatamas · 2017-08-21T00:07:11Z

Could you show the file format? I think this can be worked around by splitting the type into two different subtypes.

KOLANICH · 2017-08-21T04:50:17Z

Here is the spec for payload. It is a separate type, whose size is defined by the size of the field it is placed in. It can be placed into different types, so I'd rather not use hacks like _parent.payload_len. In the Python source the size is also calculated from the length of the payload buffer.

GreyCat · 2017-08-21T07:39:54Z

There are many different proposals here, so let's break it into some parts. First of all, sizeof. I totally agree that we need some functionality akin to C-like sizeof, but there are some important differences in KS vs traditional C-like implementation:

Not every structure and/or data type in KS would have a definite fixed size, unlike C
In C, there are actually two different kinds of sizeof operator: sizeof some_var vs sizeof char. The first kind takes a value expression as its argument, the second takes a type (actually, a type expression). Your proposal revolves around the first kind, and even extends it, but in KS even that could result in undefined behavior, e.g. when asking for things like sizeof(1), sizeof(some_member + 1), sizeof(some_member * 2.0), sizeof(x) where x is type: b26, sizeof(true), etc.
We don't have global functions, and I don't really like the idea of introducing them. Actually, even languages like C implement sizeof as a special operator due to its duality and special compile-time definition.

So, I can propose that we'd start from the basic sizeof. I can see that we could implement 2 basic forms:

using type name (dynamic type size, non-user type => error)
using expression (calculated expression type, boolean type, non-fixed type size, IO stream type => error)

Given that we support addressing in both bytes and bits, probably we'd need two implementations of these, like bitsize and bytesize.

I have a relatively easy syntax proposal for "using expression": do something like some_stuff_here._size and some_stuff_here._bits, akin to the existing size and proposed bits (see #112) YAML keys.

I have no idea right now on how to implement type-based sizeof operator, except for introducing yet another keyword, which is probably doable, but definitely not very pretty...

KOLANICH · 2017-08-21T12:31:25Z

Do we really need type-based syntax? I mean if the field of the type is never used, we don't need its size, if it is used, we can use expression-based syntax.

Updated the proposal a bit with impl details.

GreyCat · 2017-08-21T20:55:14Z

Do we really need type-based syntax? I mean if the field of the type is never used, we don't need its size, if it is used, we can use expression-based syntax.

Well, yes and no. Sometimes it's just very inconvenient to go through all that _parent._parent.blah.blah.blah stuff to name a particular element to take the size of. Sometimes (and that's yet another question to decide) you want the size of the basic structure by itself, and your only use of that structure is in an array, i.e.:

seq:
  - id: things
    type: thing
    repeat: eos

things._size would probably return the size of the whole array, but you want the size of one individual element.

GreyCat · 2017-08-23T19:27:11Z

I've just committed a preliminary implementation of a precompile stage that calculates the fixed size of all types and members, where that's possible.

This has already improved Graphviz output a lot: it now correctly prints the offsets/sizes table even for a fixed-size structure embedded into another, i.e. stuff like BMP, Blender .blend, cramfs, etc.

Also, it obviously opens the way for sizeof / lea implementation.
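
As a rough illustration of what such a fixed-size pass has to decide, here is a minimal Java sketch. It is not the actual compiler code: the FixedSizeSketch class, its Field model, and the rule that a type is fixed-size only when every seq field is fixed-size are assumptions for illustration, and the sketch ignores bit alignment, if conditions, repetitions and type switching.

```java
import java.util.List;
import java.util.OptionalInt;

// Hypothetical model of a precompile size pass; not the real ksc implementation.
class FixedSizeSketch {
    // A field either has a statically known byte size or it does not
    // (for example a strz field or a field with repeat: eos).
    static final class Field {
        final String id;
        final OptionalInt fixedSize;

        Field(String id, OptionalInt fixedSize) {
            this.id = id;
            this.fixedSize = fixedSize;
        }
    }

    // A type is treated as fixed-size only if every field in its seq is fixed-size.
    static OptionalInt fixedSizeOf(List<Field> seq) {
        int total = 0;
        for (Field f : seq) {
            if (!f.fixedSize.isPresent()) {
                return OptionalInt.empty(); // size can only be known at run time
            }
            total += f.fixedSize.getAsInt();
        }
        return OptionalInt.of(total);
    }
}
```

Under this model, a seq of a u1 and a u2 yields a fixed size of 3 bytes, while any seq containing a terminated string yields an empty result, which is the case where a compile-time sizeof cannot be answered.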

bsagal · 2018-11-28T12:26:19Z

If the type is not fixed size, could you add the possibility of getting the min and max size of the type?

KOLANICH · 2018-11-28T13:21:09Z

@bsagal, it is a more complex derivation. Could you show use cases for it when the actual size used by a field is not enough?

bsagal · 2018-11-28T13:48:55Z

@KOLANICH I have non-fixed-size structs and would like to add logic to check that the size of the stream is valid before starting the parsing

KOLANICH · 2018-11-28T13:57:58Z

I have non-fixed-size structs and would like to add logic to check that the size of the stream is valid before starting the parsing

I guess it should be done by the code generated by KSC, not by a ksy dev.

GreyCat · 2019-04-25T12:08:04Z

Implemented type-based sizeof in byte/bit flavors, as special keyword operator:

sizeof<some_type>
bitsizeof<some_type>
sizeof<foo::bar::baz>
bitsizeof<foo::bar::baz>

See expr_sizeof_type_1 for example.

GreyCat · 2019-04-25T12:21:45Z

Also implemented value-based sizeof, using virtual _sizeof attribute/operator: see expr_sizeof_value_0 for examples.

KOLANICH · 2019-04-25T17:38:43Z

Implemented type-based sizeof in byte/bit flavors, as special keyword operator
Also implemented value-based sizeof, using virtual _sizeof attribute/operator: see expr_sizeof_value_0 for examples.

Thank you.

generalmimon · 2020-05-02T22:26:41Z

@KOLANICH I have a question regarding your initial example in #84 (comment). The first field str in your seq is a variable-length, terminator-delimited string, and you're apparently applying the sizeof operator to it by calling sizeof(str, struct_offset1).

Does it imply that you don't want the sizeof operators to just yield a compile-time constant, but to generate expressions that are calculated at runtime as well (as I described in #721 (comment))? It seems so; your last sentence confirms it:

Variable-length string is taken as an example of a case where sizes and offsets are not defined on compile time.

But then it seems to me that you're contradicting yourself. I think a terminator-delimited string doesn't meet the condition of known size:

If the size is unknown - compile error. The size is known when any of the following is satisfied:

the type is fixed size

size attr is used in property

all the subproperties have known size

If the sizeof operator is expected to work even if the size is derivable only at runtime, I recommend using the _io.pos attribute (#721 (comment)). The null-terminated string is actually a great example of where this approach really shines. We don't have any size key available that would tell us the byte size of the str field. We can only get the character length of the string, but that depends on the encoding, so we can't use it as the byte size: we can't expect every encoding to use 1 byte per character (in fact, they usually don't). So the only way would be to convert the string back to bytes using the original encoding, but that might be quite an expensive operation, and you wouldn't want to do that.
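
The gap between character length and byte size is easy to demonstrate. The following small Java sketch uses only standard library calls; the class name CharVsByteLength and the sample string are assumptions chosen for illustration.

```java
import java.nio.charset.Charset;

// Shows that the decoded string length is not the on-disk byte size.
class CharVsByteLength {
    // Number of UTF-16 code units in the decoded string.
    static int charLength(String s) {
        return s.length();
    }

    // Number of bytes the string occupies when encoded with the given charset.
    // This re-encodes the whole string, which is the cost the comment warns about.
    static int byteLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }
}
```

For the five-character string "h\u00e9llo", charLength returns 5, while byteLength returns 6 for StandardCharsets.UTF_8 and 10 for StandardCharsets.UTF_16LE, so the character count cannot stand in for the byte size, and the terminator byte is not counted in any of these.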

Querying _io.pos before and after is clean and simple, and it always works, no matter what's been parsed between the calls. It's the same approach we usually take for time measurement: startTime = now(); /* ... */ print(now() - startTime);.

All we'd have to do is collect the fields that are used in sizeof operators anywhere in the KSY and insert _io.pos queries before and after the call that reads them. This could be done both in the _read() method for seq attributes and in the methods that read parse instances. The calculated sizes would be stored in the struct's properties once they are known. Like this:

public void _read() {
    int _ofsStr = this._io.pos();
    this.str = new String(this._io.readBytesTerm(0, false, true, true), Charset.forName("ASCII"));
    this._sizeofStr = this._io.pos() - _ofsStr;
}
private Integer _sizeofStr;
public Integer _sizeofStr() { return _sizeofStr; }
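
The snippet above depends on the Kaitai runtime, so here is a self-contained sketch of the same before/after position idea that runs on plain Java. It is an illustration only: java.nio.ByteBuffer stands in for the Kaitai stream, the PosSizeofDemo class and its field names are hypothetical, and the buffer is assumed to be array-backed (created with ByteBuffer.wrap).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Stand-in for generated code: measures a field's size by querying the
// stream position before and after reading it.
class PosSizeofDemo {
    String str;
    int ofsStr;    // plays the role of lea(str)
    int sizeofStr; // plays the role of str._sizeof, terminator included

    void read(ByteBuffer io) {
        ofsStr = io.position();
        // Scan up to and including the zero terminator.
        while (io.get() != 0) {
            // keep consuming bytes
        }
        int end = io.position();
        // Decode everything except the terminator (requires an array-backed buffer).
        str = new String(io.array(), io.arrayOffset() + ofsStr,
                end - ofsStr - 1, StandardCharsets.US_ASCII);
        sizeofStr = end - ofsStr;
    }
}
```

Reading the bytes 'a', 'b', 'c', 0 from position 0 gives str = "abc", an offset of 0 and a size of 4, without re-encoding the string or summing the sizes of earlier fields.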

Actually, we can use this approach for all value-based _sizeof expressions and it would be perfectly valid, with no need for compile-time calculations. Compile-time calculations are more likely to be wrong if they're not careful about bit-byte alignment, if conditions, repetitions, type switching, etc.

And using _io.pos would make the lea implementation much easier as well: when requesting the offset of the last field of the seq, we wouldn't have to sum all the preceding field sizes; it could be done with a single _io.pos call.

KOLANICH · 2020-06-10T08:48:07Z

@generalmimon, yes, it is meant to generate something that gives us the byte size (surely not the code point count), or maybe a struct our_size {uint64_t byte; uint8_t bit; ....}; object, at compile time if that is possible and at run time if not. I have updated the top message for more clarity. If I remember correctly, the current implementation of terminated fields scans the stream for the terminator, then puts the content into a separate stream, then parses that stream, so the size is known beforehand and is stored somewhere within the stream object. We surely shouldn't do a strlen on each call.

generalmimon · 2020-08-02T15:38:50Z

I'm taking this out of the 0.9 milestone. This feature currently works only for compile-time calculations, so it's not yet finished, but I'd say that's fine for purposes of 0.9 version. We can always improve it in the future.

GreyCat added the enhancement label Feb 21, 2017

LogicAndTrick mentioned this issue Feb 28, 2017

New repeat form "size" #119

Open

GreyCat mentioned this issue Jun 28, 2017

can't parse file from END to beginning known it's size #189

Closed

KOLANICH mentioned this issue Sep 27, 2017

Optional fields #273

Open

GreyCat added this to the v0.9 milestone Feb 2, 2018

KOLANICH mentioned this issue Mar 20, 2018

Construct export tool (a compiler target) #377

Open

KOLANICH mentioned this issue Mar 10, 2019

Inconsistent naming of size/length methods #527

Open

GreyCat mentioned this issue Mar 28, 2019

macOS '.DS_Store' files kaitai-io/kaitai_struct_formats#131

Merged

This was referenced Mar 1, 2020

A rant/suggestion about pos instances and substreams #709

Open

MOV renamed to ISO/IEC and more boxes/atoms are parsed kaitai-io/kaitai_struct_formats#272

Open

generalmimon mentioned this issue Mar 28, 2020

_sizeof does not work on array fields #721

Open

generalmimon mentioned this issue Jul 3, 2020

Add nitf kaitai-io/kaitai_struct_formats#294

Merged

generalmimon removed this from the v0.9 milestone Aug 2, 2020

generalmimon added this to the v0.10 milestone Aug 2, 2020

generalmimon mentioned this issue Aug 21, 2020

Added parchive v2 kaitai-io/kaitai_struct_formats#69

Closed

generalmimon mentioned this issue Oct 28, 2020

Add Android super.img (Dynamic Partitions) format kaitai-io/kaitai_struct_formats#349

Merged

generalmimon mentioned this issue Jan 31, 2021

Add battery_management_system_protocol kaitai-io/kaitai_struct_formats#410

Open

generalmimon mentioned this issue Mar 12, 2021

add .ksy for the GIMP brush file format kaitai-io/kaitai_struct_formats#427

Merged

This was referenced Mar 29, 2021

Creating Substreams of unknown size #868

Closed

add Android sparse format kaitai-io/kaitai_struct_formats#460

Merged

generalmimon mentioned this issue Sep 12, 2022

Fix dbf.ksy kaitai-io/kaitai_struct_formats#621

Merged

Omar-Abdul-Azeez mentioned this issue Jan 14, 2023

Is it possible to store the current parse index only? #378

Open

generalmimon referenced this issue in filestar/coreldraw_cdr.ksy Feb 24, 2023

New stuff

94ccfad

GreyCat mentioned this issue Jul 14, 2023

How to get the size of the read record and its source buffer for crc validation? #1046

Open
