Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The if expression in value instances is not type-checked #1129

Open
Mingun opened this issue Aug 31, 2024 · 7 comments
Open

The if expression in value instances is not type-checked #1129

Mingun opened this issue Aug 31, 2024 · 7 comments

Comments

@Mingun
Copy link

Mingun commented Aug 31, 2024

Currently I only check <type>.instances.<instance>.if, but I have feeling that the same problem exists for other fields where expression is allowed.

Below the comparison between parse instance and value instance. There are many other problems with variables, but in this issue lets focus on missing checks of expression type:

Legend for the OK? column:

  • ✅ -- correct behavior, nothing is required. Variable either allowed when should or disallowed when shouldn't.
  • ❌ -- this variable should be denied in this context, but currently it is allowed at least in some cases.
  • ❓ -- variable should be allowed in some cases, or error messages are unclear, or some checks are missing. In other words, something should be corrected.

<type>.instances.<instance>.if (parse instance)

Parsed by ParseUtils.getOptValueExpression.

Test KSY:

meta:
  id: test_type
  endian:
    switch-on: 1
    cases:
      1: be
      2: le
params:
  - id: param
    type: u1
seq:
  - id: seq
    type: u1
instances:
  me:
    size: 1
    if: {} # Substitution place for the variable
  instance:
    value: 1
OK? Variable Current status Comment
param1 Allowed
seq2 Allowed If seq depends from me, then should be denied (currently it is allowed and we get error in runtime)
instance3 Allowed Analogous to seq
me4 Allowed Check is performed before the field will be read. Access to the self getter is generated which leads to infinity recursion
unknown5 Denied Error reported: unable to access 'unknown' in <type name> context. In is better to enclose <type name> in quotes
_ Denied Error reported: java.util.NoSuchElementException: None.get
_buf Denied Error reported: invalid ID: '\_buf', expected /^[a-z][a-z0-9_]*$/
_index Allowed Access to the undefined loop variable is generated (i in JS). Must be denied, because condition is checked before the loop, so even with repeat using this variable has no sense
_io Allowed
_is_le Denied Error reported: invalid ID: '\_is_le', expected /^[a-z][a-z0-9_]*$/. If <type> supports variable byte order, should be allowed
_on Denied Error reported: java.util.NoSuchElementException: None.get
_parent Allowed
_root Allowed
_sizeof Allowed Expanded to the compile-time constant of the <type>'s size

<type>.instances.<instance>.if (value instance)

Parsed by ParseUtils.getOptValueExpression.

Test KSY:

meta:
  id: test_type
  endian:
    switch-on: 1
    cases:
      1: be
      2: le
params:
  - id: param
    type: u1
seq:
  - id: seq
    type: u1
instances:
  me:
    value: 1
    if: {} # Substitution place for the variable
  instance:
    value: 1
OK? Variable Current status Comment
param1 Allowed Expression type is not checked
seq2 Allowed Expression type is not checked. Field dependencies is not checked. Should be denied if seq depends of me
instance3 Allowed Expression type is not checked. Field dependencies is not checked. Should be denied if instance depends of me
me4 Allowed Check is performed before the field will be read. Access to the self getter is generated which leads to infinity recursion
unknown5 Allowed Access to the undefined field is generated (this.unknown in JS)
_ Allowed Access to the undefined local variable is generated (_ in JS)
_buf Allowed Access to the undefined field is generated (this._buf in JS)
_index Allowed Access to the undefined loop variable is generated (i in JS). Must be denied, because condition is checked before the loop, so even with repeat using this variable has no sense
_io Allowed Expression type is not checked
_is_le Allowed Access to the undefined field is generated (this._isLe in JS). If <type> supports variable byte order, should be translated to check of the right field (this._is_le in JS)
_on Allowed Access to the undefined local variable is generated (on in JS)
_parent Allowed Expression type is not checked
_root Allowed Expression type is not checked
_sizeof Allowed Expression type is not checked. Expanded to the compile-time constant of the <type>'s size

Footnotes

  1. parameter of the enclosing type 2

  2. attribute of the enclosing type in seq section 2

  3. instance of the enclosing type in instances section 2

  4. reference to self 2

  5. non-existent attribute or instance in the enclosing type 2

@generalmimon
Copy link
Member

generalmimon commented Sep 1, 2024

To be honest, I don't see much value in these elaborate tables. In summary, type checking:

So the second "<type>.instances.<instance>.if (value instance)" table can be completely disregarded, because this is not how things usually work and there's no point in overanalyzing it. The only takeaway is that type checks are not run there. All it takes to make it look like exactly like the first table is to run the forgotten type checking on the if key of value instances as well (which is literally one line of code in KSC).

Other remarks in the first table are just repeating what has been reported before in other issues:

seq Allowed If seq depends from me, then should be denied (currently it is allowed and we get error in runtime)
me Allowed Check is performed before the field will be read. Access to the self getter is generated which leads to infinity recursion

See #685

_, _index, _is_le, _on

See #404


I don't really see anything new in what you wrote, except for the simple fact that "<type>.instances.<instance>.if (value instance)" is missing type checks (which seems to be only demonstrated by the expr_field_unknown_if_inst_value test that I added just 2 days ago, so I don't blame you for not noticing and thanks for reporting it anyway).

I thought you would already know about the 2 existing issues because you've encountered them in the past:

@generalmimon
Copy link
Member

Or let's clearly indicate that this issue only reports missing type checks on if in value instances so that it can be closed once the expr_field_unknown_if_inst_value test passes.

@generalmimon generalmimon changed the title Type of expressions in value instances are not checked The if expression in value instances is not type-checked Sep 1, 2024
@Mingun
Copy link
Author

Mingun commented Sep 1, 2024

Or let's clearly indicate that this issue only reports missing type checks on if in value instances so that it can be closed once the expr_field_unknown_if_inst_value test passes.

Yes, the intention was just about this missing check. Those tables just an extract from my work-in-progress of documenting in which contexts each special variable:

  • is accessible currently
  • should be accessible

That is why it contains more information than needed just for this issue.

The idea that with each property with expression an unique context should be assigned which determines which special variables can be used and how they should be translated. To achive that I firstly want to create test cases, check how things work currently and establish a correct behavior, because it seems that this work was never done before. Some issues created here and there but without overall picture.

@generalmimon
Copy link
Member

Yes, the intention was just about this missing check. Those tables just an extract from my work-in-progress of documenting

If you want to report one bug, open a simple issue reporting that one bug. If you want to create a far-reaching issue that attempts to solve world hunger, please do so in another issue.

Some issues created here and there but without overall picture.

Please don't tell me that you knew about the existing issues but didn't even bother to look them up and refer to them.

@Mingun
Copy link
Author

Mingun commented Sep 1, 2024

Huh?

Below the comparison between parse instance and value instance. There are many other problems with variables, but in this issue lets focus on missing checks of expression type:

following by the reproduction case and description of each variant.

Please don't tell me that you knew about the existing issues but didn't even bother to look them up and refer to them.

No, I don't. Thanks that you found related issues. The most obvious "type check" request found nothing similar to this issue.

@Mingun
Copy link
Author

Mingun commented Sep 13, 2024

Actually, if in value instances should be prohibited. I do not see any real usage for it. For the

meta:
  id: issue1129
instances:
  me:
    value: 1
    if: _

...the following JS code is generated:

  Object.defineProperty(TestType.prototype, 'me', {
    get: function() {
      if (this._m_me !== undefined)
        return this._m_me;
      if (_) {
        this._debug._m_me = {  };
        this._m_me = 1;
      }
      return this._m_me;
    }
  });

As it can be obviously seen, the if here is useless. It is allowed by this code
https://github.com/kaitai-io/kaitai_struct_compiler/blob/8610e1d7a796aad7addb64a5be2f9da48289bffc/shared/src/main/scala/io/kaitai/struct/format/InstanceSpec.scala#L42-L49
that was added in
kaitai-io/kaitai_struct_compiler@67e5a3c

and if expressions was allowed in kaitai-io/kaitai_struct_compiler@d6b4fba

@GreyCat, why you added if to value instances?

@generalmimon
Copy link
Member

@Mingun:

Actually, if in value instances should be prohibited.

No, it shouldn't. This sounds like you don't have much experience writing or perhaps also reading .ksy files (I recommend doing it more, I've always enjoyed it). Ideally, if a .ksy specification is written properly and you're parsing a binary file that follows the format that the .ksy spec is for, you should be able to invoke all value instances and not get any errors or values that are meaningless because they're not applicable in the particular situation (so not only would they be calculated unnecessarily, but mainly their values ​​could be misleading). Such errors or meaningless values would just confuse the user that inspects the object tree in visualizers, and especially nonsensical values would encourage misinterpretations when a user processes the Kaitai Struct object in their application.

What do I mean by "errors"? Using if in a value instance allows you to avoid "null pointer exceptions" when you want to use a conditional field in the value expression, e.g.:

seq:
  - id: has_foo
    type: b1
  - id: foo
    type: b7
    if: has_foo
instances:
  foo_as_signed:
    value: (foo ^ 0x40) - 0x40
    if: has_foo
    doc-ref: https://graphics.stanford.edu/~seander/bithacks.html#VariableSignExtend

Thanks to the if: has_foo on foo_as_signed, even if has_foo is false and we invoke the foo_as_signed instance, we won't get a NPE. This makes sense: foo_as_signed is derived from foo, so if foo is null, foo_as_signed should be null as well.

But it's not only about NPEs. Sometimes you want to clearly mark the instance as "not applicable" even if a value could be calculated without errors. As one example of many, take a look at zip.ksy:129-145. The specification would work even without these ifs, but then the user would have to be extra careful which values are actually meaningful and which are nonsensical - it would certainly be confusing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants