add XAR header format #473

Merged
merged 13 commits into from
Aug 25, 2021

armijnhemel (Collaborator)

XAR is an archive format that consists of a header, a zlib-compressed XML document (the table of contents) and then the payload. The structure of the payload (the files, etc.) is described in the XML document. Because Kaitai Struct does not have a built-in XML parser, this parser is limited to just the header.
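For readers who want to see what "just the header" covers, here is a minimal Python sketch using the standard `struct` module. It assumes the 28-byte big-endian layout used by the xar reference implementation (magic `xar!`, header length, version, compressed and uncompressed TOC lengths, checksum algorithm ID). The helper names `parse_xar_header` and `toc_and_heap_offsets` are hypothetical and not part of any Kaitai Struct API.

```python
import struct
from typing import NamedTuple

# Assumed fixed XAR header layout, big-endian:
#   magic "xar!" (4 bytes), len_header (u2), version (u2),
#   len_toc_compressed (u8), len_toc_uncompressed (u8), checksum_alg (u4)
HEADER_FMT = ">4sHHQQI"
HEADER_FIXED_SIZE = struct.calcsize(HEADER_FMT)  # 28 bytes, no padding with ">"


class XarHeader(NamedTuple):
    len_header: int
    version: int
    len_toc_compressed: int
    len_toc_uncompressed: int
    checksum_alg: int


def parse_xar_header(data: bytes) -> XarHeader:
    """Parse only the fixed header fields (hypothetical helper)."""
    if len(data) < HEADER_FIXED_SIZE:
        raise ValueError("truncated XAR header")
    magic, *fields = struct.unpack_from(HEADER_FMT, data, 0)
    if magic != b"xar!":
        raise ValueError(f"bad magic: {magic!r}")
    return XarHeader(*fields)


def toc_and_heap_offsets(hdr: XarHeader) -> tuple[int, int]:
    """The compressed TOC starts right after the header (len_header bytes);
    the payload ("heap") starts right after the compressed TOC."""
    toc_start = hdr.len_header
    return toc_start, toc_start + hdr.len_toc_compressed
```

Note that the TOC offset is taken from `len_header` rather than hard-coded as 28, so a longer header variant would still locate the TOC correctly.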

armijnhemel marked this pull request as draft on April 10, 2021, 20:18
armijnhemel marked this pull request as ready for review on April 10, 2021, 20:57
armijnhemel (Collaborator, Author) commented Apr 10, 2021

The version from Apple (which is in current use) has the same version number in the header but supports additional checksum algorithms. I am not sure how to fit these into this particular specification. Basically, the parser should skip one field, but only if that field is not a valid strz. Ideally I would use some sort of "lookahead" operation, or some other conditional.
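One way to avoid a lookahead is to let the header's own length field decide whether the optional field exists: treat the first `len_header` bytes as a substream, parse the fixed 28-byte prefix, and interpret any remaining bytes as the optional field. The Python sketch below assumes that checksum algorithm IDs 0, 1 and 2 mean none, SHA-1 and MD5, that ID 3 means "other", and that in Apple's variant the algorithm name (e.g. `sha256`) is stored NUL-padded in the bytes between the fixed prefix and `len_header`. The function `checksum_name` is hypothetical.

```python
import struct

FIXED_FMT = ">4sHHQQI"                  # assumed fixed big-endian header prefix
FIXED_SIZE = struct.calcsize(FIXED_FMT)  # 28 bytes

# Assumed checksum algorithm IDs; 3 ("other") means the name is stored
# as a string in the extended part of the header.
CKSUM_NAMES = {0: "none", 1: "sha1", 2: "md5"}
CKSUM_OTHER = 3


def checksum_name(data: bytes) -> str:
    """Resolve the TOC checksum algorithm without any lookahead:
    len_header tells how many header bytes exist, so the optional
    name field is simply whatever follows the fixed prefix."""
    if len(data) < FIXED_SIZE:
        raise ValueError("truncated header")
    magic, len_header, _ver, _toc_c, _toc_u, alg = struct.unpack_from(FIXED_FMT, data)
    if magic != b"xar!":
        raise ValueError("bad magic")
    if len_header < FIXED_SIZE or len(data) < len_header:
        raise ValueError("inconsistent len_header")
    header = data[:len_header]   # treat the header as a substream
    extra = header[FIXED_SIZE:]  # empty for a plain 28-byte header
    if alg == CKSUM_OTHER:
        if not extra:
            raise ValueError("algorithm 3 but header has no name field")
        # Assumed NUL-terminated / NUL-padded ASCII name
        return extra.split(b"\0", 1)[0].decode("ascii")
    return CKSUM_NAMES.get(alg, f"unknown({alg})")
```

In KSY terms, this corresponds roughly to giving the header a `size` derived from `len_header` and making the name field conditional on the algorithm ID, rather than trying to detect whether the bytes form a valid strz.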

KOLANICH (Contributor)

Because Kaitai Struct does not have a built-in XML parser, this parser is limited to just the header.

You can try to implement the approach I use in the numpy_npy format.

Two resolved (outdated) review threads on archive/xar_header.ksy
generalmimon (Member) commented Apr 11, 2021

@KOLANICH:

Because Kaitai Struct does not have a built-in XML parser, this parser is limited to just the header.

You can try to implement the approach I use in the numpy_npy format.

@armijnhemel:

The structure of the payload (the files, etc.) is described in the XML file.

This is a very similar concept to the glTF binary format (#445), where one needs to parse JSON to be able to parse the binary structure. However, this does not mean it can't be described with the KS language. You can either outsource the XML parsing to opaque types (which I don't really recommend, because you'll lose any type checks / derivations that the compiler does and you'll have to resort to unsafe .as<> type casts, which are easy to screw up), or create a parametric type that can be instantiated from the application code. The application code then serves as a "bonding agent": first, it invokes the initial parsing, accesses the XML string and parses it using a native or external XML parsing library of choice; then it passes the needed data pulled from the XML "back" to the KSY-generated parser via the parametric types.
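The "bonding agent" flow described above can be sketched in Python. Here `struct` stands in for the KSY-generated header parser, `zlib` and `xml.etree.ElementTree` do the TOC handling, and the final slicing stands in for instantiating a parametric KS type with values taken from the XML. The TOC is deliberately simplified: it assumes only `<file>`, `<name>` and `<data>` with `<offset>`/`<length>` children, with offsets relative to the start of the heap (directly after the compressed TOC). A real TOC carries more elements (checksums, encodings, timestamps), and the function `list_files` is hypothetical.

```python
import struct
import xml.etree.ElementTree as ET
import zlib

HEADER_FMT = ">4sHHQQI"  # assumed fixed 28-byte big-endian header prefix


def list_files(archive: bytes) -> dict[str, bytes]:
    """Binary parsing -> XML parsing -> binary parsing again.
    Returns each file's archived bytes (still encoded, e.g. gzip)."""
    # Step 1: binary parse of the header (stand-in for the generated parser).
    magic, len_header, _ver, len_toc_c, len_toc_u, _alg = struct.unpack_from(HEADER_FMT, archive)
    if magic != b"xar!":
        raise ValueError("not a XAR archive")
    toc_start = len_header
    heap_start = toc_start + len_toc_c

    # Step 2: inflate the TOC and hand it to an ordinary XML library.
    toc_xml = zlib.decompress(archive[toc_start:heap_start])
    if len(toc_xml) != len_toc_u:
        raise ValueError("TOC length mismatch")
    root = ET.fromstring(toc_xml)

    # Step 3: feed values pulled from the XML back into binary parsing
    # (where a parametric KS type would be instantiated instead).
    files = {}
    for f in root.iter("file"):
        name = f.findtext("name")
        data = f.find("data")
        if name is None or data is None:
            continue  # e.g. directories carry no <data> element
        offset = int(data.findtext("offset"))
        length = int(data.findtext("length"))
        start = heap_start + offset
        files[name] = archive[start:start + length]
    return files
```

Decoding each entry according to its `<encoding>` element is left out to keep the three-step structure visible.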

This means that although the KSY won't read the whole file down to the bottom (so it won't be as enjoyable to use in the Web IDE, for example), it's quite easy to use: you don't lose any type checks and you won't have to use type casts. For the glTF binary format, I have created a proof of concept for this approach, so you can see what I mean: #445 (comment)

armijnhemel (Collaborator, Author) commented Apr 12, 2021

I fixed a few things. One thing I am not happy about is that currently the toc is defined as part of the header, but it shouldn't be.

I tried this:

seq:
  - id: magic
    contents: 'xar!'
  - id: len_header
    type: u2
  - id: header
    type:
      switch-on: len_header
      cases:
        28: apple_header
        _: regular_header
  - id: toc
    size: header.len_toc_compressed
    process: zlib
    doc: zlib compressed XML further describing the content of the archive

but that doesn't seem to work:

Call stack: undefined io.kaitai.struct.precompile.ErrorInInput: xar_header: /seq/3/size: don't know how to call method 'len_toc_compressed' of object type 'KaitaiStructType'

I don't know where I am going wrong there.

KOLANICH (Contributor)

It would require the interfaces proposal to implement it the way you wanted. There may be another way to implement it, though.

generalmimon (Member)

@armijnhemel Sample files, both with the Apple header and the regular one?

armijnhemel (Collaborator, Author)

@armijnhemel Sample files, both with the Apple header and the regular one?

Test files with gzip, bzip2 and no compression:

https://github.com/armijnhemel/binaryanalysis-ng/blob/master/src/test/testdata/unpackers/xar/test-bzip2.xar
https://github.com/armijnhemel/binaryanalysis-ng/blob/master/src/test/testdata/unpackers/xar/test-gzip.xar
https://github.com/armijnhemel/binaryanalysis-ng/blob/master/src/test/testdata/unpackers/xar/test-none.xar
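To confirm which sample exercises which compression, one could inflate each file's TOC with zlib and look at the per-file encoding recorded there. The Python sketch below assumes that xar records the data compression as `<encoding style="..."/>` elements with the styles `application/x-gzip`, `application/x-bzip2` and `application/octet-stream` (uncompressed); this mapping and the function `compressions_used` are assumptions for illustration. The "gzip, bzip2 and no compression" above is taken to refer to the file data, since the TOC itself is zlib-compressed.

```python
import xml.etree.ElementTree as ET

# Assumed mapping of xar <encoding style="..."> values to compression names.
STYLE_TO_COMPRESSION = {
    "application/x-gzip": "gzip",
    "application/x-bzip2": "bzip2",
    "application/octet-stream": "none",
}


def compressions_used(toc_xml: bytes) -> set[str]:
    """Collect the data compression of every <encoding> element in an
    already-decompressed TOC; unknown styles are reported verbatim."""
    root = ET.fromstring(toc_xml)
    found = set()
    for enc in root.iter("encoding"):
        style = enc.get("style", "")
        found.add(STYLE_TO_COMPRESSION.get(style, f"unknown:{style}"))
    return found
```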

I would need to search for test files for the Apple version.

Two resolved (outdated) review threads on archive/xar.ksy
generalmimon (Member) left a comment


Thanks!

generalmimon merged commit 4c4bfc0 into kaitai-io:master on Aug 25, 2021
ZetaTwo pushed a commit to ZetaTwo/kaitai_struct_formats that referenced this pull request May 17, 2022
* add XAR format

* xar: document variations, add support for versions with
checksum_algorithm field 3 created with the XAR archiver referenced in
the doc-ref

* xar: differentiate between the regular header and the apple header
(probably misnomers)

* xar: use correct variable

* xar: rename from `xar_header`

* xar: add `/meta/xref/justsolve`

* xar: restructure (`header` to substream), provide cksum alg name

* xar: change `/meta/title`

* xar: improve sentence in `/doc`

* xar: update Apple Open Source GitHub mirror link

* xar: remove sentence from `/doc`, add link to Wikipedia

* xar: move "xar" file extension above "pkg"

See kaitai-io#473 (comment)

* xar: change "read" to "access" to avoid confusion

Co-authored-by: Petr Pucil
4 participants