Add InstallShield 3 Z archive and self-extracting installer formats #500

dgelessus · 2021-06-24T01:42:47Z

Closes #328.

Some parts of install_shield_3_z are not tested very well. I only have a single test file in the "extended" format, and no test files that use multiple parts or a password, so those parts of the spec are almost completely untested.

Similarly, I tested install_shield_3_sfx_tail only with a few installer files that I was working with anyway. There are probably other variants of the self-extracting installer data format that aren't handled by this spec.

The dos_datetime_backwards helper spec currently doesn't compile correctly to Python, because of kaitai-io/kaitai_struct#876.

The Python scripts I used for testing these specs can be found here: https://github.com/dgelessus/ksf_stuff/tree/master/archive

KOLANICH · 2021-06-24T04:20:16Z

archive/install_shield_3_z.ksy

+    contents: [0x13, 0x5d, 0x65, 0x8c]
+  - id: len_header
+    type: u1
+    valid: sizeof<header>


Why not assume that someone might reuse and extend this format, and use the size from the header field instead?

That is never going to happen for this format :)

Every file I tested with has the same value in the size field, so I'm not even sure if that is the field's actual meaning. idecomp.py apparently thinks so, but I don't know where its author got that information from, or if they just guessed. So if there are any files where this field has a different value, I want the parsing to fail at first, because it's hard to say if the rest of the spec will behave properly in that case.

The idecomp.py code also only handles this exact header size, so this check probably won't cause any problems in practice.
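The fail-fast behaviour described above can be sketched in Python. This is only an illustration: it assumes that `len_header` is the single byte right after the 4-byte magic (as the diff excerpt suggests), and `expected_len` is a hypothetical caller-supplied value, since the thread does not state the actual header size.

```python
MAGIC = bytes([0x13, 0x5D, 0x65, 0x8C])  # magic bytes from the diff excerpt


class HeaderSizeMismatch(ValueError):
    """Raised when the size field differs from the only size seen so far."""


def check_len_header(data: bytes, expected_len: int) -> int:
    """Return len_header, or raise if the header looks unfamiliar.

    Assumption: len_header is stored at offset 4, directly after the magic.
    """
    if len(data) < 5:
        raise ValueError("truncated header")
    if data[:4] != MAGIC:
        raise ValueError("bad magic")
    len_header = data[4]
    if len_header != expected_len:
        # Refuse to guess: the rest of the layout is unverified for this size.
        raise HeaderSizeMismatch(
            f"len_header is {len_header}, expected {expected_len}"
        )
    return len_header
```

Rejecting unknown sizes up front means a file with a different value surfaces as an explicit error instead of silently misparsed data.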

KOLANICH · 2021-06-24T04:26:43Z

archive/install_shield_3_z.ksy

+          otherwise this field is 0
+          and the file's data is not split.
+      - id: len_name
+        type: u1


-affected-by: 84

Which part exactly? I'm not sure how any of the features in #84 would help with calculating "size of this type except for one of its fields"... Having _sizeof on variable-sized fields won't make a difference, because we already have an explicit len_entry field.

sizeof(property_name1, property_name2) is equivalent to
lea(property_name2) - lea(property_name1) + sizeof(property_name2)

which is meant to spare us from summing the field lengths explicitly ourselves.
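A minimal sketch of that identity, assuming `lea` means the byte offset at which a field starts (the comment does not define it). The `Field` record and `span_size` helper are hypothetical and not part of the Kaitai Struct runtime.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical record of where a parsed field sits in the stream."""
    offset: int  # start offset in bytes (the assumed meaning of "lea")
    size: int    # byte size of the field


def span_size(first: Field, last: Field) -> int:
    """Bytes covered from the start of `first` through the end of `last`."""
    if last.offset < first.offset:
        raise ValueError("last field must not start before first field")
    return last.offset - first.offset + last.size


# Three consecutive fields of 2, 4 and 1 bytes starting at offset 10.
a, b, c = Field(10, 2), Field(12, 4), Field(16, 1)
assert span_size(a, c) == a.size + b.size + c.size
```

For contiguous fields the span equals the sum of the individual sizes, which is what makes the two-argument form a shortcut for the explicit sum.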

KOLANICH · 2021-06-24T04:29:13Z

archive/install_shield_3_z.ksy

+              + reserved_2._sizeof
+            )
+        doc: Byte size of the file name.
+      - id: name


Is it always a string in a particular encoding, or is it really allowed to contain arbitrary sequences of bytes?

This is a format from the pre-NT Windows era, so it probably doesn't have an encoding properly specified. In practice it's probably going to be code page 437, 850, or 1252, assuming a Western system. All of the test files I have are in English with pure ASCII file names, so it's difficult to say.

In cases like these I prefer to use raw byte arrays instead of hardcoding a generic encoding like ASCII or Latin-1, so that in case a file does contain unexpected non-ASCII characters, it can still be parsed using the KSY, and the application code can decide how to deal with the encoding issues (if at all). In my hacky Python script, I have a command-line option for selecting the encoding (defaults to ASCII), and if possible I avoid decoding the name/path fields at all.
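The approach of keeping names as raw bytes and decoding only on request could look roughly like this. It is a sketch, not the script linked in the description: `decode_name`, `build_arg_parser` and the `--encoding` option name are hypothetical.

```python
import argparse


def decode_name(raw: bytes, encoding: str = "ascii") -> str:
    """Decode a raw name field; raises UnicodeDecodeError on unexpected bytes."""
    return raw.decode(encoding, errors="strict")


def build_arg_parser() -> argparse.ArgumentParser:
    """Command-line parser with a selectable name encoding (ASCII by default)."""
    parser = argparse.ArgumentParser(description="List archive entry names")
    parser.add_argument(
        "--encoding",
        default="ascii",
        help="codec for file names, e.g. cp437, cp850 or cp1252",
    )
    return parser


# The same raw bytes decode differently depending on the chosen code page.
raw_name = b"caf\x82"
print(decode_name(raw_name, "cp437"))   # byte 0x82 is 'é' in code page 437
print(decode_name(raw_name, "cp1252"))  # byte 0x82 is '‚' in Windows-1252
```

Because the parsed structure only holds bytes, a wrong guess about the code page affects display only and never prevents the file from being parsed.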

KOLANICH · 2021-06-24T04:40:46Z

archive/install_shield_3_sfx_tail.ksy

+          Path name for this file,
+          encrypted using a relatively simple algorithm.
+          The path name can be decrypted bytewise using the formula
+          `byte_rot_right((path_encrypted[i] ^ path_encryption_key[7-(i%8)]), 7-(i%8)) ^ path_encryption_key[i%8]`,


It is not encryption (which implies that the algo is peer-reviewed and considered to be crypto by cryptographers, which implies that it has ever been considered secure by them, which is not the case here), it is just obfuscation.

Yeah, I know that this is completely insecure and not a proper encryption algorithm :) But this sort of thing is still commonly called "encryption", at least outside of the context of modern cryptography. There's no risk of misunderstanding here I think (nobody is going to think that this is secure encryption), so IMO it would be more confusing to only say "obfuscation" and completely avoid the word "encryption".

Maybe "scrambling" then?
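For reference, the bytewise formula quoted in the doc string above can be transcribed to Python as follows. This is a sketch under two assumptions: `byte_rot_right` is taken to be an 8-bit rotate right, and the key is taken to be exactly 8 bytes long because the formula only indexes positions 0 to 7. `scramble_path` is a hypothetical inverse, added only to check the round trip.

```python
def byte_rot_right(value: int, count: int) -> int:
    """Rotate an 8-bit value right by `count` bits (assumed semantics)."""
    count %= 8
    return ((value >> count) | (value << (8 - count))) & 0xFF


def byte_rot_left(value: int, count: int) -> int:
    """Rotate an 8-bit value left by `count` bits."""
    return byte_rot_right(value, 8 - (count % 8))


def decrypt_path(path_encrypted: bytes, key: bytes) -> bytes:
    """Apply the formula from the doc string to every byte."""
    if len(key) != 8:
        raise ValueError("key must be 8 bytes long (assumption)")
    out = bytearray()
    for i, b in enumerate(path_encrypted):
        k = 7 - (i % 8)
        out.append(byte_rot_right(b ^ key[k], k) ^ key[i % 8])
    return bytes(out)


def scramble_path(path: bytes, key: bytes) -> bytes:
    """Hypothetical inverse of decrypt_path, used for round-trip testing."""
    if len(key) != 8:
        raise ValueError("key must be 8 bytes long (assumption)")
    out = bytearray()
    for i, b in enumerate(path):
        k = 7 - (i % 8)
        out.append(byte_rot_left(b ^ key[i % 8], k) ^ key[k])
    return bytes(out)


key = bytes(range(1, 9))  # made-up key, not taken from any real installer
assert decrypt_path(scramble_path(b"C:\\SETUP\\README.TXT", key), key) == b"C:\\SETUP\\README.TXT"
```

The only key-dependent steps are two XORs and a position-dependent rotation, which is why the thread treats this as scrambling rather than real encryption.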

KOLANICH · 2021-06-24T04:44:33Z

Thanks for implementing this!

KOLANICH · 2021-06-24T19:23:30Z

I have extracted Mark Adler's libblast into a separate repo and packaged it. My repo is here; packages for Debian/Ubuntu can be generated using CPack. Fedora wasn't tested, but should also be OK. Python bindings for the decompressor using ctypes (so they should work on every Python implementation that provides ctypes) are almost finished. The only thing remaining is figuring out what is wrong and fixing it (it currently complains about a wrong flag).

For compression I'd probably package the library by Ladislav Zezula, as other people have done when creating bindings for other languages. Unlike them, though, I think it may make sense to wrap it in a separate Python package. These are different libraries anyway.

In any case, the package should already be useful: for example, it may be possible to create a (de)compression module for the C++ target.

dgelessus · 2021-06-25T01:11:33Z

re. decompression: the idecomp repo also contains a pure Python decompressor for DCL Implode compression. I just didn't spend any time integrating it into my script, because there's already a working Python-based decompressor that uses it, and because all of the files I was working with didn't use compression.

KOLANICH · 2021-06-25T05:19:55Z

> decompression: the idecomp repo also contains a pure Python decompressor for DCL Implode compression.

Thanks for the info.

> I just didn't spend any time integrating it into my script, because there's already a working Python-based decompressor that uses it

... which is GPLed.

KOLANICH · 2021-06-28T22:50:35Z

Python bindings to libblast: https://github.com/implode-compression-impls/pkblast.py
kaitai.compress compressor: https://github.com/kaitaiStructCompile/kaitai_compress/blob/python_fixes/python/kaitai/compress/algorithms/implode.py
pkimplode.py (Python bindings to pklib libimplode): https://github.com/implode-compression-impls/pkimplode.py

KOLANICH · 2021-10-27T19:58:37Z

I have fixed pkimplode.py, tested it, added tests to kaitai.compress implode compressor and sent a few PRs there.

dgelessus added 2 commits June 24, 2021 03:26

Add InstallShield 3 Z archive and self-extracting installer formats

5df403a

Closes kaitai-io#328.

Add more doc-ref links to install_shield_3_z.ksy

d10cdfc

dgelessus added 4 commits June 25, 2021 00:38

Fix missing license info in InstallShield 3 specs

9649a9a

Reorder install_shield_3_sfx_tail meta to match style guide

403be69

Add more metadata to InstallShield 3 specs

0b98a38

Split flag fields in install_shield_3_z into their own subtypes

61ba2c3
