Add InstallShield 3 Z archive and self-extracting installer formats #500

Open
wants to merge 6 commits into
base: master

Contributor

@dgelessus dgelessus commented Jun 24, 2021

Closes #328.

Some parts of install_shield_3_z are not tested very well. I only have a single test file in the "extended" format, and no test files that use multiple parts or a password, so those parts of the spec are almost completely untested.

Similarly, I tested install_shield_3_sfx_tail only with a few installer files that I was working with anyway. There are probably other variants of the self-extracting installer data format that aren't handled by this spec.

The dos_datetime_backwards helper spec currently doesn't compile correctly to Python, because of kaitai-io/kaitai_struct#876.

The Python scripts I used for testing these specs can be found here: https://github.com/dgelessus/ksf_stuff/tree/master/archive

    contents: [0x13, 0x5d, 0x65, 0x8c]
  - id: len_header
    type: u1
    valid: sizeof<header>
Contributor

Why not assume that someone might reuse this format and extend it, and use the size from the header field instead?

Contributor Author

  1. That is never going to happen for this format :)
  2. Every file I tested with has the same value in the size field, so I'm not even sure if that is the field's actual meaning. idecomp.py apparently thinks so, but I don't know where its author got that information from, or if they just guessed. So if there are any files where this field has a different value, I want the parsing to fail at first, because it's hard to say if the rest of the spec will behave properly in that case.
  3. The idecomp.py code also only handles this exact header size, so this check probably won't cause any problems in practice.
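The fail-fast behaviour described in point 2 can be sketched outside of Kaitai Struct as follows. This is a minimal Python illustration, not code from the PR: the function name `check_archive_prefix` and the `expected_len_header` parameter are hypothetical, the magic bytes are taken from the `contents` line under review, and it assumes that `len_header` is the single byte directly after the magic.

```python
# Magic bytes copied from the `contents` line in the snippet under review.
MAGIC = bytes([0x13, 0x5D, 0x65, 0x8C])


def check_archive_prefix(data: bytes, expected_len_header: int) -> int:
    """Return len_header, or raise ValueError if the prefix is unexpected.

    Assumption: len_header is the u1 directly following the 4-byte magic.
    expected_len_header is a hypothetical parameter; the real value would
    come from the size of the header type in the spec.
    """
    if len(data) < len(MAGIC) + 1:
        raise ValueError("data too short for magic and len_header")
    if data[:len(MAGIC)] != MAGIC:
        raise ValueError("bad magic")
    len_header = data[len(MAGIC)]
    if len_header != expected_len_header:
        # Mirror the `valid:` check: reject the file instead of guessing
        # how the rest of the structure should be interpreted.
        raise ValueError(
            f"unexpected len_header {len_header}, expected {expected_len_header}"
        )
    return len_header
```

Rejecting unknown values up front means a file with a different header size surfaces as a clear error rather than as silently misparsed data.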

      otherwise this field is 0
      and the file's data is not split.
  - id: len_name
    type: u1
Contributor

-affected-by: 84

Contributor Author

Which part exactly? I'm not sure how any of the features in #84 would help with calculating "size of this type except for one of its fields"... Having _sizeof on variable-sized fields won't make a difference, because we already have an explicit len_entry field.

Contributor

@KOLANICH KOLANICH Jun 25, 2021

sizeof(property_name1, property_name2) is equivalent to
lea(property_name2) - lea(property_name1) + sizeof(property_name2),

which is meant to spare us from summing the field lengths explicitly ourselves.
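To make the identity above concrete, here is a small Python sketch. It assumes that `lea` means "offset of the field from the start of the structure" and uses a made-up list of fixed-size fields; none of the names below come from the spec or from the proposal in #84.

```python
# Hypothetical fixed-size fields in declaration order: (name, size in bytes).
FIELDS = [("flags", 1), ("len_data", 4), ("reserved_1", 2), ("reserved_2", 4)]


def offsets(fields):
    """Map each field name to its byte offset (the assumed meaning of lea)."""
    result = {}
    pos = 0
    for name, size in fields:
        result[name] = pos
        pos += size
    return result


def span_size(fields, first, last):
    """Size of the span first..last: lea(last) - lea(first) + sizeof(last)."""
    off = offsets(fields)
    sizes = dict(fields)
    return off[last] - off[first] + sizes[last]


def summed_size(fields, first, last):
    """The same span computed by summing the individual field sizes."""
    names = [name for name, _ in fields]
    lo, hi = names.index(first), names.index(last)
    return sum(size for _, size in fields[lo:hi + 1])
```

For any pair of fields in declaration order, `span_size` and `summed_size` agree, which is the point of the two-argument form: the sum does not have to be written out by hand.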

        + reserved_2._sizeof
      )
    doc: Byte size of the file name.
  - id: name
Contributor

Is it always a string in a particular encoding, or is it really allowed to contain arbitrary sequences of bytes?

Contributor Author

This is a format from the pre-NT Windows era, so it probably doesn't have an encoding properly specified. In practice it's probably going to be code page 437, 850, or 1252, assuming a Western system. All of the test files I have are in English with pure ASCII file names, so it's difficult to say.

In cases like these I prefer to use raw byte arrays instead of hardcoding a generic encoding like ASCII or Latin-1, so that in case a file does contain unexpected non-ASCII characters, it can still be parsed using the KSY, and the application code can decide how to deal with the encoding issues (if at all). In my hacky Python script, I have a command-line option for selecting the encoding (defaults to ASCII), and if possible I avoid decoding the name/path fields at all.
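As an illustration of leaving the decoding decision to application code, a helper along these lines could be used on the raw `name` bytes. This is only a sketch and not the script mentioned above: the function name `decode_name` and the fallback to escaped bytes are assumptions.

```python
def decode_name(raw: bytes, encoding: str = "ascii") -> str:
    """Decode a raw name field using a caller-selected encoding.

    If the bytes are not valid in that encoding, fall back to an ASCII
    rendering in which undecodable bytes are shown as \\xNN escapes, so
    that the entry can still be listed.
    """
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return raw.decode("ascii", errors="backslashreplace")
```

A caller that knows the archive came from a Western DOS system could pass `"cp437"` or `"cp850"`, and `"cp1252"` for a Windows one, while the default keeps pure ASCII names working unchanged.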

Path name for this file,
encrypted using a relatively simple algorithm.
The path name can be decrypted bytewise using the formula
`byte_rot_right((path_encrypted[i] ^ path_encryption_key[7-(i%8)]), 7-(i%8)) ^ path_encryption_key[i%8]`,
Contributor

It is not encryption, just obfuscation. "Encryption" implies that the algorithm has been peer-reviewed and considered secure by cryptographers at some point, which is not the case here.

Contributor Author

Yeah, I know that this is completely insecure and not a proper encryption algorithm :) But this sort of thing is still commonly called "encryption", at least outside of the context of modern cryptography. There's no risk of misunderstanding here I think (nobody is going to think that this is secure encryption), so IMO it would be more confusing to only say "obfuscation" and completely avoid the word "encryption".

Contributor

Maybe "scrambling" then?
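Whatever the transformation ends up being called, the bytewise formula quoted in the doc string under review can be sketched in Python as follows. This is an illustration, not code from the PR: it assumes an 8-byte key (implied by the `7-(i%8)` index) and that `byte_rot_right` is a rotation within 8 bits. The inverse function `encrypt_path` is derived from the formula and is included only so that the round trip can be checked.

```python
def byte_rot_right(value: int, count: int) -> int:
    """Rotate an 8-bit value right by count bits."""
    count %= 8
    return ((value >> count) | (value << (8 - count))) & 0xFF


def byte_rot_left(value: int, count: int) -> int:
    """Rotate an 8-bit value left by count bits."""
    return byte_rot_right(value, (8 - count) % 8)


def decrypt_path(path_encrypted: bytes, key: bytes) -> bytes:
    """Apply the formula from the doc string to every byte of the path."""
    if len(key) != 8:
        raise ValueError("key must be exactly 8 bytes (assumption)")
    out = bytearray()
    for i, byte in enumerate(path_encrypted):
        k = i % 8
        out.append(byte_rot_right(byte ^ key[7 - k], 7 - k) ^ key[k])
    return bytes(out)


def encrypt_path(path: bytes, key: bytes) -> bytes:
    """Inverse of decrypt_path, derived by undoing each step in reverse."""
    if len(key) != 8:
        raise ValueError("key must be exactly 8 bytes (assumption)")
    out = bytearray()
    for i, byte in enumerate(path):
        k = i % 8
        out.append(byte_rot_left(byte ^ key[k], 7 - k) ^ key[7 - k])
    return bytes(out)
```

Since each step is a plain XOR or bit rotation, anyone who knows the key can invert it in a few lines, which is what the terminology discussion in this thread is about.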

@KOLANICH
Contributor

Thanks for implementing this!

@KOLANICH
Contributor

KOLANICH commented Jun 24, 2021

I have extracted Mark Adler's libblast into a separate repo and packaged it. My repo is here; packages for Debian/Ubuntu can be generated using CPack. Fedora wasn't tested, but should also be OK. Python bindings for the decompressor using ctypes (so they should work on all Python implementations that provide ctypes) are almost finished. The only thing remaining is figuring out what is wrong and fixing it (it currently complains about a wrong flag).

For compression, I guess I'd package the lib by Ladislav Zezula, as other people creating bindings for other languages have done. Unlike them, though, I think it may make sense to wrap it into a separate Python package. These are different libs, anyway.

Anyway, the package should already be useful: for example, it may be possible to create a (de)compression module for the C++ target.

@dgelessus
Contributor Author

Re. decompression: the idecomp repo also contains a pure Python decompressor for DCL Implode compression. I just didn't spend any time integrating it into my script, because there's already a working Python-based decompressor that uses it, and because none of the files I was working with used compression.

@KOLANICH
Contributor

decompression: the idecomp repo also contains a pure Python decompressor for DCL Implode compression.

Thanks for the info.

I just didn't spend any time integrating it into my script, because there's already a working Python-based decompressor that uses it

... which is GPLed.

@KOLANICH
Contributor

KOLANICH commented Jun 28, 2021

I have fixed pkimplode.py, tested it, added tests to the kaitai.compress implode compressor, and sent a few PRs there.

Successfully merging this pull request may close these issues.

InstallShield archive format version 3
2 participants