Add InstallShield 3 Z archive and self-extracting installer formats #500
  contents: [0x13, 0x5d, 0x65, 0x8c]
- id: len_header
  type: u1
  valid: sizeof<header>
Why not assume that someone could reuse this format and extend it, and so use the size in the header field instead?
- That is never going to happen for this format :)
- Every file I tested with has the same value in the size field, so I'm not even sure if that is the field's actual meaning. idecomp.py apparently thinks so, but I don't know where its author got that information from, or if they just guessed. So if there are any files where this field has a different value, I want the parsing to fail at first, because it's hard to say if the rest of the spec will behave properly in that case.
- The idecomp.py code also only handles this exact header size, so this check probably won't cause any problems in practice.
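The fail-fast behaviour described above could look roughly like this in application code. This is only a sketch: it assumes the magic bytes sit at the very start of the data and that `len_header` is the single byte right after them (as the diff excerpt suggests), and `expected_len_header` is a hypothetical parameter, since the real header size is not stated in this thread.

```python
# Magic bytes taken from the `contents:` line in the diff excerpt.
MAGIC = bytes([0x13, 0x5D, 0x65, 0x8C])


class UnexpectedHeaderError(ValueError):
    """Raised when the header prefix does not match what the parser expects."""


def check_header_prefix(data: bytes, expected_len_header: int) -> int:
    """Validate the magic and the len_header byte, failing on any mismatch.

    expected_len_header is a hypothetical parameter; the actual header size
    is an assumption the caller has to supply.
    """
    if len(data) < len(MAGIC) + 1:
        raise UnexpectedHeaderError("data too short for magic and len_header")
    if data[:len(MAGIC)] != MAGIC:
        raise UnexpectedHeaderError("magic bytes do not match")
    len_header = data[len(MAGIC)]
    if len_header != expected_len_header:
        # Refuse to continue: the rest of the layout is unverified for
        # any other header size.
        raise UnexpectedHeaderError(
            f"unexpected len_header {len_header}, expected {expected_len_header}"
        )
    return len_header
```

Rejecting unknown sizes up front trades flexibility for safety: a file with a different value is reported immediately rather than being parsed with a layout that may not apply.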
otherwise this field is 0
and the file's data is not split.
- id: len_name
  type: u1
-affected-by: 84
Which part exactly? I'm not sure how any of the features in #84 would help with calculating "size of this type except for one of its fields"... Having `_sizeof` on variable-sized fields won't make a difference, because we already have an explicit `len_entry` field.
`sizeof(property_name1, property_name2)` is equivalent to `lea(property_name2) - lea(property_name1) + sizeof(property_name2)`, which is meant to spare us from summing the field lengths explicitly ourselves.
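To make the proposed equivalence concrete, here is a small sketch. It is not Kaitai Struct code: `Field` and `span_sizeof` are hypothetical names, and `offset` stands in for what the comment calls `lea` (the start offset of a field).

```python
from typing import NamedTuple


class Field(NamedTuple):
    offset: int  # start offset of the field ("lea" in the comment above)
    size: int    # byte size of the field


def span_sizeof(first: Field, last: Field) -> int:
    """Size of the contiguous span from the start of `first` to the end of `last`."""
    return last.offset - first.offset + last.size


# Three consecutive fields of 2, 4 and 1 bytes.
a = Field(offset=0, size=2)
b = Field(offset=2, size=4)
c = Field(offset=6, size=1)

# The span formula agrees with summing the individual sizes.
total = span_sizeof(a, c)  # 6 - 0 + 1
```

The formula only matches the explicit sum when the fields in between are laid out back to back with no gaps.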
+ reserved_2._sizeof
)
doc: Byte size of the file name.
- id: name
Is it always a string in a particular encoding, or is it really allowed to contain arbitrary sequences of bytes?
This is a format from the pre-NT Windows era, so it probably doesn't have an encoding properly specified. In practice it's probably going to be code page 437, 850, or 1252, assuming a Western system. All of the test files I have are in English with pure ASCII file names, so it's difficult to say.
In cases like these I prefer to use raw byte arrays instead of hardcoding a generic encoding like ASCII or Latin-1, so that in case a file does contain unexpected non-ASCII characters, it can still be parsed using the KSY, and the application code can decide how to deal with the encoding issues (if at all). In my hacky Python script, I have a command-line option for selecting the encoding (defaults to ASCII), and if possible I avoid decoding the name/path fields at all.
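As an illustration of leaving the decoding decision to the application, a minimal sketch follows. `decode_name` is a hypothetical helper, not part of the spec or of the script mentioned above; it only relies on Python's built-in codecs (`ascii`, `cp437`, `cp850`, `cp1252`).

```python
from typing import Optional


def decode_name(raw: bytes, encoding: str = "ascii") -> Optional[str]:
    """Try to decode a raw name field; return None if it does not fit the encoding.

    The caller keeps the raw bytes either way, so nothing is lost when
    decoding fails.
    """
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# Pure ASCII names decode with the default.
plain = decode_name(b"SETUP.EXE")

# A non-ASCII byte fails under ASCII but decodes under a DOS code page.
raw_name = b"\x81BER.TXT"
as_ascii = decode_name(raw_name)           # None
as_cp437 = decode_name(raw_name, "cp437")  # 0x81 maps to U+00FC in CP437
```

Because the parser hands back bytes, a wrong guess about the code page never prevents the archive from being parsed.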
Path name for this file,
encrypted using a relatively simple algorithm.
The path name can be decrypted bytewise using the formula
`byte_rot_right((path_encrypted[i] ^ path_encryption_key[7-(i%8)]), 7-(i%8)) ^ path_encryption_key[i%8]`,
It is not encryption (which implies that the algo is peer-reviewed and considered to be crypto by cryptographers, which implies that it has ever been considered secure by them, which is not the case here), it is just obfuscation.
Yeah, I know that this is completely insecure and not a proper encryption algorithm :) But this sort of thing is still commonly called "encryption", at least outside of the context of modern cryptography. There's no risk of misunderstanding here I think (nobody is going to think that this is secure encryption), so IMO it would be more confusing to only say "obfuscation" and completely avoid the word "encryption".
maybe "scrambling" then?
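For reference, the bytewise formula quoted from the spec's doc string in this thread can be written out as follows. This is a sketch under two assumptions: the key is exactly 8 bytes long (implied by the indices `i%8` and `7-(i%8)`), and `byte_rot_right` rotates within a single 8-bit byte. The function names mirror the formula and are not taken from any existing library.

```python
def byte_rot_right(value: int, count: int) -> int:
    """Rotate an 8-bit value right by `count` bits (assumed 8-bit rotation)."""
    count %= 8
    return ((value >> count) | (value << (8 - count))) & 0xFF


def decrypt_path(path_encrypted: bytes, path_encryption_key: bytes) -> bytes:
    """Apply the formula from the doc string to every byte of the path."""
    if len(path_encryption_key) != 8:
        raise ValueError("key is assumed to be exactly 8 bytes long")
    out = bytearray()
    for i, enc_byte in enumerate(path_encrypted):
        k = 7 - (i % 8)
        # XOR with the mirrored key byte, rotate right, then XOR with the
        # key byte at the plain index.
        rotated = byte_rot_right(enc_byte ^ path_encryption_key[k], k)
        out.append(rotated ^ path_encryption_key[i % 8])
    return bytes(out)
```

With an all-zero key the XOR steps drop out and only the position-dependent rotation remains, which makes the scheme easy to check by hand.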
Thanks for implementing this!
I have extracted Mark Adler's libblast into a separate repo and packaged it. My repo is here; packages for Debian/Ubuntu can be generated using CPack. Fedora wasn't tested, but should also be OK. Python bindings for the decompressor using ctypes (so, for all the implementations that have it) are almost finished; the only thing remaining is figuring out what is wrong and fixing it (it currently complains about a wrong flag). For compression I'd probably package the lib by Ladislav Zezula, as done by other people creating bindings for other languages, but unlike them, I think it may make sense to wrap it in a separate Python package. These are different libs, anyway. In any case, the package should already be useful, i.e. it may be possible to create a (de)compression module for the C++ target.
Re. decompression: the idecomp repo also contains a pure Python decompressor for DCL Implode compression. I just didn't spend any time integrating it into my script, because there's already a working Python-based decompressor that uses it, and because none of the files I was working with used compression.
Thanks for the info.
... which is GPLed.
Python bindings to libblast: https://github.com/implode-compression-impls/pkblast.py
I have fixed
Closes #328.
Some parts of install_shield_3_z are not tested very well. I only have a single test file in the "extended" format, and no test files that use multiple parts or a password, so those parts of the spec are almost completely untested.
Similarly, I tested install_shield_3_sfx_tail only with a few installer files that I was working with anyway. There are probably other variants of the self-extracting installer data format that aren't handled by this spec.
The dos_datetime_backwards helper spec currently doesn't compile correctly to Python, because of kaitai-io/kaitai_struct#876.
The Python scripts I used for testing these specs can be found here: https://github.com/dgelessus/ksf_stuff/tree/master/archive