Structure analyzer
The DataStruct analyzer is a tool for inspecting and modifying structures.
It parses data in a file using a C-like structure description.
You can apply a structure description to the entire file or only to a selected block. Hextor parses the specified data and shows a structure tree. You can then click a tree node to show its position in the file, or double-click it to change the field value. The structure tree view is updated automatically when you change data in the editor.
The structure shown in the "Struct" tool tab is also available to scripts under the name "ds".
For example, if you have parsed a BMP file into a structure, you can access and modify its fields like this:
ds.bmiHeader.biXPelsPerMeter = ds.bmiHeader.biXPelsPerMeter * 2;
Structure description is similar to type definition in C/C++, with some additions and limitations.
Description consists of statements.
Each statement is either:
- Type name and a list of field names (example: int a, b;). The type may be one of the basic types or a structure
- Conditional construction like "if"/"switch"
- Directive
- User type definition (typedef)
- JScript statement (for example, an assignment to a helper variable). The structure parser tries to execute as JScript everything that it cannot recognize as a valid DataStruct statement.
Statements must be separated with semicolons, except for directives. A directive always ends at a line break.
data_type names_list;
Examples:
char a;
int x, y;
{
statements
} names_list;
Examples:
{
char R, G, B;
char alpha;
} color1, color2;
Hextor allows nameless structures. Usually they are used with conditional constructions like "if"/"switch" (see below).
data_type name[count];
Where:
data_type can be either one of basic types or a structure
count is a script expression that will be evaluated when Hextor begins to interpret this field in file.
In count expression, you can use already parsed fields.
Three additional virtual fields are defined for arrays:
index is equal to 0-based index of currently parsed item of this array. It can be used during structure parsing, see examples below.
length is equal to length of array. In scripts, you can also use length to change the length of array if it has fixed-size elements (see below).
size is total array size in bytes. For example, it can be used to abort array parsing when its size reaches known limit, defined by previous fields.
Examples:
int x[10], y[10];
{
uint8 r, g, b;
} colors[3];
int cnt;
byte x[cnt];
DWORD section_size; // Total size of all chunks
{
ansi chunk_type[4];
DWORD chunk_size;
char data[chunk_size];
if (chunks.size >= section_size) break; // End of section
} chunks[];
int blocks_count;
{
int data_type;
int data_size; // Size of corresponding data_block in bytes
} headers[blocks_count];
{
uint16 data[headers[data_blocks.index].data_size / 2]; // Each data_block's data size is defined in corresponding header
} data_blocks[blocks_count];
Possible data for this structure:
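As an illustration only, the JavaScript sketch below builds one possible byte sequence for the headers / data_blocks description above and decodes it by hand. It runs outside Hextor and is not the parser's implementation. The helper name decodeBlocks and the sample values are made up for this example, and it assumes that int is a 4-byte value stored in little-endian order (the default).

```javascript
// One possible file: blocks_count = 2, two headers, then two data blocks.
// Assumption: "int" is 4 bytes, little-endian.
const bytes = new Uint8Array([
  0x02, 0x00, 0x00, 0x00,                         // int blocks_count = 2
  0x01, 0x00, 0x00, 0x00, 0x04, 0x00, 0x00, 0x00, // headers[0]: data_type = 1, data_size = 4
  0x07, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00, // headers[1]: data_type = 7, data_size = 2
  0x11, 0x00, 0x22, 0x00,                         // data_blocks[0].data: two uint16 values
  0x33, 0x00,                                     // data_blocks[1].data: one uint16 value
]);

// Hypothetical helper that walks the bytes in the same order as the description.
function decodeBlocks(buf) {
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  let pos = 0;
  const blocksCount = view.getInt32(pos, true);
  pos += 4;

  const headers = [];
  for (let i = 0; i < blocksCount; i++) {
    headers.push({
      data_type: view.getInt32(pos, true),
      data_size: view.getInt32(pos + 4, true),
    });
    pos += 8;
  }

  const dataBlocks = [];
  for (let index = 0; index < blocksCount; index++) {
    // Same expression as in the description: headers[index].data_size / 2
    const count = headers[index].data_size / 2;
    const data = [];
    for (let j = 0; j < count; j++) {
      data.push(view.getUint16(pos, true));
      pos += 2;
    }
    dataBlocks.push({ data });
  }
  return { blocksCount, headers, dataBlocks, bytesUsed: pos };
}

const decoded = decodeBlocks(bytes);
console.log(JSON.stringify(decoded.dataBlocks));
```

Each data block takes its element count from the header with the same index, which is what the data_blocks.index expression expresses in the description.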
Changing array size
After a structure with an array is parsed, you can use scripts to modify the data and change the array length. Assigning to the array's length field adds or removes elements at the end of the array.
For example, if your data structure is:
int len;
byte data[len];
You can add more elements by running script:
ds.data.length = ds.len + 1; // Insert element
ds.data[ds.len] = 123; // New elements are filled with zero by default
ds.len = ds.data.length; // Array size indicator in original data should be modified explicitly
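The effect of this script on the underlying bytes can be modelled in plain JavaScript. This is only a sketch of the idea, not Hextor's code: the function name appendElement is hypothetical, and it assumes a 4-byte little-endian int for len.

```javascript
// Model of the file contents for:  int len;  byte data[len];
// Assumption: "int" is 4 bytes, little-endian.
function appendElement(fileBytes, value) {
  const view = new DataView(fileBytes.buffer, fileBytes.byteOffset, fileBytes.byteLength);
  const len = view.getInt32(0, true);
  const insertAt = 4 + len; // first byte after the current array

  // Step 1: ds.data.length = ds.len + 1
  // One byte is inserted at the end of the array; it starts as zero.
  const grown = new Uint8Array(fileBytes.length + 1);
  grown.set(fileBytes.subarray(0, insertAt), 0);
  grown.set(fileBytes.subarray(insertAt), insertAt + 1);

  // Step 2: ds.data[ds.len] = value
  grown[insertAt] = value;

  // Step 3: ds.len = ds.data.length
  // The stored counter is not updated automatically, so write it explicitly.
  new DataView(grown.buffer).setInt32(0, len + 1, true);
  return grown;
}

// len = 2, data = [10, 20], followed by one unrelated trailing byte
const before = new Uint8Array([2, 0, 0, 0, 10, 20, 0xff]);
const after = appendElement(before, 123);
console.log(Array.from(after));
```

Data located after the array moves by one byte, which is why the len field has to be kept in sync by the script.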
Lazy array loading
If the elements of an array are of fixed size (for example, one of the basic types, or structures that do not contain variable-sized arrays), the actual data of the array is not read from the file during structure parsing. It is read only when array elements are actually used (viewed in the GUI or accessed from scripts).
This makes it possible to quickly parse even large files (hundreds of megabytes) if they contain large sections with simple internal structure (e.g. byte arrays).
Currently, Hextor allocates an array of pointers for actual array elements when you first access array items above first 1024 indices; this requires 8 bytes of memory for every element. This may be improved in future versions.
if (condition) statement;
if (condition) statement1
else statement2;
condition is a script expression yielding a boolean value.
statement is any valid statement, including structures and other conditional constructions.
Examples:
byte version;
int x, y;
if (version >= 2)
word flags;
{
byte record_type;
byte record_size;
if (record_type == 2) { // record with known structure
int x, y;
char unknown[record_size - 8]; // just in case if structure has more fields, we rely on specified size
}
else { // all other records
char data[record_size];
}
}
Technically, a block of statements enclosed in {} is a structure. In conditional statements, nameless structures are often used just to combine several statements. A nameless structure must be followed by ";", "else" or "}", otherwise the next token will be interpreted as the structure name.
if (a>1) {
int b;
}
int c; // Error here - "int" is treated as structure name. Semicolon after "}" is needed.
Such nameless structures are flattened in tree view.
switch (expression) {
case constant_list_1: statement1;
case constant_list_2: statement2;
default: statement3;
}
expression is a script expression yielding a number or a character value.
constant_list defines a set of values (numbers or characters enclosed in single quotes). Set is written as comma-separated list of constants and ranges in form value1..value2. Types of expression and constants in list must match.
There are no fallthrough or "break" statements. The "default" case is optional.
Examples:
{
uint16_t msg_size;
ansi msg_type;
} header;
switch (header.msg_type) {
case 'F': ansi data[header.msg_size];
case 'A': {
uint8_t multi_id;
uint16_t msg_id;
ansi message_name[header.msg_size-3];
};
default: uint8 unknown[header.msg_size];
}
byte rec_type;
switch (rec_type) {
case 0: char a;
case 1, 3, 5..10: word b;
case 2, 4: int c;
}
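The way a constant list selects a case can be sketched in JavaScript as follows. This is a simplified model, not Hextor's parser: the names matchesConstantList and pickCase are hypothetical, only numeric constants are handled, ranges are assumed to include both ends, and a value that matches no case (with no "default") is assumed to produce no field.

```javascript
// Hypothetical matcher for a numeric constant list such as "1, 3, 5..10".
function matchesConstantList(value, list) {
  return list.split(",").some((item) => {
    const [lo, hi] = item.trim().split("..").map(Number);
    // A single constant has no upper bound, so compare for equality.
    return hi === undefined ? value === lo : value >= lo && value <= hi;
  });
}

// Returns the field declared by the first matching case, or null if none matches.
// There is no fallthrough: at most one case is selected.
function pickCase(value, cases) {
  for (const c of cases) {
    if (matchesConstantList(value, c.list)) return c.field;
  }
  return null;
}

// The cases of the rec_type example above
const recTypeCases = [
  { list: "0", field: "char a" },
  { list: "1, 3, 5..10", field: "word b" },
  { list: "2, 4", field: "int c" },
];

console.log(pickCase(7, recTypeCases));
```

The same list syntax is used by the #valid directive, so the matcher applies there as well under the same assumptions.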
The break command, when used inside an array of structures, interrupts array parsing. For example, the following code parses a list of strings until a zero-length string is found.
{
byte len;
if (len == 0) break;
ansi str[len];
} items[];
continue command, when used inside an array of structures, skips the rest of current structure and begins parsing next array element.
{
byte len;
if (len == 0) continue;
ansi str[len];
int32 type;
} items[];
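The difference between the two commands can be sketched with hand-written loops in JavaScript. The function names are hypothetical and this is not Hextor's implementation. The sketch assumes that an array without a length ends at the end of the data, that int32 is little-endian, and that an element skipped by continue keeps only the fields parsed so far.

```javascript
// Reads "count" bytes starting at "pos" as a string of single-byte characters.
function readAnsi(bytes, pos, count) {
  return String.fromCharCode(...bytes.subarray(pos, pos + count));
}

// Model of:  { byte len; if (len == 0) break; ansi str[len]; } items[];
function parseUntilEmpty(bytes) {
  const strings = [];
  let pos = 0;
  while (pos < bytes.length) {
    const len = bytes[pos++];
    if (len === 0) break; // stops the whole array
    strings.push(readAnsi(bytes, pos, len));
    pos += len;
  }
  return { strings, bytesUsed: pos };
}

// Model of:  { byte len; if (len == 0) continue; ansi str[len]; int32 type; } items[];
function parseSkippingEmpty(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const items = [];
  let pos = 0;
  while (pos < bytes.length) {
    const len = bytes[pos++];
    if (len === 0) {
      items.push({ len }); // rest of this element is skipped
      continue;            // parsing goes on with the next element
    }
    const str = readAnsi(bytes, pos, len);
    pos += len;
    const type = view.getInt32(pos, true);
    pos += 4;
    items.push({ len, str, type });
  }
  return items;
}

// "hi", "x", terminator, then data that is never reached
const listA = new Uint8Array([2, 0x68, 0x69, 1, 0x78, 0, 3, 0x61, 0x62, 0x63]);
// "hi" with type 1, an empty element, "x" with type 2
const listB = new Uint8Array([2, 0x68, 0x69, 1, 0, 0, 0, 0, 1, 0x78, 2, 0, 0, 0]);

console.log(parseUntilEmpty(listA).strings, parseSkippingEmpty(listB).length);
```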
var identifier = expression;
Creates a temporary local variable that can be used in expressions, but does not itself exist in the file.
Example:
byte version;
var items_count = (version >= 2) ? 5 : 3;
byte ivalues[items_count];
float fvalues[items_count];
typedef type_definition names_list;
type_definition is either:
- One of basic data types
- Structure enclosed in "{ }"
Examples:
typedef uint8_t triplet[3];
typedef {
char len;
ansi chars[len];
} string;
// These types can be used later in structure
triplet clr;
string caption;
Directives start with "#" and end at a line break. They affect either the previous or the following statements, depending on the directive.
Currently, the following directives are supported:
#addr expression
Jump to specified (absolute) address in file. Next field will start from this address.
Example:
// This is from PE file structure description.
// Address of each section in file is stored in "PointerToRawData" field of
// corresponding section header.
{
#addr image_section_header[sections.index].PointerToRawData
byte data[image_section_header[sections.index].SizeOfRawData];
} sections[image_file_header.NumberOfSections];
cur_addr built-in variable contains current absolute address in file and can be used to seek forward or backward, for example:
#addr cur_addr-1
#align value
Aligns all basic fields in a structure to not cross specified boundary, similar to "#pragma pack(value)".
value should be a numeric constant.
Example:
{
{
#align 4
char a, b;
// There will be 2 unused bytes before x
int32 x;
} subrecord;
char n;
int32 m; // These are not aligned
} rec[10];
The #align directive can be located anywhere inside a structure description, not only at the start. In any case, it affects the entire structure in which it is defined and all its sub-structures (but not its parent structure).
Fields are aligned relative to the starting address of their parent structure. To align the structure itself, use the #align_pos directive.
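The padding in the subrecord example can be reproduced with a small offset calculation in JavaScript. This is a sketch under a stated assumption, not a description of Hextor's exact rule: it follows the usual "#pragma pack" convention, where each basic field is placed at an offset that is a multiple of the smaller of its own size and the #align value. The function name layoutFields is hypothetical.

```javascript
// Computes field offsets relative to the start of the parent structure.
// Assumption: alignment boundary of a field = min(field size, align value).
function layoutFields(fields, align) {
  let offset = 0;
  return fields.map(({ name, size }) => {
    const boundary = Math.min(size, align);
    const padding = (boundary - (offset % boundary)) % boundary;
    offset += padding; // skip unused bytes before the field
    const placed = { name, offset, padding };
    offset += size;
    return placed;
  });
}

// char a, b; int32 x;  with  #align 4
const layout = layoutFields(
  [
    { name: "a", size: 1 },
    { name: "b", size: 1 },
    { name: "x", size: 4 },
  ],
  4
);
console.log(layout);
```

Under this assumption a and b stay at offsets 0 and 1, and x is moved to offset 4, which matches the 2 unused bytes mentioned in the example.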
#align_pos expression
Aligns the next field to the specified number of bytes, relative to the start of the current structure.
If used at the end of a structure, forces its size to be rounded up to the specified boundary.
Supports script expressions.
Example:
// This is from BMP file structure description.
// Each row is padded to align by 4 bytes
{
uint8 data[Math.floor((bmiHeader.biWidth*bmiHeader.biBitCount-1)/8)+1];
// Align rows to 4 bytes
#align_pos 4
} rows[Math.abs(bmiHeader.biHeight)];
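The arithmetic behind this example can be checked with a few lines of JavaScript. The function name rowLayout is hypothetical. It evaluates the same length expression as the description and then rounds the row size up to a multiple of 4, which is the effect described for #align_pos at the end of a structure.

```javascript
// Size of one BMP row: pixel data bytes plus padding up to the alignment boundary.
function rowLayout(width, bitCount, alignTo = 4) {
  // Same expression as in the description; equals ceil(width * bitCount / 8)
  const dataBytes = Math.floor((width * bitCount - 1) / 8) + 1;
  // Round the structure size up to the next multiple of alignTo
  const paddedBytes = Math.ceil(dataBytes / alignTo) * alignTo;
  return { dataBytes, paddedBytes, padding: paddedBytes - dataBytes };
}

console.log(rowLayout(3, 24)); // 3 pixels, 24 bits per pixel
console.log(rowLayout(10, 1)); // 10 pixels, 1 bit per pixel
```

For a 3-pixel-wide 24-bit image, each row holds 9 data bytes followed by 3 padding bytes.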
#bigendian
#littleendian
Defines endianness of all following fields. Default is little endian.
Example:
int a; // this field is little endian
#bigendian
int b, c; // these fields are big endian
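To see what the directive changes, the same four bytes can be read in both byte orders with a standard JavaScript DataView. The helper name readInt32 is made up for this sketch, and it assumes a 4-byte int.

```javascript
// Reads a 4-byte signed integer from the start of "bytes" in the given byte order.
function readInt32(bytes, bigEndian) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  // The second argument of getInt32 is "littleEndian"
  return view.getInt32(0, !bigEndian);
}

const raw = new Uint8Array([0x00, 0x00, 0x01, 0x02]);
const asLittle = readInt32(raw, false); // 0x02010000
const asBig = readInt32(raw, true);     // 0x00000102
console.log(asLittle, asBig);
```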
#format notation
Defines the display format for the field(s) in the previous statement. Currently, the following formats are supported: bin, dec, hex.
Examples:
dword offset, size; #format hex
#hint expression
Display custom hint in a structure viewer for this field. Expression is a JScript expression that can refer to other fields and helper variables. This can be used for user-defined interpretation of data.
Examples:
uint32 tm; #hint new Date((tm + 1262304000)*1000)
// tm contains seconds since 2010-01-01. To get Unixtime, add 1262304000
See also structures in VarLenInt built-in examples.
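The hint expression from the tm example is ordinary JavaScript date arithmetic, so it can be tried outside Hextor. The wrapper function tmToDate is only a name chosen for this sketch.

```javascript
// Offset between the Unix epoch (1970-01-01) and 2010-01-01, in seconds
const EPOCH_2010 = 1262304000;

// Same calculation as the #hint expression: seconds since 2010 -> Date
function tmToDate(tm) {
  return new Date((tm + EPOCH_2010) * 1000); // Date expects milliseconds
}

console.log(tmToDate(0).toISOString());     // start of 2010 in UTC
console.log(tmToDate(86400).toISOString()); // one day later
```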
#itemhint expression
If used after an array definition, sets a hint for individual array items (instead of the entire array object). See #hint.
#size expression
If used after an array with unspecified length, then actual element count is determined by given data size.
Examples:
DWORD section_size; // Total size of all chunks
{
ansi chunk_type[4];
DWORD chunk_size;
char data[chunk_size];
} chunks[]; #size section_size // Chunks count is determined by section size
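A hand-written JavaScript loop shows how a byte budget can determine the element count. This is a sketch of the idea rather than Hextor's implementation: the function name parseChunks and the sample data are hypothetical, and DWORD is assumed to be a 4-byte little-endian unsigned value.

```javascript
// Model of:
//   DWORD section_size;
//   { ansi chunk_type[4]; DWORD chunk_size; char data[chunk_size]; } chunks[]; #size section_size
function parseChunks(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  let pos = 0;
  const sectionSize = view.getUint32(pos, true);
  pos += 4;

  const start = pos; // the array begins right after section_size
  const chunks = [];
  // Keep adding elements until the array covers sectionSize bytes
  while (pos - start < sectionSize) {
    const type = String.fromCharCode(...bytes.subarray(pos, pos + 4));
    const size = view.getUint32(pos + 4, true);
    pos += 8;
    chunks.push({ type, size, data: bytes.slice(pos, pos + size) });
    pos += size;
  }
  return chunks;
}

const section = new Uint8Array([
  18, 0, 0, 0,                         // section_size = 18 (10 + 8 bytes of chunks)
  0x41, 0x42, 0x43, 0x44, 2, 0, 0, 0,  // "ABCD", chunk_size = 2
  1, 2,                                // data of the first chunk
  0x45, 0x46, 0x47, 0x48, 0, 0, 0, 0,  // "EFGH", chunk_size = 0
  0xff,                                // belongs to whatever follows the section
]);
console.log(parseChunks(section).map((c) => c.type));
```

The trailing byte is not consumed, because the two chunks already account for all 18 bytes of the section.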
#valid constant_list
Defines an acceptable value or set of values for the field(s) in the previous statement. Fields whose values do not fall into the valid set are highlighted in the resulting structure.
constant_list is written as comma-separated list of constants and ranges in form value1..value2.
Examples:
uint16 magic; #valid 0x4D42 // "BM"
byte version; #valid 1, 3..5
int width, height; #valid 0..1023 // Applies to both fields
Currently, #valid is not supported for character arrays like "BM". It works only for numbers and single characters.