From d8944be2fd9e2ce96d4fe26188a1faed738a9d57 Mon Sep 17 00:00:00 2001 From: James Elliott <> Date: Mon, 19 Nov 2018 19:12:05 -0600 Subject: [PATCH 01/14] Add rekordbox file format specs. These have been proven to work in the context of my Beat Link Trigger project, enabling it to retrieve the database over NFS, parse it, and extract all the track metadata it needs even when it is impossible to connect to the database server running on the players because there are four of them in use. --- database/pdb.ksy | 961 +++++++++++++++++++++++++++++++++++++++++++++++ media/anlz.ksy | 302 +++++++++++++++ 2 files changed, 1263 insertions(+) create mode 100644 database/pdb.ksy create mode 100644 media/anlz.ksy diff --git a/database/pdb.ksy b/database/pdb.ksy new file mode 100644 index 000000000..110567549 --- /dev/null +++ b/database/pdb.ksy @@ -0,0 +1,961 @@ +meta: + id: pdb_file + title: DeviceSQL database export (probably generated by rekordbox) + application: rekordbox + file-extension: + - pdb + license: EPL-1.0 + endian: le + +doc: | + This is a relational database format designed to be efficiently used + by very low power devices (there were deployments on 16 bit devices + with 32K of RAM). Today you are most likely to encounter it within + the Pioneer Professional DJ ecosystem, because it is the format that + their rekordbox software uses to write USB and SD media which can be + mounted in DJ controllers and used to play and mix music. + + It has been reverse-engineered to facilitate sophisticated + integrations with light and laser shows, videos, and other musical + instruments, by supporting deep knowledge of what is playing and + what is coming next through monitoring the network communications of + the players. + + The file is divided into fixed-size blocks. 
The first block has a + header that establishes the block size, and lists the tables + available in the database, identifying their types and the index of + the first of the series of linked pages that make up that table. + + Each table is made up of a series of rows which may be spread across + any number of pages. The pages start with a header describing the + page and linking to the next page. The rest of the page is used as a + heap: rows are scattered around it, and located using an index + structure that builds backwards from the end of the page. Each row + of a given type has a fixed size structure which links to any + variable-sized strings by their offsets within the page. + + As changes are made to the table, some records may become unused, + and there may be gaps within the heap that are too small to be used + by other data. There is a bit map in the row index that identifies + which rows are actually present. Rows that are not present must be + ignored: they do not contain valid (or even necessarily well-formed) + data. + + The majority of the work in reverse-engineering this format was + performed by @henrybetts and @flesniak, for which I am hugely + grateful. @GreyCat helped me learn the intricacies (and best + practices) of Kaitai far faster than I would have managed on my own. + +doc-ref: https://github.com/Deep-Symmetry/crate-digger/blob/master/doc/Analysis.pdf + +seq: + - contents: [0, 0, 0, 0] + - id: len_page + type: u4 + doc: | + The database page size, in bytes. Pages are referred to by + index, so this size is needed to calculate their offset, and + table pages have a row index structure which is built from the + end of the page backwards, so finding that also requires this + value. + - id: num_tables + type: u4 + doc: | + Determines the number of table entries that are present. Each + table is a linked list of pages containing rows of a particular + type. 
+ - id: next_unused_page + type: u4 + doc: | + @flesniak said: "Not used as any `empty_candidate`, points + past the end of the file." + - type: u4 + - id: sequence + type: u4 + doc: | + @flesniak said: "Always incremented by at least one, + sometimes by two or three." + - contents: [0, 0, 0, 0] + - id: tables + type: table + repeat: expr + repeat-expr: num_tables + doc: | + Describes and links to the tables present in the database. + +types: + table: + doc: | + Each table is a linked list of pages containing rows of a single + type. This header describes the nature of the table and links to + its pages by index. + seq: + - id: type + type: u4 + enum: page_type + doc: | + Identifies the kind of rows that are found in this table. + - id: empty_candidate + type: u4 + - id: first_page + type: page_ref + doc: | + Links to the chain of pages making up that table. The first + page seems to always contain similar garbage patterns and + zero rows, but the next page it links to contains the start + of the meaningful data rows. + - id: last_page + type: page_ref + doc: | + Holds the index of the last page that makes up this table. + When following the linked list of pages of the table, you + either need to stop when you reach this page, or when you + notice that the `next_page` link you followed took you to a + page of a different `type`. + -webide-representation: '{type}' + + page_ref: + doc: | + An index which points to a table page (its offset can be found + by multiplying the index by the `len_page` value in the file + header). This type allows the linked page to be lazily loaded. + seq: + - id: index + type: u4 + doc: | + Identifies the desired page number. + instances: + body: + doc: | + When referenced, loads the specified page and parses its + contents appropriately for the type of data it contains.
+ io: _root._io + pos: _root.len_page * index + size: _root.len_page + type: page + + page: + doc: | + A table page, consisting of a short header describing the + content of the page and linking to the next page, followed by a + heap in which row data is found. At the end of the page there is + an index which locates all rows present in the heap via their + offsets past the end of the page header. + seq: + - contents: [0, 0, 0, 0] + - id: page_index + doc: Matches the index we used to look up the page, sanity check? + type: u4 + - id: type + type: u4 + enum: page_type + doc: | + Identifies the type of information stored in the rows of this page. + - id: next_page + doc: | + Index of the next page containing this type of rows. Points past + the end of the file if there are no more. + type: page_ref + - type: u4 + doc: | + @flesniak said: "sequence number (0->1: 8->13, 1->2: 22, 2->3: 27)" + - size: 4 + - id: num_rows_small + type: u1 + doc: | + Holds the value used for `num_rows` (see below) unless + `num_rows_large` is larger (but not equal to `0x1fff`). This + seems like some strange mechanism to deal with the fact that + lots of tiny entries, such as are found in the + `playlist_entries` table, are too numerous to count with a single + byte. But why not just always use `num_rows_large`, then? + - type: u1 + doc: | + @flesniak said: "a bitmask (1st track: 32)" + - type: u1 + doc: | + @flesniak said: "often 0, sometimes larger, esp. for pages + with high real_entry_count (e.g. 12 for 101 entries)" + - id: page_flags + type: u1 + doc: | + @flesniak said: "strange pages: 0x44, 0x64; otherwise seen: 0x24, 0x34" + - id: free_size + type: u2 + doc: | + Unused space (in bytes) in the page heap, excluding the row + index at end of page. + - id: used_size + type: u2 + doc: | + The number of bytes that are in use in the page heap.
+ - type: u2 + doc: | + @flesniak said: "(0->1: 2)" + - id: num_rows_large + type: u2 + doc: | + Holds the value used for `num_rows` (see below) when that is + too large to fit into `num_rows_small`, and that situation + seems to be indicated when this value is larger than + `num_rows_small`, but not equal to `0x1fff`. This seems like + some strange mechanism to deal with the fact that lots of + tiny entries, such as are found in the `playlist_entries` + table, are too numerous to count with a single byte. But why not + just always use this value, then? + - type: u2 + doc: | + @flesniak said: "1004 for strange blocks, 0 otherwise" + - type: u2 + doc: | + @flesniak said: "always 0 except 1 for history pages, num + entries for strange pages?" + - id: heap + size-eos: true + if: heap_pos < 0 # never true, but stores pos + instances: + is_data_page: + value: page_flags & 0x40 == 0 + -webide-parse-mode: eager + heap_pos: + value: _io.pos + num_rows: + value: | + (num_rows_large > num_rows_small) and (num_rows_large != 0x1fff) ? num_rows_large : num_rows_small + doc: | + The number of rows on this page (controls the number of row + index entries there are, but some of those may not be marked + as present in the table due to deletion). + -webide-parse-mode: eager + num_groups: + value: '(num_rows - 1) / 16 + 1' + doc: | + The number of row groups that are present in the index. Each + group can hold up to sixteen rows. All but the final one + will hold sixteen rows. + row_groups: + type: 'row_group(_index)' + repeat: expr + repeat-expr: num_groups + doc: | + The actual row groups making up the row index. Each group + can hold up to sixteen rows. Non-data pages do not have + actual rows, and attempting to parse them can crash. + if: is_data_page + + row_group: + doc: | + A group of row indices, which are built backwards from the end + of the page. Holds up to sixteen row offsets, along with a bit + mask that indicates whether each row is actually present in the + table.
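As a hedged illustration of how `num_rows`, `num_groups`, and the backwards-built row index fit together (as defined in the `page`, `row_group`, and `row_ref` types), here is a Python sketch. It assumes `page` holds exactly one `len_page`-sized data page (check `is_data_page` first) and that the row count has already been chosen. `HEAP_POS` (0x28) is derived by adding up the sizes of the page header fields, and the helper names are hypothetical.

```python
import struct
from typing import List

HEAP_POS = 0x28  # the page header fields add up to 40 bytes, so the heap starts here


def num_rows(num_rows_small: int, num_rows_large: int) -> int:
    """Pick the row count the same way the `num_rows` instance does."""
    if num_rows_large > num_rows_small and num_rows_large != 0x1FFF:
        return num_rows_large
    return num_rows_small


def present_row_offsets(page: bytes, rows: int) -> List[int]:
    """Page-relative offsets (`row_base`) of every row whose presence bit is set.

    Groups of up to 16 entries are built backwards from the end of the page,
    each occupying 0x24 bytes: a u2 flag word at base - 4 and u2 row offsets
    at base - 6, base - 8, and so on.
    """
    if rows <= 0:
        return []
    len_page = len(page)
    groups = (rows - 1) // 16 + 1
    result = []
    for g in range(groups):
        base = len_page - g * 0x24
        (flags,) = struct.unpack_from("<H", page, base - 4)
        # Every group but the last holds sixteen rows.
        count = 16 if g < groups - 1 else (rows - 1) % 16 + 1
        for i in range(count):
            (ofs_row,) = struct.unpack_from("<H", page, base - (6 + 2 * i))
            if (flags >> i) & 1:
                result.append(ofs_row + HEAP_POS)
    return result
```

Rows whose flag bit is clear are skipped entirely, matching the rule that deleted rows may not even be well-formed.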
+ params: + - id: group_index + type: u2 + doc: | + Identifies which group is being generated. They build backwards + from the end of the page. + instances: + base: + value: '_root.len_page - (group_index * 0x24)' + doc: | + The starting point of this group of row indices. + row_present_flags: + pos: base - 4 + type: u2 + doc: | + Each bit specifies whether a particular row is present. The + low order bit corresponds to the first row in this index, + whose offset immediately precedes these flag bits. The + second bit corresponds to the row whose offset precedes + that, and so on. + -webide-parse-mode: eager + rows: + type: row_ref(_index) + repeat: expr + repeat-expr: '(group_index < (_parent.num_groups - 1)) ? 16 : ((_parent.num_rows - 1) % 16 + 1)' + doc: | + The row offsets in this group. + + row_ref: + doc: | + An offset which points to a row in the table, whose actual + presence is controlled by one of the bits in + `row_present_flags`. This instance allows the row itself to be + lazily loaded, unless it is not present, in which case there is + no content to be loaded. + params: + - id: row_index + type: u2 + doc: | + Identifies which row within the row index this reference + came from, so the correct flag can be checked for the row + presence and the correct row offset can be found. + instances: + ofs_row: + pos: '_parent.base - (6 + (2 * row_index))' + type: u2 + doc: | + The offset of the start of the row (in bytes past the end of + the page header). + row_base: + value: ofs_row + _parent._parent.heap_pos + doc: | + The location of this row relative to the start of the page. + A variety of pointers (such as all device_sql_string values) + are calculated with respect to this position. + present: + value: '(((_parent.row_present_flags >> row_index) & 1) != 0 ? true : false)' + doc: | + Indicates whether the row index considers this row to be + present in the table. Will be `false` if the row has been + deleted. 
+ -webide-parse-mode: eager + body: + pos: row_base + type: + switch-on: _parent._parent.type + cases: + 'page_type::albums': album_row + 'page_type::artists': artist_row + 'page_type::artwork': artwork_row + 'page_type::colors': color_row + 'page_type::genres': genre_row + 'page_type::keys': key_row + 'page_type::labels': label_row + 'page_type::playlist_tree': playlist_tree_row + 'page_type::playlist_entries': playlist_entry_row + 'page_type::tracks': track_row + if: present + doc: | + The actual content of the row, as long as it is present. + -webide-parse-mode: eager + -webide-representation: '{body.name.body.text}{body.title.body.text} ({body.id})' + + album_row: + doc: | + A row that holds an album name and ID. + seq: + - id: magic + contents: [0x80, 0x00] + - id: index_shift + type: u2 + doc: TODO name from @flesniak, but what does it mean? + - type: u4 + - id: artist_id + type: u4 + doc: | + Identifies the artist associated with the album. + - id: id + type: u4 + doc: | + The unique identifier by which this album can be requested + and linked from other rows (such as tracks). + - type: u4 + - type: u1 + doc: | + @flesniak says: "always 0x03, maybe an unindexed empty string" + - id: ofs_name + type: u1 + doc: | + The location of the variable-length name string, relative to + the start of this row. + instances: + name: + type: device_sql_string + pos: _parent.row_base + ofs_name + doc: | + The name of this album. + -webide-parse-mode: eager + + artist_row: + doc: | + A row that holds an artist name and ID. + seq: + - id: magic + contents: [0x60, 0x00] + - id: index_shift + type: u2 + doc: TODO name from @flesniak, but what does it mean? + - id: id + type: u4 + doc: | + The unique identifier by which this artist can be requested + and linked from other rows (such as tracks).
+ - type: u1 + doc: | + @flesniak says: "always 0x03, maybe an unindexed empty string" + - id: ofs_name + type: u1 + doc: | + The location of the variable-length name string, relative to + the start of this row. + instances: + name: + type: device_sql_string + pos: _parent.row_base + ofs_name + doc: | + The name of this artist. + -webide-parse-mode: eager + + artwork_row: + doc: | + A row that holds the path to an album art image file and the + associated artwork ID. + seq: + - id: id + type: u4 + doc: | + The unique identifier by which this art can be requested + and linked from other rows (such as tracks). + - id: path + type: device_sql_string + doc: | + The variable-length file path string at which the art file + can be found. + -webide-representation: '{path.body.text}' + + color_row: + doc: | + A row that holds a color name and the associated ID. + seq: + - size: 5 + - id: id + type: u2 + doc: | + The unique identifier by which this color can be requested + and linked from other rows (such as tracks). + - type: u1 + - id: name + type: device_sql_string + doc: | + The variable-length string naming the color. + + genre_row: + doc: | + A row that holds a genre name and the associated ID. + seq: + - id: id + type: u4 + doc: | + The unique identifier by which this genre can be requested + and linked from other rows (such as tracks). + - id: name + type: device_sql_string + doc: | + The variable-length string naming the genre. + + key_row: + doc: | + A row that holds a musical key and the associated ID. + seq: + - id: id + type: u4 + doc: | + The unique identifier by which this key can be requested + and linked from other rows (such as tracks). + - id: id2 + type: u4 + doc: | + Seems to be a second copy of the ID? + - id: name + type: device_sql_string + doc: | + The variable-length string naming the key. + + label_row: + doc: | + A row that holds a label name and the associated ID.
+ seq: + - id: id + type: u4 + doc: | + The unique identifier by which this label can be requested + and linked from other rows (such as tracks). + - id: name + type: device_sql_string + doc: | + The variable-length string naming the label. + + playlist_tree_row: + doc: | + A row that holds a playlist name, ID, indication of whether it + is an ordinary playlist or a folder of other playlists, a link + to its parent folder, and its sort order. + seq: + - id: parent_id + type: u4 + doc: | + The ID of the `playlist_tree_row` in which this one can be + found, or `0` if this playlist exists at the root level. + - size: 4 + - id: sort_order + type: u4 + doc: | + The order in which the entries of this playlist are sorted. + - id: id + type: u4 + doc: | + The unique identifier by which this playlist can be requested + and linked from other rows (such as tracks). + - id: raw_is_folder + type: u4 + doc: | + Has a non-zero value if this is actually a folder rather + than a playlist. + - id: name + type: device_sql_string + doc: | + The variable-length string naming the playlist. + instances: + is_folder: + value: raw_is_folder != 0 + -webide-parse-mode: eager + + playlist_entry_row: + doc: | + A row that associates a track with a position in a playlist. + seq: + - id: entry_index + type: u4 + doc: | + The position within the playlist represented by this entry. + - id: track_id + type: u4 + doc: | + The track found at this position in the playlist. + - id: playlist_id + type: u4 + doc: | + The playlist to which this entry belongs. + + track_row: + doc: | + A row that describes a track that can be played, with many + details about the music, and links to other tables like artists, + albums, keys, etc. + seq: + - id: magic + contents: [0x24, 0x00] + - id: index_shift + type: u2 + doc: TODO name from @flesniak, but what does it mean? + - id: bitmask + type: u4 + doc: TODO what do the bits mean? + - id: sample_rate + type: u4 + doc: | + Playback sample rate of the audio file. 
+ - id: composer_id + type: u4 + doc: | + References a row in the artists table if the composer is + known. + - id: file_size + type: u4 + doc: | + The length of the audio file, in bytes. + - type: u4 + doc: | + Some ID? Purpose as yet unknown. + - type: u2 + doc: | + From @flesniak: "always 19048?" + - type: u2 + doc: | + From @flesniak: "always 30967?" + - id: artwork_id + type: u4 + doc: | + References a row in the artwork table if there is album art. + - id: key_id + type: u4 + doc: | + References a row in the keys table if the track has a known + main musical key. + - id: original_artist_id + type: u4 + doc: | + References a row in the artists table if this is a cover + performance and the original artist is known. + - id: label_id + type: u4 + doc: | + References a row in the labels table if the track has a + known record label. + - id: remixer_id + type: u4 + doc: | + References a row in the artists table if the track has a + known remixer. + - id: bitrate + type: u4 + doc: | + Playback bit rate of the audio file. + - id: track_number + type: u4 + doc: | + The position of the track within an album. + - id: tempo + type: u4 + doc: | + The tempo at the start of the track in beats per minute, + multiplied by 100. + - id: genre_id + type: u4 + doc: | + References a row in the genres table if the track has a + known musical genre. + - id: album_id + type: u4 + doc: | + References a row in the albums table if the track has a + known album. + - id: artist_id + type: u4 + doc: | + References a row in the artists table if the track has a + known performer. + - id: id + type: u4 + doc: | + The id by which this track can be looked up; players will + report this value in their status packets when they are + playing the track. + - id: disc_number + type: u2 + doc: | + The number of the disc on which this track is found, if it + is known to be part of a multi-disc album. + - id: play_count + type: u2 + doc: | + The number of times this track has been played.
+ - id: year + type: u2 + doc: | + The year in which this track was released. + - id: sample_depth + type: u2 + doc: | + The number of bits per sample of the audio file. + - id: duration + type: u2 + doc: | + The length, in seconds, of the track when played at normal + speed. + - type: u2 + doc: | + From @flesniak: "always 41?" + - id: color_id + type: u1 + doc: | + References a row in the colors table if the track has been + assigned a color. + - id: rating + type: u1 + doc: | + The number of stars to display for the track, 0 to 5. + - type: u2 + doc: | + From @flesniak: "always 1?" + - type: u2 + doc: | + From @flesniak: "alternating 2 or 3" + - id: ofs_strings + type: u2 + repeat: expr + repeat-expr: 21 + doc: | + The location, relative to the start of this row, of a + variety of variable-length strings. + instances: + unknown_string_1: + type: device_sql_string + pos: _parent.row_base + ofs_strings[0] + doc: | + A string of unknown purpose, which has so far only been + empty. + -webide-parse-mode: eager + texter: + type: device_sql_string + pos: _parent.row_base + ofs_strings[1] + doc: | + A string of unknown purpose, which @flesniak named. + -webide-parse-mode: eager + unknown_string_2: + type: device_sql_string + pos: _parent.row_base + ofs_strings[2] + doc: | + A string of unknown purpose; @flesniak said "thought + tracknumber -> wrong!" + unknown_string_3: + type: device_sql_string + pos: _parent.row_base + ofs_strings[3] + doc: | + A string of unknown purpose; @flesniak said "strange + strings, often zero length, sometimes low binary values + 0x01/0x02 as content" + unknown_string_4: + type: device_sql_string + pos: _parent.row_base + ofs_strings[4] + doc: | + A string of unknown purpose; @flesniak said "strange + strings, often zero length, sometimes low binary values + 0x01/0x02 as content" + -webide-parse-mode: eager + message: + type: device_sql_string + pos: _parent.row_base + ofs_strings[5] + doc: | + A string of unknown purpose, which @flesniak named.
+ -webide-parse-mode: eager + kuvo_public: + type: device_sql_string + pos: _parent.row_base + ofs_strings[6] + doc: | + A string whose value is always either empty or "ON", and + which apparently for some insane reason is used, rather than + a single bit somewhere, to control whether the track + information is visible on Kuvo. + -webide-parse-mode: eager + autoload_hotcues: + type: device_sql_string + pos: _parent.row_base + ofs_strings[7] + doc: | + A string whose value is always either empty or "ON", and + which apparently for some insane reason is used, rather than + a single bit somewhere, to control whether hot-cues are + auto-loaded for the track. + -webide-parse-mode: eager + unknown_string_5: + type: device_sql_string + pos: _parent.row_base + ofs_strings[8] + doc: | + A string of unknown purpose. + -webide-parse-mode: eager + unknown_string_6: + type: device_sql_string + pos: _parent.row_base + ofs_strings[9] + doc: | + A string of unknown purpose, usually empty. + -webide-parse-mode: eager + date_added: + type: device_sql_string + pos: _parent.row_base + ofs_strings[10] + doc: | + A string containing the date this track was added to the collection. + -webide-parse-mode: eager + release_date: + type: device_sql_string + pos: _parent.row_base + ofs_strings[11] + doc: | + A string containing the date this track was released, if known. + -webide-parse-mode: eager + mix_name: + type: device_sql_string + pos: _parent.row_base + ofs_strings[12] + doc: | + A string naming the remix of the track, if known. + -webide-parse-mode: eager + unknown_string_7: + type: device_sql_string + pos: _parent.row_base + ofs_strings[13] + doc: | + A string of unknown purpose, usually empty. 
+ -webide-parse-mode: eager + analyze_path: + type: device_sql_string + pos: _parent.row_base + ofs_strings[14] + doc: | + The file path of the track analysis, which allows rapid + seeking to particular times in variable bit-rate files, + jumping to particular beats, visual waveform previews, and + stores cue points and loops. + -webide-parse-mode: eager + analyze_date: + type: device_sql_string + pos: _parent.row_base + ofs_strings[15] + doc: | + A string containing the date this track was analyzed by rekordbox. + -webide-parse-mode: eager + comment: + type: device_sql_string + pos: _parent.row_base + ofs_strings[16] + doc: | + The comment assigned to the track by the DJ, if any. + -webide-parse-mode: eager + title: + type: device_sql_string + pos: _parent.row_base + ofs_strings[17] + doc: | + The title of the track. + -webide-parse-mode: eager + unknown_string_8: + type: device_sql_string + pos: _parent.row_base + ofs_strings[18] + doc: | + A string of unknown purpose, usually empty. + -webide-parse-mode: eager + filename: + type: device_sql_string + pos: _parent.row_base + ofs_strings[19] + doc: | + The file name of the track audio file. + -webide-parse-mode: eager + file_path: + type: device_sql_string + pos: _parent.row_base + ofs_strings[20] + doc: | + The file path of the track audio file. + -webide-parse-mode: eager + + device_sql_string: + doc: | + A variable length string which can be stored in a variety of + different encodings. + seq: + - id: length_and_kind + type: u1 + doc: | + Mangled length of an ordinary ASCII string if odd, or a flag + indicating another encoding with a longer length value to + follow. + - id: body + type: + switch-on: length_and_kind + cases: + 0x40: device_sql_long_ascii + 0x90: device_sql_long_utf16be + _: device_sql_short_ascii(length_and_kind) + -webide-parse-mode: eager + -webide-representation: '{body.text}' + + device_sql_short_ascii: + doc: | + An ASCII-encoded string up to 127 bytes long. 
+ params: + - id: mangled_length + type: u1 + doc: | + Contains the actual length, incremented, doubled, and + incremented again. Go figure. + seq: + - id: text + type: str + size: length + encoding: ascii + if: '(mangled_length % 2 > 0) and (length >= 0)' # Skip invalid strings + doc: | + The content of the string. + instances: + length: + value: '((mangled_length - 1) / 2) - 1' + doc: | + The un-mangled length of the string, in bytes. + -webide-parse-mode: eager + + device_sql_long_ascii: + doc: | + An ASCII-encoded string preceded by a two-byte length field. + TODO May need to skip a byte after the length! + Have not found any test data. + seq: + - id: length + type: u2 + doc: | + Contains the length of the string in bytes. + - id: text + type: str + size: length + encoding: ascii + doc: | + The content of the string. + + device_sql_long_utf16be: + doc: | + A UTF-16BE-encoded string preceded by a two-byte length field. + seq: + - id: length + type: u2 + doc: | + Contains the length of the string in bytes, including two trailing nulls. + - id: text + type: str + size: length - 4 + encoding: utf-16be + doc: | + The content of the string. + +enums: + page_type: + 0: + id: tracks + doc: | + Holds rows describing tracks, such as their title, artist, + genre, artwork ID, playing time, etc. + 1: + id: genres + doc: | + Holds rows naming musical genres, for reference by tracks and searching. + 2: + id: artists + doc: | + Holds rows naming artists, for reference by tracks and searching. + 3: + id: albums + doc: | + Holds rows naming albums, for reference by tracks and searching. + 4: + id: labels + doc: | + Holds rows naming music labels, for reference by tracks and searching. + 5: + id: keys + doc: | + Holds rows naming musical keys, for reference by tracks and searching. + 6: + id: colors + doc: | + Holds rows naming color labels, for reference by tracks and searching. 
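The mangled-length scheme is easier to follow in code. This Python sketch mirrors the `device_sql_string` switch as written in this spec, including the unverified long-ASCII layout flagged by the TODO. It assumes little-endian length fields (per `meta.endian`), and `decode_device_sql_string` is a hypothetical name.

```python
import struct
from typing import Optional


def decode_device_sql_string(data: bytes, pos: int) -> Optional[str]:
    """Decode the `device_sql_string` starting at `pos` (hypothetical helper).

    0x40 and 0x90 select the long forms; any other value is treated as a
    mangled short-ASCII length. Returns None for strings the spec skips.
    """
    kind = data[pos]
    if kind == 0x40:
        # Long ASCII: a u2 byte count follows (untested, per the spec's TODO).
        (length,) = struct.unpack_from("<H", data, pos + 1)
        return data[pos + 3 : pos + 3 + length].decode("ascii")
    if kind == 0x90:
        # Long UTF-16BE: the spec reads `length - 4` bytes of text after the u2 field.
        (length,) = struct.unpack_from("<H", data, pos + 1)
        return data[pos + 3 : pos + 3 + length - 4].decode("utf-16-be")
    # Short ASCII: mangled = (length + 1) * 2 + 1, so valid values are odd.
    if kind % 2 == 0:
        return None
    length = (kind - 1) // 2 - 1
    if length < 0:
        return None
    return data[pos + 1 : pos + 1 + length].decode("ascii")
```

For example, a three-character name is stored with the kind byte `(3 + 1) * 2 + 1 = 9`.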
+ 7: + id: playlist_tree + doc: | + Holds rows that describe the hierarchical tree structure of + available playlists and folders grouping them. + 8: + id: playlist_entries + doc: | + Holds rows that enumerate the tracks found in playlists and + the playlists they belong to. + 9: + id: unknown_9 + 10: + id: unknown_10 + 11: + id: unknown_11 + doc: | + The rows all seem to have history file names in them, such as "HISTORY 001". + 12: + id: unknown_12 + 13: + id: artwork + doc: | + Holds rows pointing to album artwork images. + 14: + id: unknown_14 + 15: + id: unknown_15 + 16: + id: columns + doc: | + TODO figure out and explain + 17: + id: unknown_17 + 18: + id: unknown_18 + 19: + id: history + doc: | + Holds rows listing tracks played in performance sessions. diff --git a/media/anlz.ksy b/media/anlz.ksy new file mode 100644 index 000000000..fbe99dc31 --- /dev/null +++ b/media/anlz.ksy @@ -0,0 +1,302 @@ +meta: + id: anlz_file + title: rekordbox track analysis file + application: rekordbox + file-extension: + - dat + - ext + license: EPL-1.0 + endian: be + +doc: | + These files are created by rekordbox when analyzing audio tracks + to facilitate DJ performance. They include waveforms, beat grids + (information about the precise time at which each beat occurs), + time indices to allow efficient seeking to specific positions + inside variable bit-rate audio streams, and lists of memory cues + and loop points. They are used by Pioneer professional DJ + equipment. + + The format has been reverse-engineered to facilitate sophisticated + integrations with light and laser shows, videos, and other musical + instruments, by supporting deep knowledge of what is playing and + what is coming next through monitoring the network communications + of the players. 
+ +doc-ref: https://reverseengineering.stackexchange.com/questions/4311/help-reversing-a-edb-database-file-for-pioneers-rekordbox-software + +seq: + - contents: "PMAI" + - id: len_header + type: u4 + doc: | + The number of bytes of this header section. + - id: len_file + type: u4 + doc: | + The number of bytes in the entire file. + - size: len_header - _io.pos + - id: sections + type: tagged_section + repeat: eos + doc: | + The remainder of the file is a sequence of type-tagged sections, + identified by a four-byte magic sequence. + +types: + tagged_section: + doc: | + A type-tagged file section, identified by a four-byte magic + sequence, with a header specifying its length, and whose payload + is determined by the type tag. + seq: + - id: fourcc + type: u4 + enum: section_tags + doc: | + A tag value indicating what kind of section this is. + - id: len_header + type: u4 + doc: | + The size, in bytes, of the header portion of the tag. + - id: len_tag + type: u4 + doc: | + The size, in bytes, of this entire tag, counting the header. + - id: body + size: len_tag - 12 + type: + switch-on: fourcc + cases: + 'section_tags::cues': cue_tag + 'section_tags::path': path_tag + 'section_tags::beat_grid': beat_grid_tag + 'section_tags::vbr': vbr_tag + 'section_tags::wave_preview': wave_preview_tag + 'section_tags::wave_tiny': wave_preview_tag + 'section_tags::wave_scroll': wave_scroll_tag + 'section_tags::wave_color_preview': wave_color_preview_tag + 'section_tags::wave_color_scroll': wave_color_scroll_tag + -webide-representation: '{fourcc}' + + + beat_grid_tag: + doc: | + Holds a list of all the beats found within the track, recording + their bar position, the time at which they occur, and the tempo + at that point. + seq: + - type: u4 + - type: u4 # @flesniak says this is always 0x80000 + - id: len_beats + type: u4 + doc: | + The number of beat entries which follow. 
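Walking the section list of an analysis file is a simple loop over `tagged_section` records. The Python sketch below assumes the whole file is in memory and keeps the four-character tag as text rather than mapping it to the `section_tags` enum. `iter_anlz_sections` is a hypothetical name; the byte layout follows the `seq` blocks above (big-endian, with `len_tag` counting the 12-byte fourcc/length preamble).

```python
import struct
from typing import Iterator, Tuple


def iter_anlz_sections(data: bytes) -> Iterator[Tuple[str, int, bytes]]:
    """Yield (fourcc, len_header, body) for each tagged section (hypothetical helper)."""
    if data[:4] != b"PMAI":
        raise ValueError("missing PMAI magic")
    # The file header records its own length, so skip straight past it.
    (len_header,) = struct.unpack_from(">I", data, 4)
    pos = len_header
    while pos < len(data):
        fourcc, sec_len_header, len_tag = struct.unpack_from(">4sII", data, pos)
        if len_tag < 12:
            raise ValueError(f"corrupt section at offset {pos}")
        # The body is everything after the 12-byte preamble, as in `size: len_tag - 12`.
        yield fourcc.decode("ascii"), sec_len_header, data[pos + 12 : pos + len_tag]
        pos += len_tag
```

Rejecting a `len_tag` below 12 keeps a corrupt file from looping forever on the same offset.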
+ - id: beats + type: beat_grid_beat + repeat: expr + repeat-expr: len_beats + doc: The entries of the beat grid. + + beat_grid_beat: + doc: | + Describes an individual beat in a beat grid. + seq: + - id: beat_number + type: u2 + doc: | + The position of the beat within its musical bar, where beat 1 + is the down beat. + - id: tempo + type: u2 + doc: | + The tempo at the time of this beat, in beats per minute, + multiplied by 100. + - id: time + type: u4 + doc: | + The time, in milliseconds, at which this beat occurs when + the track is played at normal (100%) pitch. + + cue_tag: + doc: | + Stores either a list of ordinary memory cues and loop points, or + a list of hot cues and loop points. + seq: + - id: type + type: u4 + enum: cue_list_type + doc: | + Identifies whether this tag stores ordinary or hot cues. + - id: len_cues + type: u4 + doc: | + The number of entries in the cue list. + - id: memory_count + type: u4 + doc: | + Unsure what this means. + - id: cues + type: cue_entry + repeat: expr + repeat-expr: len_cues + + cue_entry: + doc: | + A cue list entry. Can either represent a memory cue or a loop. + seq: + - contents: "PCPT" + - id: len_header + type: u4 + - id: len_entry + type: u4 + - id: hot_cue + type: u4 + doc: | + If zero, this is an ordinary memory cue, otherwise this is a + hot cue with the specified number. + - id: status + type: u4 + enum: cue_entry_status + doc: | + If zero, this entry should be ignored. + - type: u4 # Seems to always be 0x10000 + - id: order_first + type: u2 + doc: | + @flesniak says: "0xffff for first cue, 0,1,3 for next" + - id: order_last + type: u2 + doc: | + @flesniak says: "1,2,3 for first, second, third cue, 0xffff for last" + - id: type + type: u1 + enum: cue_entry_type + doc: | + Indicates whether this is a memory cue or a loop. + - size: 3 # seems to always be 1000 + - id: time + type: u4 + doc: | + The position, in milliseconds, at which the cue point lies + in the track.
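Decoding a beat grid follows directly from `beat_grid_tag` and `beat_grid_beat`. Here is a hedged Python sketch that assumes `body` is the section payload with the 12-byte fourcc/length preamble already removed. `parse_beat_grid` and `Beat` are hypothetical names, and the two leading u4 values are skipped because their meaning is unknown.

```python
import struct
from typing import List, NamedTuple


class Beat(NamedTuple):
    beat_number: int  # position within the bar; 1 is the down beat
    bpm: float        # the stored `tempo` divided by 100
    time_ms: int      # milliseconds into the track at normal (100%) pitch


def parse_beat_grid(body: bytes) -> List[Beat]:
    """Decode a `beat_grid_tag` body into Beat records (big-endian)."""
    _unknown1, _unknown2, len_beats = struct.unpack_from(">III", body, 0)
    beats = []
    for i in range(len_beats):
        # Each beat is 8 bytes: u2 beat_number, u2 tempo, u4 time.
        number, tempo, time_ms = struct.unpack_from(">HHI", body, 12 + 8 * i)
        beats.append(Beat(number, tempo / 100, time_ms))
    return beats
```

Dividing `tempo` by 100 recovers fractional tempos such as 127.5 BPM, which is why the value is stored scaled.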
+ - id: loop_time + type: u4 + doc: | + The position, in milliseconds, at which the player loops + back to the cue time if this is a loop. + - size: 16 + + path_tag: + doc: | + Stores the file path of the audio file to which this analysis + applies. + seq: + - id: len_path + type: u4 + - id: path + type: str + size: len_path - 2 + encoding: utf-16be + if: len_path > 1 + + vbr_tag: + doc: | + Stores an index allowing rapid seeking to particular times + within a variable-bitrate audio file. + seq: + - type: u4 + - id: index + type: u4 + repeat: expr + repeat-expr: 400 + + wave_preview_tag: + doc: | + Stores a waveform preview image suitable for display above + the touch strip for jumping to a track position. + seq: + - id: len_preview + type: u4 + doc: | + The length, in bytes, of the preview data itself. This is + slightly redundant because it can be computed from the + length of the tag. + - type: u4 # This seems to always have the value 0x10000 + - id: data + size: len_preview + doc: | + The actual bytes of the waveform preview. + + wave_scroll_tag: + doc: | + A larger waveform image suitable for scrolling along as a track + plays. + seq: + - type: u4 # Always 1? + - id: len_entries + type: u4 + doc: | + The number of waveform data points, each of which takes one + byte. + - type: u4 # Always 0x960000? + - id: entries + size: len_entries + + wave_color_preview_tag: + doc: | + A larger, colorful waveform preview image suitable for display + above the touch strip for jumping to a track position on newer + high-resolution players. + seq: + - type: u4 + - id: len_entries + type: u4 + doc: | + The number of waveform data points, each of which takes one + byte for each of six channels of information. + - type: u4 + - id: entries + size: len_entries * 6 + + wave_color_scroll_tag: + doc: | + A larger, colorful waveform image suitable for scrolling along + as a track plays on newer high-resolution hardware. Also + contains a higher-resolution blue/white waveform. 
+ seq: + - type: u4 # I have seen the value 2? + - id: len_entries + type: u4 + doc: | + The number of columns of waveform data (this matches the + non-color waveform length), but we do not yet know how to + translate the payload into color columns. + - type: u4 + - id: entries + size-eos: true + +enums: + section_tags: + 0x50434f42: cues # PCOB + 0x50434f32: cues_2 # PCO2 (seen in .EXT) + 0x50505448: path # PPTH + 0x50564252: vbr # PVBR + 0x5051545a: beat_grid # PQTZ + 0x50574156: wave_preview # PWAV + 0x50575632: wave_tiny # PWV2 + 0x50575633: wave_scroll # PWV3 (seen in .EXT) + 0x50575634: wave_color_preview # PWV4 (seen in .EXT) + 0x50575635: wave_color_scroll # PWV5 (seen in .EXT) + + cue_list_type: + 0: memory_cues + 1: hot_cues + + cue_entry_type: + 1: memory_cue + 2: loop + + cue_entry_status: + 0: disabled + 1: enabled From 108c283e5400afd019e894a2b26bad9dfdcf2606 Mon Sep 17 00:00:00 2001 From: James Elliott <> Date: Tue, 20 Nov 2018 21:11:37 -0600 Subject: [PATCH 02/14] Rename KSY structures, per @KOLANICH --- database/pdb.ksy | 2 +- media/anlz.ksy | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/database/pdb.ksy b/database/pdb.ksy index 110567549..a1fef8371 100644 --- a/database/pdb.ksy +++ b/database/pdb.ksy @@ -1,5 +1,5 @@ meta: - id: pdb_file + id: rekordbox_pdb title: DeviceSQL database export (probably generated by rekordbox) application: rekordbox file-extension: diff --git a/media/anlz.ksy b/media/anlz.ksy index fbe99dc31..c4fd2a63b 100644 --- a/media/anlz.ksy +++ b/media/anlz.ksy @@ -1,5 +1,5 @@ meta: - id: anlz_file + id: rekordbox_anlz title: rekordbox track analysis file application: rekordbox file-extension: From 84a345527619557440e025a6de0aa7cb2c9f65d6 Mon Sep 17 00:00:00 2001 From: James Elliott <> Date: Tue, 20 Nov 2018 21:17:59 -0600 Subject: [PATCH 03/14] No longer fail if the first four bytes are not zero. 
Another good suggestion from @KOLANICH --- database/pdb.ksy | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/database/pdb.ksy b/database/pdb.ksy index a1fef8371..0c74d4800 100644 --- a/database/pdb.ksy +++ b/database/pdb.ksy @@ -49,7 +49,10 @@ doc: | doc-ref: https://github.com/Deep-Symmetry/crate-digger/blob/master/doc/Analysis.pdf seq: - - contents: [0, 0, 0, 0] + - type: u4 + doc: | + Unknown purpose, perhaps an unoriginal signature, seems to + always have the value 0. - id: len_page type: u4 doc: | From 70194177aed3bf37f1fef0bdb68432efeed3641b Mon Sep 17 00:00:00 2001 From: James Elliott <> Date: Wed, 21 Nov 2018 10:58:28 -0600 Subject: [PATCH 04/14] Rename files to match top-level ids --- database/{dbf.ksy => rekordbox_pdb.ksy} | 0 media/{anlz.ksy => rekordbox_anlz.ksy} | 0 2 files changed, 0 insertions(+), 0 deletions(-) rename database/{dbf.ksy => rekordbox_pdb.ksy} (100%) rename media/{anlz.ksy => rekordbox_anlz.ksy} (100%) diff --git a/database/dbf.ksy b/database/rekordbox_pdb.ksy similarity index 100% rename from database/dbf.ksy rename to database/rekordbox_pdb.ksy diff --git a/media/anlz.ksy b/media/rekordbox_anlz.ksy similarity index 100% rename from media/anlz.ksy rename to media/rekordbox_anlz.ksy From 3c7635dd8c9241b368527ecf582277a9b56bde3c Mon Sep 17 00:00:00 2001 From: James Elliott <> Date: Wed, 21 Nov 2018 15:34:32 -0600 Subject: [PATCH 05/14] Abandon use of enum for FourCC until issue #300 is resolved Trying to use an enum causes unavoidable parse errors in Java and Python when new/unknown FourCC values are encountered. See https://github.com/kaitai-io/kaitai_struct/issues/300 --- media/rekordbox_anlz.ksy | 27 +++++++++++++++------------ 1 file changed, 15 insertions(+), 12 deletions(-) diff --git a/media/rekordbox_anlz.ksy b/media/rekordbox_anlz.ksy index c4fd2a63b..977774373 100644 --- a/media/rekordbox_anlz.ksy +++ b/media/rekordbox_anlz.ksy @@ -51,8 +51,8 @@ types: is determined by the type tag. 
seq: - id: fourcc - type: u4 - enum: section_tags + type: s4 + # enum: section_tags Can't use this until enums support default/unmatched value doc: | A tag value indicating what kind of section this is. - id: len_header @@ -68,15 +68,16 @@ types: type: switch-on: fourcc cases: - 'section_tags::cues': cue_tag - 'section_tags::path': path_tag - 'section_tags::beat_grid': beat_grid_tag - 'section_tags::vbr': vbr_tag - 'section_tags::wave_preview': wave_preview_tag - 'section_tags::wave_tiny': wave_preview_tag - 'section_tags::wave_scroll': wave_scroll_tag - 'section_tags::wave_color_preview': wave_color_preview_tag - 'section_tags::wave_color_scroll': wave_color_scroll_tag + 0x50434f42: cue_tag #'section_tags::cues' + 0x50505448: path_tag #'section_tags::path' + 0x5051545a: beat_grid_tag #'section_tags::beat_grid' + 0x50564252: vbr_tag #'section_tags::vbr' + 0x50574156: wave_preview_tag #'section_tags::wave_preview' + 0x50575632: wave_preview_tag #'section_tags::wave_tiny' + 0x50575633: wave_scroll_tag #'section_tags::wave_scroll' + 0x50575634: wave_color_preview_tag #'section_tags::wave_color_preview' + 0x50575635: wave_color_scroll_tag #'section_tags::wave_color_scroll' + _: unknown_tag -webide-representation: '{fourcc}' @@ -276,8 +277,10 @@ types: - id: entries size-eos: true + unknown_tag: {} + enums: - section_tags: + section_tags: # We can't use this enum until KSC supports default/unmatched values 0x50434f42: cues # PCOB 0x50434f32: cues_2 # PCO2 (seen in .EXT) 0x50505448: path # PPTH From f4a44d408f0fb2402c188c09c4f1e2dc65b51f75 Mon Sep 17 00:00:00 2001 From: James Elliott <> Date: Sun, 25 Nov 2018 12:22:48 -0600 Subject: [PATCH 06/14] Attempt #2 at renaming file, weird! 
--- database/pdb.ksy | 964 --------------------------------- database/rekordbox_pdb.ksy | 1023 +++++++++++++++++++++++++++++++++--- 2 files changed, 943 insertions(+), 1044 deletions(-) delete mode 100644 database/pdb.ksy diff --git a/database/pdb.ksy b/database/pdb.ksy deleted file mode 100644 index 0c74d4800..000000000 --- a/database/pdb.ksy +++ /dev/null @@ -1,964 +0,0 @@ -meta: - id: rekordbox_pdb - title: DeviceSQL database export (probably generated by rekordbox) - application: rekordbox - file-extension: - - pdb - license: EPL-1.0 - endian: le - -doc: | - This is a relational database format designed to be efficiently used - by very low power devices (there were deployments on 16 bit devices - with 32K of RAM). Today you are most likely to encounter it within - the Pioneer Professional DJ ecosystem, because it is the format that - their rekordbox software uses to write USB and SD media which can be - mounted in DJ controllers and used to play and mix music. - - It has been reverse-engineered to facilitate sophisticated - integrations with light and laser shows, videos, and other musical - instruments, by supporting deep knowledge of what is playing and - what is coming next through monitoring the network communications of - the players. - - The file is divided into fixed-size blocks. The first block has a - header that establishes the block size, and lists the tables - available in the database, identifying their types and the index of - the first of the series of linked pages that make up that table. - - Each table is made up of a series of rows which may be spread across - any number of pages. The pages start with a header describing the - page and linking to the next page. The rest of the page is used as a - heap: rows are scattered around it, and located using an index - structure that builds backwards from the end of the page. 
Each row - of a given type has a fixed size structure which links to any - variable-sized strings by their offsets within the page. - - As changes are made to the table, some records may become unused, - and there may be gaps within the heap that are too small to be used - by other data. There is a bit map in the row index that identifies - which rows are actually present. Rows that are not present must be - ignored: they do not contain valid (or even necessarily well-formed) - data. - - The majority of the work in reverse-engineering this format was - performed by @henrybetts and @flesniak, for which I am hugely - grateful. @GreyCat helped me learn the intricacies (and best - practices) of Kaitai far faster than I would have managed on my own. - -doc-ref: https://github.com/Deep-Symmetry/crate-digger/blob/master/doc/Analysis.pdf - -seq: - - type: u4 - doc: | - Unknown purpose, perhaps an unoriginal signature, seems to - always have the value 0. - - id: len_page - type: u4 - doc: | - The database page size, in bytes. Pages are referred to by - index, so this size is needed to calculate their offset, and - table pages have a row index structure which is built from the - end of the page backwards, so finding that also requires this - value. - - id: num_tables - type: u4 - doc: | - Determines the number of table entries that are present. Each - table is a linked list of pages containing rows of a particular - type. - - id: next_unused_page - type: u4 - doc: | - @flesinak said: "Not used as any `empty_candidate`, points - past the end of the file." - - type: u4 - - id: sequence - type: u4 - doc: | - @flesniak said: "Always incremented by at least one, - sometimes by two or three." - - contents: [0, 0, 0, 0] - - id: tables - type: table - repeat: expr - repeat-expr: num_tables - doc: | - Describes and links to the tables present in the database. - -types: - table: - doc: | - Each table is a linked list of pages containing rows of a single - type. 
This header describes the nature of the table and links to - its pages by index. - seq: - - id: type - type: u4 - enum: page_type - doc: | - Identifies the kind of rows that are found in this table. - - id: empty_candidate - type: u4 - - id: first_page - type: page_ref - doc: | - Links to the chain of pages making up that table. The first - page seems to always contain similar garbage patterns and - zero rows, but the next page it links to contains the start - of the meaningful data rows. - - id: last_page - type: page_ref - doc: | - Holds the index of the last page that makes up this table. - When following the linked list of pages of the table, you - either need to stop when you reach this page, or when you - notice that the `next_page` link you followed took you to a - page of a different `type`. - -webide-representation: '{type}' - - page_ref: - doc: | - An index which points to a table page (its offset can be found - by multiplying the index by the `page_len` value in the file - header). This type allows the linked page to be lazy loaded. - seq: - - id: index - type: u4 - doc: | - Identifies the desired page number. - instances: - body: - doc: | - When referenced, loads the specified page and parses its - contents appropriately for the type of data it contains. - io: _root._io - pos: _root.len_page * index - size: _root.len_page - type: page - - page: - doc: | - A table page, consisting of a short header describing the - content of the page and linking to the next page, followed by a - heap in which row data is found. At the end of the page there is - an index which locates all rows present in the heap via their - offsets past the end of the page header. - seq: - - contents: [0, 0, 0, 0] - - id: page_index - doc: Matches the index we used to look up the page, sanity check? - type: u4 - - id: type - type: u4 - enum: page_type - doc: | - Identifies the type of information stored in the rows of this page. 
- - id: next_page - doc: | - Index of the next page containing this type of rows. Points past - the end of the file if there are no more. - type: page_ref - - type: u4 - doc: | - @flesniak said: "sequence number (0->1: 8->13, 1->2: 22, 2->3: 27)" - - size: 4 - - id: num_rows_small - type: u1 - doc: | - Holds the value used for `num_rows` (see below) unless - `num_rows_large` is larger (but not equal to `0x1fff`). This - seems like some strange mechanism to deal with the fact that - lots of tiny entries, such as are found in the - `playlist_entries` table, are too big to count with a single - byte. But why not just always use `num_rows_large`, then? - - type: u1 - doc: | - @flesniak said: "a bitmask (1st track: 32)" - - type: u1 - doc: | - @flesniak said: "often 0, sometimes larger, esp. for pages - with high real_entry_count (e.g. 12 for 101 entries)" - - id: page_flags - type: u1 - doc: | - @flesniak said: "strange pages: 0x44, 0x64; otherwise seen: 0x24, 0x34" - - id: free_size - type: u2 - doc: | - Unused space (in bytes) in the page heap, excluding the row - index at end of page. - - id: used_size - type: u2 - doc: | - The number of bytes that are in use in the page heap. - - type: u2 - doc: | - @flesniak said: "(0->1: 2)" - - id: num_rows_large - type: u2 - doc: | - Holds the value used for `num_rows` (see below) when that is - too large to fit into `num_rows_small`, and that situation - seems to be indicated when this value is larger than - `num_rows_small`, but not equal to `0x1fff`. This seems like - some strange mechanism to deal with the fact that lots of - tiny entries, such as are found in the `playlist_entries` - table, are too big to count with a single byte. But why not - just always use this value, then? - - type: u2 - doc: | - @flesniak said: "1004 for strange blocks, 0 otherwise" - - type: u2 - doc: | - @flesniak said: "always 0 except 1 for history pages, num - entries for strange pages?" 
- - id: heap - size-eos: true - if: heap_pos < 0 # never true, but stores pos - instances: - is_data_page: - value: page_flags & 0x40 == 0 - -webide-parse-mode: eager - heap_pos: - value: _io.pos - num_rows: - value: | - (num_rows_large > num_rows_small) and (num_rows_large != 0x1fff) ? num_rows_large : num_rows_small - doc: | - The number of rows on this page (controls the number of row - index entries there are, but some of those may not be marked - as present in the table due to deletion). - -webide-parse-mode: eager - num_groups: - value: '(num_rows - 1) / 16 + 1' - doc: | - The number of row groups that are present in the index. Each - group can hold up to sixteen rows. All but the final one - will hold sixteen rows. - row_groups: - type: 'row_group(_index)' - repeat: expr - repeat-expr: num_groups - doc: | - The actual row groups making up the row index. Each group - can hold up to sixteen rows. Non-data pages do not have - actual rows, and attempting to parse them can crash. - if: is_data_page - - row_group: - doc: | - A group of row indices, which are built backwards from the end - of the page. Holds up to sixteen row offsets, along with a bit - mask that indicates whether each row is actually present in the - table. - params: - - id: group_index - type: u2 - doc: | - Identifies which group is being generated. They build backwards - from the end of the page. - instances: - base: - value: '_root.len_page - (group_index * 0x24)' - doc: | - The starting point of this group of row indices. - row_present_flags: - pos: base - 4 - type: u2 - doc: | - Each bit specifies whether a particular row is present. The - low order bit corresponds to the first row in this index, - whose offset immediately precedes these flag bits. The - second bit corresponds to the row whose offset precedes - that, and so on. - -webide-parse-mode: eager - rows: - type: row_ref(_index) - repeat: expr - repeat-expr: '(group_index < (_parent.num_groups - 1)) ? 
16 : ((_parent.num_rows - 1) % 16 + 1)' - doc: | - The row offsets in this group. - - row_ref: - doc: | - An offset which points to a row in the table, whose actual - presence is controlled by one of the bits in - `row_present_flags`. This instance allows the row itself to be - lazily loaded, unless it is not present, in which case there is - no content to be loaded. - params: - - id: row_index - type: u2 - doc: | - Identifies which row within the row index this reference - came from, so the correct flag can be checked for the row - presence and the correct row offset can be found. - instances: - ofs_row: - pos: '_parent.base - (6 + (2 * row_index))' - type: u2 - doc: | - The offset of the start of the row (in bytes past the end of - the page header). - row_base: - value: ofs_row + _parent._parent.heap_pos - doc: | - The location of this row relative to the start of the page. - A variety of pointers (such as all device_sql_string values) - are calculated with respect to this position. - present: - value: '(((_parent.row_present_flags >> row_index) & 1) != 0 ? true : false)' - doc: | - Indicates whether the row index considers this row to be - present in the table. Will be `false` if the row has been - deleted. - -webide-parse-mode: eager - body: - pos: row_base - type: - switch-on: _parent._parent.type - cases: - 'page_type::albums': album_row - 'page_type::artists': artist_row - 'page_type::artwork': artwork_row - 'page_type::colors': color_row - 'page_type::genres': genre_row - 'page_type::keys': key_row - 'page_type::labels': label_row - 'page_type::playlist_tree': playlist_tree_row - 'page_type::playlist_entries': playlist_entry_row - 'page_type::tracks': track_row - if: present - doc: | - The actual content of the row, as long as it is present. - -webide-parse-mode: eager - -webide-representation: '{body.name.body.text}{body.title.body.text} ({body.id})' - - album_row: - doc: | - A row that holds an album name and ID. 
- seq: - - id: magic - contents: [0x80, 0x00] - - id: index_shift - type: u2 - doc: TODO name from @flesniak, but what does it mean? - - type: u4 - - id: artist_id - type: u4 - doc: | - Identifies the artist associated with the album. - - id: id - type: u4 - doc: | - The unique identifier by which this album can be requested - and linked from other rows (such as tracks). - - type: u4 - - type: u1 - doc: | - @flesniak says: "alwayx 0x03, maybe an unindexed empty string" - - id: ofs_name - type: u1 - doc: | - The location of the variable-length name string, relative to - the start of this row. - instances: - name: - type: device_sql_string - pos: _parent.row_base + ofs_name - doc: | - The name of this album. - -webide-parse-mode: eager - - artist_row: - doc: | - A row that holds an artist name and ID. - seq: - - id: magic - contents: [0x60, 0x00] - - id: index_shift - type: u2 - doc: TODO name from @flesniak, but what does it mean? - - id: id - type: u4 - doc: | - The unique identifier by which this artist can be requested - and linked from other rows (such as tracks). - - type: u1 - doc: | - @flesniak says: "alwayx 0x03, maybe an unindexed empty string" - - id: ofs_name - type: u1 - doc: | - The location of the variable-length name string, relative to - the start of this row. - instances: - name: - type: device_sql_string - pos: _parent.row_base + ofs_name - doc: | - The name of this artist. - -webide-parse-mode: eager - - artwork_row: - doc: | - A row that holds the path to an album art image file and the - associated artwork ID. - seq: - - id: id - type: u4 - doc: | - The unique identifier by which this art can be requested - and linked from other rows (such as tracks). - - id: path - type: device_sql_string - doc: | - The variable-length file path string at which the art file - can be found. - -webide-representation: '{path.body.text}' - - color_row: - doc: | - A row that holds a color name and the associated ID. 
- seq: - - size: 5 - - id: id - type: u2 - doc: | - The unique identifier by which this color can be requested - and linked from other rows (such as tracks). - - type: u1 - - id: name - type: device_sql_string - doc: | - The variable-length string naming the color. - - genre_row: - doc: | - A row that holds a genre name and the associated ID. - seq: - - id: id - type: u4 - doc: | - The unique identifier by which this genre can be requested - and linked from other rows (such as tracks). - - id: name - type: device_sql_string - doc: | - The variable-length string naming the genre. - - key_row: - doc: | - A row that holds a musical key and the associated ID. - seq: - - id: id - type: u4 - doc: | - The unique identifier by which this key can be requested - and linked from other rows (such as tracks). - - id: id2 - type: u4 - doc: | - Seems to be a second copy of the ID? - - id: name - type: device_sql_string - doc: | - The variable-length string naming the key. - - label_row: - doc: | - A row that holds a label name and the associated ID. - seq: - - id: id - type: u4 - doc: | - The unique identifier by which this label can be requested - and linked from other rows (such as tracks). - - id: name - type: device_sql_string - doc: | - The variable-length string naming the label. - - playlist_tree_row: - doc: | - A row that holds a playlist name, ID, indication of whether it - is an ordinary playlist or a folder of other playlists, a link - to its parent folder, and its sort order. - seq: - - id: parent_id - type: u4 - doc: | - The ID of the `playlist_tree_row` in which this one can be - found, or `0` if this playlist exists at the root level. - - size: 4 - - id: sort_order - type: u4 - doc: | - The order in which the entries of this playlist are sorted. - - id: id - type: u4 - doc: | - The unique identifier by which this playlist can be requested - and linked from other rows (such as tracks). 
- - id: raw_is_folder - type: u4 - doc: | - Has a non-zero value if this is actually a folder rather - than a playlist. - - id: name - type: device_sql_string - doc: | - The variable-length string naming the playlist. - instances: - is_folder: - value: raw_is_folder != 0 - -webide-parse-mode: eager - - playlist_entry_row: - doc: | - A row that associates a track with a position in a playlist. - seq: - - id: entry_index - type: u4 - doc: | - The position within the playlist represented by this entry. - - id: track_id - type: u4 - doc: | - The track found at this position in the playlist. - - id: playlist_id - type: u4 - doc: | - The playlist to which this entry belongs. - - track_row: - doc: | - A row that describes a track that can be played, with many - details about the music, and links to other tables like artists, - albums, keys, etc. - seq: - - id: magic - contents: [0x24, 0x00] - - id: index_shift - type: u2 - doc: TODO name from @flesniak, but what does it mean? - - id: bitmask - type: u4 - doc: TODO what do the bits mean? - - id: sample_rate - type: u4 - doc: | - Playback sample rate of the audio file. - - id: composer_id - type: u4 - doc: | - References a row in the artist table if the composer is - known. - - id: file_size - type: u4 - doc: | - The length of the audio file, in bytes. - - type: u4 - doc: | - Some ID? Purpose as yet unknown. - - type: u2 - doc: | - From @flesniak: "always 19048?" - - type: u2 - doc: | - From @flesniak: "always 30967?" - - id: artwork_id - type: u4 - doc: | - References a row in the artwork table if there is album art. - - id: key_id - type: u4 - doc: | - References a row in the keys table if the track has a known - main musical key. - - id: original_artist_id - type: u4 - doc: | - References a row in the artwork table if this is a cover - performance and the original artist is known. - - id: label_id - type: u4 - doc: | - References a row in the labels table if the track has a - known record label. 
- - id: remixer_id - type: u4 - doc: | - References a row in the artists table if the track has a - known remixer. - - id: bitrate - type: u4 - doc: | - Playback bit rate of the audio file. - - id: track_number - type: u4 - doc: | - The position of the track within an album. - - id: tempo - type: u4 - doc: | - The tempo at the start of the track in beats per minute, - multiplied by 100. - - id: genre_id - type: u4 - doc: | - References a row in the genres table if the track has a - known musical genre. - - id: album_id - type: u4 - doc: | - References a row in the albums table if the track has a - known album. - - id: artist_id - type: u4 - doc: | - References a row in the artists table if the track has a - known performer. - - id: id - type: u4 - doc: | - The id by which this track can be looked up; players will - report this value in their status packets when they are - playing the track. - - id: disc_number - type: u2 - doc: | - The number of the disc on which this track is found, if it - is known to be part of a multi-disc album. - - id: play_count - type: u2 - doc: | - The number of times this track has been played. - - id: year - type: u2 - doc: | - The year in which this track was released. - - id: sample_depth - type: u2 - doc: | - The number of bits per sample of the audio file. - - id: duration - type: u2 - doc: | - The length, in seconds, of the track when played at normal - speed. - - type: u2 - doc: | - From @flesniak: "always 41?" - - id: color_id - type: u1 - doc: | - References a row in the colors table if the track has been - assigned a color. - - id: rating - type: u1 - doc: | - The number of stars to display for the track, 0 to 5. - - type: u2 - doc: | - From @flesniak: "always 1?" - - type: u2 - doc: | - From @flesniak: "alternating 2 or 3" - - id: ofs_strings - type: u2 - repeat: expr - repeat-expr: 21 - doc: | - The location, relative to the start of this row, of a - variety of variable-length strings. 
- instances: - unknown_string_1: - type: device_sql_string - pos: _parent.row_base + ofs_strings[0] - doc: | - A string of unknown purpose, which has so far only been - empty. - -webide-parse-mode: eager - texter: - type: device_sql_string - pos: _parent.row_base + ofs_strings[1] - doc: | - A string of unknown purpose, which @flesnik named. - -webide-parse-mode: eager - unknown_string_2: - type: device_sql_string - pos: _parent.row_base + ofs_strings[2] - doc: | - A string of unknown purpose; @flesniak said "thought - tracknumber -> wrong!" - unknown_string_3: - type: device_sql_string - pos: _parent.row_base + ofs_strings[3] - doc: | - A string of unknown purpose; @flesniak said "strange - strings, often zero length, sometimes low binary values - 0x01/0x02 as content" - unknown_string_4: - type: device_sql_string - pos: _parent.row_base + ofs_strings[4] - doc: | - A string of unknown purpose; @flesniak said "strange - strings, often zero length, sometimes low binary values - 0x01/0x02 as content" - -webide-parse-mode: eager - message: - type: device_sql_string - pos: _parent.row_base + ofs_strings[5] - doc: | - A string of unknown purpose, which @flesnik named. - -webide-parse-mode: eager - kuvo_public: - type: device_sql_string - pos: _parent.row_base + ofs_strings[6] - doc: | - A string whose value is always either empty or "ON", and - which apparently for some insane reason is used, rather than - a single bit somewhere, to control whether the track - information is visible on Kuvo. - -webide-parse-mode: eager - autoload_hotcues: - type: device_sql_string - pos: _parent.row_base + ofs_strings[7] - doc: | - A string whose value is always either empty or "ON", and - which apparently for some insane reason is used, rather than - a single bit somewhere, to control whether hot-cues are - auto-loaded for the track. - -webide-parse-mode: eager - unknown_string_5: - type: device_sql_string - pos: _parent.row_base + ofs_strings[8] - doc: | - A string of unknown purpose. 
- -webide-parse-mode: eager - unknown_string_6: - type: device_sql_string - pos: _parent.row_base + ofs_strings[9] - doc: | - A string of unknown purpose, usually empty. - -webide-parse-mode: eager - date_added: - type: device_sql_string - pos: _parent.row_base + ofs_strings[10] - doc: | - A string containing the date this track was added to the collection. - -webide-parse-mode: eager - release_date: - type: device_sql_string - pos: _parent.row_base + ofs_strings[11] - doc: | - A string containing the date this track was released, if known. - -webide-parse-mode: eager - mix_name: - type: device_sql_string - pos: _parent.row_base + ofs_strings[12] - doc: | - A string naming the remix of the track, if known. - -webide-parse-mode: eager - unknown_string_7: - type: device_sql_string - pos: _parent.row_base + ofs_strings[13] - doc: | - A string of unknown purpose, usually empty. - -webide-parse-mode: eager - analyze_path: - type: device_sql_string - pos: _parent.row_base + ofs_strings[14] - doc: | - The file path of the track analysis, which allows rapid - seeking to particular times in variable bit-rate files, - jumping to particular beats, visual waveform previews, and - stores cue points and loops. - -webide-parse-mode: eager - analyze_date: - type: device_sql_string - pos: _parent.row_base + ofs_strings[15] - doc: | - A string containing the date this track was analyzed by rekordbox. - -webide-parse-mode: eager - comment: - type: device_sql_string - pos: _parent.row_base + ofs_strings[16] - doc: | - The comment assigned to the track by the DJ, if any. - -webide-parse-mode: eager - title: - type: device_sql_string - pos: _parent.row_base + ofs_strings[17] - doc: | - The title of the track. - -webide-parse-mode: eager - unknown_string_8: - type: device_sql_string - pos: _parent.row_base + ofs_strings[18] - doc: | - A string of unknown purpose, usually empty. 
- -webide-parse-mode: eager - filename: - type: device_sql_string - pos: _parent.row_base + ofs_strings[19] - doc: | - The file name of the track audio file. - -webide-parse-mode: eager - file_path: - type: device_sql_string - pos: _parent.row_base + ofs_strings[20] - doc: | - The file path of the track audio file. - -webide-parse-mode: eager - - device_sql_string: - doc: | - A variable length string which can be stored in a variety of - different encodings. - seq: - - id: length_and_kind - type: u1 - doc: | - Mangled length of an ordinary ASCII string if odd, or a flag - indicating another encoding with a longer length value to - follow. - - id: body - type: - switch-on: length_and_kind - cases: - 0x40: device_sql_long_ascii - 0x90: device_sql_long_utf16be - _: device_sql_short_ascii(length_and_kind) - -webide-parse-mode: eager - -webide-representation: '{body.text}' - - device_sql_short_ascii: - doc: | - An ASCII-encoded string up to 127 bytes long. - params: - - id: mangled_length - type: u1 - doc: | - Contains the actual length, incremented, doubled, and - incremented again. Go figure. - seq: - - id: text - type: str - size: length - encoding: ascii - if: '(mangled_length % 2 > 0) and (length >= 0)' # Skip invalid strings - doc: | - The content of the string. - instances: - length: - value: '((mangled_length - 1) / 2) - 1' - doc: | - The un-mangled length of the string, in bytes. - -webide-parse-mode: eager - - device_sql_long_ascii: - doc: | - An ASCII-encoded string preceded by a two-byte length field. - TODO May need to skip a byte after the length! - Have not found any test data. - seq: - - id: length - type: u2 - doc: | - Contains the length of the string in bytes. - - id: text - type: str - size: length - encoding: ascii - doc: | - The content of the string. - - device_sql_long_utf16be: - doc: | - A UTF-16BE-encoded string preceded by a two-byte length field. 
- seq: - - id: length - type: u2 - doc: | - Contains the length of the string in bytes, including two trailing nulls. - - id: text - type: str - size: length - 4 - encoding: utf-16be - doc: | - The content of the string. - -enums: - page_type: - 0: - id: tracks - doc: | - Holds rows describing tracks, such as their title, artist, - genre, artwork ID, playing time, etc. - 1: - id: genres - doc: | - Holds rows naming musical genres, for reference by tracks and searching. - 2: - id: artists - doc: | - Holds rows naming artists, for reference by tracks and searching. - 3: - id: albums - doc: | - Holds rows naming albums, for reference by tracks and searching. - 4: - id: labels - doc: | - Holds rows naming music labels, for reference by tracks and searching. - 5: - id: keys - doc: | - Holds rows naming musical keys, for reference by tracks and searching. - 6: - id: colors - doc: | - Holds rows naming color labels, for reference by tracks and searching. - 7: - id: playlist_tree - doc: | - Holds rows that describe the hierarchical tree structure of - available playlists and folders grouping them. - 8: - id: playlist_entries - doc: | - Holds rows that enumerate the tracks found in playlists and - the playlists they belong to. - 9: - id: unknown_9 - 10: - id: unknown_10 - 11: - id: unknown_11 - doc: | - The rows all seem to have history file names in them, such as "HISTORY 001". - 12: - id: unknown_12 - 13: - id: artwork - doc: | - Holds rows pointing to album artwork images. - 14: - id: unknown_14 - 15: - id: unknown_15 - 16: - id: columns - doc: | - TODO figure out and explain - 17: - id: unknown_17 - 18: - id: unknown_18 - 19: - id: history - doc: | - Holds rows listing tracks played in performance sessions. 
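The three `device_sql_string` layouts above (short ASCII with a mangled length byte, long ASCII flagged by `0x40`, and long UTF-16BE flagged by `0x90`) may be easier to follow as procedural code. The Python sketch below decodes one string from a byte buffer. It is a hand transcription of the spec as written, so it inherits the spec's open questions: it does not handle the possible pad byte mentioned in the long-ASCII TODO, and it sizes UTF-16BE text as `length - 4`. The function name `decode_device_sql_string` is hypothetical and is not part of any Kaitai-generated API.

```python
import struct


def decode_device_sql_string(buf: bytes, pos: int = 0) -> str:
    """Decode one DeviceSQL string starting at buf[pos].

    Hypothetical helper: a hand transcription of the device_sql_string
    type in the spec above, not code generated by Kaitai Struct.
    """
    kind = buf[pos]
    if kind == 0x40:
        # Long ASCII: a little-endian u2 length, then that many bytes of text.
        # Untested assumption: the spec's TODO says a pad byte may follow the length.
        (length,) = struct.unpack_from("<H", buf, pos + 1)
        start = pos + 3
        return buf[start:start + length].decode("ascii")
    if kind == 0x90:
        # Long UTF-16BE: a little-endian u2 length, then length - 4 bytes of text.
        (length,) = struct.unpack_from("<H", buf, pos + 1)
        start = pos + 3
        return buf[start:start + length - 4].decode("utf-16-be")
    # Short ASCII: the kind byte itself is the mangled length, 2 * len + 3.
    length = (kind - 1) // 2 - 1
    if kind % 2 == 0 or length < 0:
        # Mirrors the spec's "Skip invalid strings" guard, but raises instead.
        raise ValueError(f"invalid short string header 0x{kind:02x}")
    return buf[pos + 1:pos + 1 + length].decode("ascii")


# Usage: "Tiger" has length 5, so its mangled header byte is 2 * 5 + 3 = 13.
print(decode_device_sql_string(bytes([13]) + b"Tiger"))  # prints Tiger
```

Raising on a malformed short-string header, where the spec simply leaves `text` absent, is a deliberate choice in this sketch so that bad row offsets surface quickly while experimenting. In practice `pos` would be a row's `row_base` plus one of its string offsets, such as `ofs_name` or an entry of `ofs_strings`.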
diff --git a/database/rekordbox_pdb.ksy b/database/rekordbox_pdb.ksy index 9f4543212..0c74d4800 100644 --- a/database/rekordbox_pdb.ksy +++ b/database/rekordbox_pdb.ksy @@ -1,101 +1,964 @@ meta: - id: dbf - file-extension: dbf - application: dBASE - license: CC0-1.0 + id: rekordbox_pdb + title: DeviceSQL database export (probably generated by rekordbox) + application: rekordbox + file-extension: + - pdb + license: EPL-1.0 endian: le + +doc: | + This is a relational database format designed to be efficiently used + by very low power devices (there were deployments on 16 bit devices + with 32K of RAM). Today you are most likely to encounter it within + the Pioneer Professional DJ ecosystem, because it is the format that + their rekordbox software uses to write USB and SD media which can be + mounted in DJ controllers and used to play and mix music. + + It has been reverse-engineered to facilitate sophisticated + integrations with light and laser shows, videos, and other musical + instruments, by supporting deep knowledge of what is playing and + what is coming next through monitoring the network communications of + the players. + + The file is divided into fixed-size blocks. The first block has a + header that establishes the block size, and lists the tables + available in the database, identifying their types and the index of + the first of the series of linked pages that make up that table. + + Each table is made up of a series of rows which may be spread across + any number of pages. The pages start with a header describing the + page and linking to the next page. The rest of the page is used as a + heap: rows are scattered around it, and located using an index + structure that builds backwards from the end of the page. Each row + of a given type has a fixed size structure which links to any + variable-sized strings by their offsets within the page. 
+ + As changes are made to the table, some records may become unused, + and there may be gaps within the heap that are too small to be used + by other data. There is a bit map in the row index that identifies + which rows are actually present. Rows that are not present must be + ignored: they do not contain valid (or even necessarily well-formed) + data. + + The majority of the work in reverse-engineering this format was + performed by @henrybetts and @flesniak, for which I am hugely + grateful. @GreyCat helped me learn the intricacies (and best + practices) of Kaitai far faster than I would have managed on my own. + +doc-ref: https://github.com/Deep-Symmetry/crate-digger/blob/master/doc/Analysis.pdf + seq: - - id: header1 - type: header1 - - id: header2 - size: header1.len_header - 12 - type: header2 - - id: records - size: header1.len_record + - type: u4 + doc: | + Unknown purpose (perhaps an unoriginal signature); seems to + always have the value 0. + - id: len_page + type: u4 + doc: | + The database page size, in bytes. Pages are referred to by + index, so this size is needed to calculate their offset, and + table pages have a row index structure which is built from the + end of the page backwards, so finding that also requires this + value. + - id: num_tables + type: u4 + doc: | + Determines the number of table entries that are present. Each + table is a linked list of pages containing rows of a particular + type. + - id: next_unused_page + type: u4 + doc: | + @flesniak said: "Not used as any `empty_candidate`, points + past the end of the file." + - type: u4 + - id: sequence + type: u4 + doc: | + @flesniak said: "Always incremented by at least one, + sometimes by two or three." + - contents: [0, 0, 0, 0] + - id: tables + type: table repeat: expr - repeat-expr: header1.num_records + repeat-expr: num_tables + doc: | + Describes and links to the tables present in the database.
+ types: - header1: - doc-ref: http://www.dbase.com/Knowledgebase/INT/db7_file_fmt.htm - section 1.1 + table: + doc: | + Each table is a linked list of pages containing rows of a single + type. This header describes the nature of the table and links to + its pages by index. seq: - - id: version - type: u1 - - id: last_update_y - type: u1 - - id: last_update_m + - id: type + type: u4 + enum: page_type + doc: | + Identifies the kind of rows that are found in this table. + - id: empty_candidate + type: u4 + - id: first_page + type: page_ref + doc: | + Links to the chain of pages making up that table. The first + page seems to always contain similar garbage patterns and + zero rows, but the next page it links to contains the start + of the meaningful data rows. + - id: last_page + type: page_ref + doc: | + Holds the index of the last page that makes up this table. + When following the linked list of pages of the table, you + either need to stop when you reach this page, or when you + notice that the `next_page` link you followed took you to a + page of a different `type`. + -webide-representation: '{type}' + + page_ref: + doc: | + An index which points to a table page (its offset can be found + by multiplying the index by the `len_page` value in the file + header). This type allows the linked page to be lazily loaded. + seq: + - id: index + type: u4 + doc: | + Identifies the desired page number. + instances: + body: + doc: | + When referenced, loads the specified page and parses its + contents appropriately for the type of data it contains. + io: _root._io + pos: _root.len_page * index + size: _root.len_page + type: page + + page: + doc: | + A table page, consisting of a short header describing the + content of the page and linking to the next page, followed by a + heap in which row data is found. At the end of the page there is + an index which locates all rows present in the heap via their + offsets past the end of the page header.
+ seq: + - contents: [0, 0, 0, 0] + - id: page_index + doc: Matches the index we used to look up the page, sanity check? + type: u4 + - id: type + type: u4 + enum: page_type + doc: | + Identifies the type of information stored in the rows of this page. + - id: next_page + doc: | + Index of the next page containing this type of rows. Points past + the end of the file if there are no more. + type: page_ref + - type: u4 + doc: | + @flesniak said: "sequence number (0->1: 8->13, 1->2: 22, 2->3: 27)" + - size: 4 + - id: num_rows_small type: u1 - - id: last_update_d + doc: | + Holds the value used for `num_rows` (see below) unless + `num_rows_large` is larger (but not equal to `0x1fff`). This + seems like some strange mechanism to deal with the fact that + lots of tiny entries, such as are found in the + `playlist_entries` table, are too numerous to count with a single + byte. But why not just always use `num_rows_large`, then? + - type: u1 + doc: | + @flesniak said: "a bitmask (1st track: 32)" + - type: u1 + doc: | + @flesniak said: "often 0, sometimes larger, esp. for pages + with high real_entry_count (e.g. 12 for 101 entries)" + - id: page_flags type: u1 - - id: num_records - type: u4 - - id: len_header + doc: | + @flesniak said: "strange pages: 0x44, 0x64; otherwise seen: 0x24, 0x34" + - id: free_size + type: u2 + doc: | + Unused space (in bytes) in the page heap, excluding the row + index at end of page. + - id: used_size type: u2 - - id: len_record + doc: | + The number of bytes that are in use in the page heap. + - type: u2 + doc: | + @flesniak said: "(0->1: 2)" + - id: num_rows_large type: u2 + doc: | + Holds the value used for `num_rows` (see below) when that is + too large to fit into `num_rows_small`, and that situation + seems to be indicated when this value is larger than + `num_rows_small`, but not equal to `0x1fff`.
This seems like + some strange mechanism to deal with the fact that lots of + tiny entries, such as are found in the `playlist_entries` + table, are too numerous to count with a single byte. But why not + just always use this value, then? + - type: u2 + doc: | + @flesniak said: "1004 for strange blocks, 0 otherwise" + - type: u2 + doc: | + @flesniak said: "always 0 except 1 for history pages, num + entries for strange pages?" + - id: heap + size-eos: true + if: heap_pos < 0 # never true, but stores pos instances: - dbase_level: - value: 'version & 0b111' - header2: - seq: - - id: header_dbase_3 - if: _root.header1.dbase_level == 3 - type: header_dbase_3 - - id: header_dbase_7 - if: _root.header1.dbase_level == 7 - type: header_dbase_7 - - id: fields - type: field + is_data_page: + value: page_flags & 0x40 == 0 + -webide-parse-mode: eager + heap_pos: + value: _io.pos + num_rows: + value: | + (num_rows_large > num_rows_small) and (num_rows_large != 0x1fff) ? num_rows_large : num_rows_small + doc: | + The number of rows on this page (controls the number of row + index entries there are, but some of those may not be marked + as present in the table due to deletion). + -webide-parse-mode: eager + num_groups: + value: '(num_rows - 1) / 16 + 1' + doc: | + The number of row groups that are present in the index. Each + group can hold up to sixteen rows. All but the final one + will hold sixteen rows. + row_groups: + type: 'row_group(_index)' repeat: expr - repeat-expr: 11 - header_dbase_3: - seq: - - id: reserved1 - size: 3 - - id: reserved2 - size: 13 - - id: reserved3 - size: 4 - header_dbase_7: - seq: - - id: reserved1 - contents: [0, 0] - - id: has_incomplete_transaction - type: u1 - - id: dbase_iv_encryption - type: u1 - - id: reserved2 - size: 12 - - id: production_mdx + repeat-expr: num_groups + doc: | + The actual row groups making up the row index. Each group + can hold up to sixteen rows. Non-data pages do not have + actual rows, and attempting to parse them can crash.
+ if: is_data_page + + row_group: + doc: | + A group of row indices, which are built backwards from the end + of the page. Holds up to sixteen row offsets, along with a bit + mask that indicates whether each row is actually present in the + table. + params: + - id: group_index + type: u2 + doc: | + Identifies which group is being generated. They build backwards + from the end of the page. + instances: + base: + value: '_root.len_page - (group_index * 0x24)' + doc: | + The starting point of this group of row indices. + row_present_flags: + pos: base - 4 + type: u2 + doc: | + Each bit specifies whether a particular row is present. The + low order bit corresponds to the first row in this index, + whose offset immediately precedes these flag bits. The + second bit corresponds to the row whose offset precedes + that, and so on. + -webide-parse-mode: eager + rows: + type: row_ref(_index) + repeat: expr + repeat-expr: '(group_index < (_parent.num_groups - 1)) ? 16 : ((_parent.num_rows - 1) % 16 + 1)' + doc: | + The row offsets in this group. + + row_ref: + doc: | + An offset which points to a row in the table, whose actual + presence is controlled by one of the bits in + `row_present_flags`. This instance allows the row itself to be + lazily loaded, unless it is not present, in which case there is + no content to be loaded. + params: + - id: row_index + type: u2 + doc: | + Identifies which row within the row index this reference + came from, so the correct flag can be checked for the row + presence and the correct row offset can be found. + instances: + ofs_row: + pos: '_parent.base - (6 + (2 * row_index))' + type: u2 + doc: | + The offset of the start of the row (in bytes past the end of + the page header). + row_base: + value: ofs_row + _parent._parent.heap_pos + doc: | + The location of this row relative to the start of the page. + A variety of pointers (such as all device_sql_string values) + are calculated with respect to this position. 
+ present: + value: '(((_parent.row_present_flags >> row_index) & 1) != 0 ? true : false)' + doc: | + Indicates whether the row index considers this row to be + present in the table. Will be `false` if the row has been + deleted. + -webide-parse-mode: eager + body: + pos: row_base + type: + switch-on: _parent._parent.type + cases: + 'page_type::albums': album_row + 'page_type::artists': artist_row + 'page_type::artwork': artwork_row + 'page_type::colors': color_row + 'page_type::genres': genre_row + 'page_type::keys': key_row + 'page_type::labels': label_row + 'page_type::playlist_tree': playlist_tree_row + 'page_type::playlist_entries': playlist_entry_row + 'page_type::tracks': track_row + if: present + doc: | + The actual content of the row, as long as it is present. + -webide-parse-mode: eager + -webide-representation: '{body.name.body.text}{body.title.body.text} ({body.id})' + + album_row: + doc: | + A row that holds an album name and ID. + seq: + - id: magic + contents: [0x80, 0x00] + - id: index_shift + type: u2 + doc: TODO name from @flesniak, but what does it mean? + - type: u4 + - id: artist_id + type: u4 + doc: | + Identifies the artist associated with the album. + - id: id + type: u4 + doc: | + The unique identifier by which this album can be requested + and linked from other rows (such as tracks). + - type: u4 + - type: u1 + doc: | + @flesniak says: "always 0x03, maybe an unindexed empty string" + - id: ofs_name type: u1 - - id: language_driver_id + doc: | + The location of the variable-length name string, relative to + the start of this row. + instances: + name: + type: device_sql_string + pos: _parent.row_base + ofs_name + doc: | + The name of this album. + -webide-parse-mode: eager + + artist_row: + doc: | + A row that holds an artist name and ID. + seq: + - id: magic + contents: [0x60, 0x00] + - id: index_shift + type: u2 + doc: TODO name from @flesniak, but what does it mean?
+ - id: id + type: u4 + doc: | + The unique identifier by which this artist can be requested + and linked from other rows (such as tracks). + - type: u1 + doc: | + @flesniak says: "always 0x03, maybe an unindexed empty string" + - id: ofs_name type: u1 - - id: reserved3 - contents: [0, 0] - - id: language_driver_name - size: 32 - - id: reserved4 - size: 4 - field: + doc: | + The location of the variable-length name string, relative to + the start of this row. + instances: + name: + type: device_sql_string + pos: _parent.row_base + ofs_name + doc: | + The name of this artist. + -webide-parse-mode: eager + + artwork_row: + doc: | + A row that holds the path to an album art image file and the + associated artwork ID. seq: + - id: id + type: u4 + doc: | + The unique identifier by which this art can be requested + and linked from other rows (such as tracks). + - id: path + type: device_sql_string + doc: | + The variable-length file path string at which the art file + can be found. + -webide-representation: '{path.body.text}' + + color_row: + doc: | + A row that holds a color name and the associated ID. + seq: + - size: 5 + - id: id + type: u2 + doc: | + The unique identifier by which this color can be requested + and linked from other rows (such as tracks). + - type: u1 - id: name - type: str - encoding: ASCII - size: 11 - - id: datatype - type: u1 - - id: data_address + type: device_sql_string + doc: | + The variable-length string naming the color. + + genre_row: + doc: | + A row that holds a genre name and the associated ID. + seq: + - id: id type: u4 - - id: length + doc: | + The unique identifier by which this genre can be requested + and linked from other rows (such as tracks). + - id: name + type: device_sql_string + doc: | + The variable-length string naming the genre. + + key_row: + doc: | + A row that holds a musical key and the associated ID.
+ seq: + - id: id + type: u4 + doc: | + The unique identifier by which this key can be requested + and linked from other rows (such as tracks). + - id: id2 + type: u4 + doc: | + Seems to be a second copy of the ID? + - id: name + type: device_sql_string + doc: | + The variable-length string naming the key. + + label_row: + doc: | + A row that holds a label name and the associated ID. + seq: + - id: id + type: u4 + doc: | + The unique identifier by which this label can be requested + and linked from other rows (such as tracks). + - id: name + type: device_sql_string + doc: | + The variable-length string naming the label. + + playlist_tree_row: + doc: | + A row that holds a playlist name, ID, indication of whether it + is an ordinary playlist or a folder of other playlists, a link + to its parent folder, and its sort order. + seq: + - id: parent_id + type: u4 + doc: | + The ID of the `playlist_tree_row` in which this one can be + found, or `0` if this playlist exists at the root level. + - size: 4 + - id: sort_order + type: u4 + doc: | + The order in which the entries of this playlist are sorted. + - id: id + type: u4 + doc: | + The unique identifier by which this playlist can be requested + and linked from other rows (such as tracks). + - id: raw_is_folder + type: u4 + doc: | + Has a non-zero value if this is actually a folder rather + than a playlist. + - id: name + type: device_sql_string + doc: | + The variable-length string naming the playlist. + instances: + is_folder: + value: raw_is_folder != 0 + -webide-parse-mode: eager + + playlist_entry_row: + doc: | + A row that associates a track with a position in a playlist. + seq: + - id: entry_index + type: u4 + doc: | + The position within the playlist represented by this entry. + - id: track_id + type: u4 + doc: | + The track found at this position in the playlist. + - id: playlist_id + type: u4 + doc: | + The playlist to which this entry belongs. 
+ + track_row: + doc: | + A row that describes a track that can be played, with many + details about the music, and links to other tables like artists, + albums, keys, etc. + seq: + - id: magic + contents: [0x24, 0x00] + - id: index_shift + type: u2 + doc: TODO name from @flesniak, but what does it mean? + - id: bitmask + type: u4 + doc: TODO what do the bits mean? + - id: sample_rate + type: u4 + doc: | + Playback sample rate of the audio file. + - id: composer_id + type: u4 + doc: | + References a row in the artists table if the composer is + known. + - id: file_size + type: u4 + doc: | + The length of the audio file, in bytes. + - type: u4 + doc: | + Some ID? Purpose as yet unknown. + - type: u2 + doc: | + From @flesniak: "always 19048?" + - type: u2 + doc: | + From @flesniak: "always 30967?" + - id: artwork_id + type: u4 + doc: | + References a row in the artwork table if there is album art. + - id: key_id + type: u4 + doc: | + References a row in the keys table if the track has a known + main musical key. + - id: original_artist_id + type: u4 + doc: | + References a row in the artists table if this is a cover + performance and the original artist is known. + - id: label_id + type: u4 + doc: | + References a row in the labels table if the track has a + known record label. + - id: remixer_id + type: u4 + doc: | + References a row in the artists table if the track has a + known remixer. + - id: bitrate + type: u4 + doc: | + Playback bit rate of the audio file. + - id: track_number + type: u4 + doc: | + The position of the track within an album. + - id: tempo + type: u4 + doc: | + The tempo at the start of the track in beats per minute, + multiplied by 100. + - id: genre_id + type: u4 + doc: | + References a row in the genres table if the track has a + known musical genre. + - id: album_id + type: u4 + doc: | + References a row in the albums table if the track has a + known album.
+ - id: artist_id + type: u4 + doc: | + References a row in the artists table if the track has a + known performer. + - id: id + type: u4 + doc: | + The id by which this track can be looked up; players will + report this value in their status packets when they are + playing the track. + - id: disc_number + type: u2 + doc: | + The number of the disc on which this track is found, if it + is known to be part of a multi-disc album. + - id: play_count + type: u2 + doc: | + The number of times this track has been played. + - id: year + type: u2 + doc: | + The year in which this track was released. + - id: sample_depth + type: u2 + doc: | + The number of bits per sample of the audio file. + - id: duration + type: u2 + doc: | + The length, in seconds, of the track when played at normal + speed. + - type: u2 + doc: | + From @flesniak: "always 41?" + - id: color_id type: u1 - - id: decimal_count + doc: | + References a row in the colors table if the track has been + assigned a color. + - id: rating type: u1 - - id: reserved1 - size: 2 - - id: work_area_id + doc: | + The number of stars to display for the track, 0 to 5. + - type: u2 + doc: | + From @flesniak: "always 1?" + - type: u2 + doc: | + From @flesniak: "alternating 2 or 3" + - id: ofs_strings + type: u2 + repeat: expr + repeat-expr: 21 + doc: | + The location, relative to the start of this row, of a + variety of variable-length strings. + instances: + unknown_string_1: + type: device_sql_string + pos: _parent.row_base + ofs_strings[0] + doc: | + A string of unknown purpose, which has so far only been + empty. + -webide-parse-mode: eager + texter: + type: device_sql_string + pos: _parent.row_base + ofs_strings[1] + doc: | + A string of unknown purpose, which @flesniak named. + -webide-parse-mode: eager + unknown_string_2: + type: device_sql_string + pos: _parent.row_base + ofs_strings[2] + doc: | + A string of unknown purpose; @flesniak said "thought + tracknumber -> wrong!"
+ unknown_string_3: + type: device_sql_string + pos: _parent.row_base + ofs_strings[3] + doc: | + A string of unknown purpose; @flesniak said "strange + strings, often zero length, sometimes low binary values + 0x01/0x02 as content" + unknown_string_4: + type: device_sql_string + pos: _parent.row_base + ofs_strings[4] + doc: | + A string of unknown purpose; @flesniak said "strange + strings, often zero length, sometimes low binary values + 0x01/0x02 as content" + -webide-parse-mode: eager + message: + type: device_sql_string + pos: _parent.row_base + ofs_strings[5] + doc: | + A string of unknown purpose, which @flesniak named. + -webide-parse-mode: eager + kuvo_public: + type: device_sql_string + pos: _parent.row_base + ofs_strings[6] + doc: | + A string whose value is always either empty or "ON", and + which apparently for some insane reason is used, rather than + a single bit somewhere, to control whether the track + information is visible on Kuvo. + -webide-parse-mode: eager + autoload_hotcues: + type: device_sql_string + pos: _parent.row_base + ofs_strings[7] + doc: | + A string whose value is always either empty or "ON", and + which apparently for some insane reason is used, rather than + a single bit somewhere, to control whether hot-cues are + auto-loaded for the track. + -webide-parse-mode: eager + unknown_string_5: + type: device_sql_string + pos: _parent.row_base + ofs_strings[8] + doc: | + A string of unknown purpose. + -webide-parse-mode: eager + unknown_string_6: + type: device_sql_string + pos: _parent.row_base + ofs_strings[9] + doc: | + A string of unknown purpose, usually empty. + -webide-parse-mode: eager + date_added: + type: device_sql_string + pos: _parent.row_base + ofs_strings[10] + doc: | + A string containing the date this track was added to the collection. + -webide-parse-mode: eager + release_date: + type: device_sql_string + pos: _parent.row_base + ofs_strings[11] + doc: | + A string containing the date this track was released, if known.
+ -webide-parse-mode: eager + mix_name: + type: device_sql_string + pos: _parent.row_base + ofs_strings[12] + doc: | + A string naming the remix of the track, if known. + -webide-parse-mode: eager + unknown_string_7: + type: device_sql_string + pos: _parent.row_base + ofs_strings[13] + doc: | + A string of unknown purpose, usually empty. + -webide-parse-mode: eager + analyze_path: + type: device_sql_string + pos: _parent.row_base + ofs_strings[14] + doc: | + The file path of the track analysis, which allows rapid + seeking to particular times in variable bit-rate files, + jumping to particular beats, visual waveform previews, and + stores cue points and loops. + -webide-parse-mode: eager + analyze_date: + type: device_sql_string + pos: _parent.row_base + ofs_strings[15] + doc: | + A string containing the date this track was analyzed by rekordbox. + -webide-parse-mode: eager + comment: + type: device_sql_string + pos: _parent.row_base + ofs_strings[16] + doc: | + The comment assigned to the track by the DJ, if any. + -webide-parse-mode: eager + title: + type: device_sql_string + pos: _parent.row_base + ofs_strings[17] + doc: | + The title of the track. + -webide-parse-mode: eager + unknown_string_8: + type: device_sql_string + pos: _parent.row_base + ofs_strings[18] + doc: | + A string of unknown purpose, usually empty. + -webide-parse-mode: eager + filename: + type: device_sql_string + pos: _parent.row_base + ofs_strings[19] + doc: | + The file name of the track audio file. + -webide-parse-mode: eager + file_path: + type: device_sql_string + pos: _parent.row_base + ofs_strings[20] + doc: | + The file path of the track audio file. + -webide-parse-mode: eager + + device_sql_string: + doc: | + A variable length string which can be stored in a variety of + different encodings. 
+ seq: + - id: length_and_kind type: u1 - - id: reserved2 - size: 2 - - id: set_fields_flag + doc: | + Mangled length of an ordinary ASCII string if odd, or a flag + indicating another encoding with a longer length value to + follow. + - id: body + type: + switch-on: length_and_kind + cases: + 0x40: device_sql_long_ascii + 0x90: device_sql_long_utf16be + _: device_sql_short_ascii(length_and_kind) + -webide-parse-mode: eager + -webide-representation: '{body.text}' + + device_sql_short_ascii: + doc: | + An ASCII-encoded string up to 127 bytes long. + params: + - id: mangled_length type: u1 - - id: reserved3 - size: 8 + doc: | + Contains the actual length, incremented, doubled, and + incremented again. Go figure. + seq: + - id: text + type: str + size: length + encoding: ascii + if: '(mangled_length % 2 > 0) and (length >= 0)' # Skip invalid strings + doc: | + The content of the string. + instances: + length: + value: '((mangled_length - 1) / 2) - 1' + doc: | + The un-mangled length of the string, in bytes. + -webide-parse-mode: eager + + device_sql_long_ascii: + doc: | + An ASCII-encoded string preceded by a two-byte length field. + TODO May need to skip a byte after the length! + Have not found any test data. + seq: + - id: length + type: u2 + doc: | + Contains the length of the string in bytes. + - id: text + type: str + size: length + encoding: ascii + doc: | + The content of the string. + + device_sql_long_utf16be: + doc: | + A UTF-16BE-encoded string preceded by a two-byte length field. + seq: + - id: length + type: u2 + doc: | + Contains the length of the string in bytes, including two trailing nulls. + - id: text + type: str + size: length - 4 + encoding: utf-16be + doc: | + The content of the string. + +enums: + page_type: + 0: + id: tracks + doc: | + Holds rows describing tracks, such as their title, artist, + genre, artwork ID, playing time, etc. + 1: + id: genres + doc: | + Holds rows naming musical genres, for reference by tracks and searching. 
+ 2: + id: artists + doc: | + Holds rows naming artists, for reference by tracks and searching. + 3: + id: albums + doc: | + Holds rows naming albums, for reference by tracks and searching. + 4: + id: labels + doc: | + Holds rows naming music labels, for reference by tracks and searching. + 5: + id: keys + doc: | + Holds rows naming musical keys, for reference by tracks and searching. + 6: + id: colors + doc: | + Holds rows naming color labels, for reference by tracks and searching. + 7: + id: playlist_tree + doc: | + Holds rows that describe the hierarchical tree structure of + available playlists and folders grouping them. + 8: + id: playlist_entries + doc: | + Holds rows that enumerate the tracks found in playlists and + the playlists they belong to. + 9: + id: unknown_9 + 10: + id: unknown_10 + 11: + id: unknown_11 + doc: | + The rows all seem to have history file names in them, such as "HISTORY 001". + 12: + id: unknown_12 + 13: + id: artwork + doc: | + Holds rows pointing to album artwork images. + 14: + id: unknown_14 + 15: + id: unknown_15 + 16: + id: columns + doc: | + TODO figure out and explain + 17: + id: unknown_17 + 18: + id: unknown_18 + 19: + id: history + doc: | + Holds rows listing tracks played in performance sessions. From 00ea7011dd9462ad2f3d7d2999cb98e8e402f81f Mon Sep 17 00:00:00 2001 From: James Elliott <> Date: Sun, 25 Nov 2018 12:24:11 -0600 Subject: [PATCH 07/14] Replace file accidentally moved.
--- database/dbf.ksy | 101 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 101 insertions(+) create mode 100644 database/dbf.ksy diff --git a/database/dbf.ksy b/database/dbf.ksy new file mode 100644 index 000000000..9f4543212 --- /dev/null +++ b/database/dbf.ksy @@ -0,0 +1,101 @@ +meta: + id: dbf + file-extension: dbf + application: dBASE + license: CC0-1.0 + endian: le +seq: + - id: header1 + type: header1 + - id: header2 + size: header1.len_header - 12 + type: header2 + - id: records + size: header1.len_record + repeat: expr + repeat-expr: header1.num_records +types: + header1: + doc-ref: http://www.dbase.com/Knowledgebase/INT/db7_file_fmt.htm - section 1.1 + seq: + - id: version + type: u1 + - id: last_update_y + type: u1 + - id: last_update_m + type: u1 + - id: last_update_d + type: u1 + - id: num_records + type: u4 + - id: len_header + type: u2 + - id: len_record + type: u2 + instances: + dbase_level: + value: 'version & 0b111' + header2: + seq: + - id: header_dbase_3 + if: _root.header1.dbase_level == 3 + type: header_dbase_3 + - id: header_dbase_7 + if: _root.header1.dbase_level == 7 + type: header_dbase_7 + - id: fields + type: field + repeat: expr + repeat-expr: 11 + header_dbase_3: + seq: + - id: reserved1 + size: 3 + - id: reserved2 + size: 13 + - id: reserved3 + size: 4 + header_dbase_7: + seq: + - id: reserved1 + contents: [0, 0] + - id: has_incomplete_transaction + type: u1 + - id: dbase_iv_encryption + type: u1 + - id: reserved2 + size: 12 + - id: production_mdx + type: u1 + - id: language_driver_id + type: u1 + - id: reserved3 + contents: [0, 0] + - id: language_driver_name + size: 32 + - id: reserved4 + size: 4 + field: + seq: + - id: name + type: str + encoding: ASCII + size: 11 + - id: datatype + type: u1 + - id: data_address + type: u4 + - id: length + type: u1 + - id: decimal_count + type: u1 + - id: reserved1 + size: 2 + - id: work_area_id + type: u1 + - id: reserved2 + size: 2 + - id: set_fields_flag + type: u1 + - id: 
reserved3 + size: 8 From 7ed34f0af4ee7f0ba9c477a646ef144263bf3e54 Mon Sep 17 00:00:00 2001 From: James Elliott <> Date: Sun, 25 Nov 2018 12:30:32 -0600 Subject: [PATCH 08/14] Simplify storage of heap position per @GreyCat --- database/rekordbox_pdb.ksy | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/database/rekordbox_pdb.ksy b/database/rekordbox_pdb.ksy index 0c74d4800..d93ac44a0 100644 --- a/database/rekordbox_pdb.ksy +++ b/database/rekordbox_pdb.ksy @@ -215,7 +215,7 @@ types: entries for strange pages?" - id: heap size-eos: true - if: heap_pos < 0 # never true, but stores pos + if: false # never true, but stores pos instances: is_data_page: value: page_flags & 0x40 == 0 From c49dea287bb721f0652cfa40be9bdb12959c9d8b Mon Sep 17 00:00:00 2001 From: James Elliott <> Date: Mon, 26 Nov 2018 20:38:03 -0600 Subject: [PATCH 09/14] Improved understanding of waveform headers. --- media/rekordbox_anlz.ksy | 21 +++++++++++++++------ 1 file changed, 15 insertions(+), 6 deletions(-) diff --git a/media/rekordbox_anlz.ksy b/media/rekordbox_anlz.ksy index 977774373..137711692 100644 --- a/media/rekordbox_anlz.ksy +++ b/media/rekordbox_anlz.ksy @@ -234,7 +234,10 @@ types: A larger waveform image suitable for scrolling along as a track plays. seq: - - type: u4 # Always 1? + - id: len_entry_bytes + type: u4 + doc: | + The size of each entry, in bytes. Seems to always be 1. - id: len_entries type: u4 doc: | @@ -242,7 +245,7 @@ types: byte. - type: u4 # Always 0x960000? - id: entries - size: len_entries + size: len_entries * len_entry_bytes wave_color_preview_tag: doc: | @@ -250,7 +253,10 @@ types: above the touch strip for jumping to a track position on newer high-resolution players. seq: - - type: u4 + - id: len_entry_bytes + type: u4 + doc: | + The size of each entry, in bytes. Seems to always be 6. - id: len_entries type: u4 doc: | @@ -258,7 +264,7 @@ types: byte for each of six channels of information.
- type: u4 - id: entries - size: len_entries * 6 + size: len_entries * len_entry_bytes wave_color_scroll_tag: doc: | @@ -266,7 +272,10 @@ types: as a track plays on newer high-resolution hardware. Also contains a higher-resolution blue/white waveform. seq: - - type: u4 # I have seen the value 2? + - id: len_entry_bytes + type: u4 + doc: | + The size of each entry, in bytes. Seems to always be 2. - id: len_entries type: u4 doc: | @@ -275,7 +284,7 @@ types: translate the payload into color columns. - type: u4 - id: entries - size-eos: true + size: len_entries * len_entry_bytes unknown_tag: {} From ad378dbe114df4654cf07ad131e16741432a3ebf Mon Sep 17 00:00:00 2001 From: James Elliott <> Date: Wed, 5 Dec 2018 19:59:14 -0600 Subject: [PATCH 10/14] Loosen up constraints on magic row header bytes. @iamtunzor_twitter found media where there were different values for the artist row, which was causing total database parse failure. Now we should be robust as long as there are no actual structural changes. --- database/rekordbox_pdb.ksy | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/database/rekordbox_pdb.ksy b/database/rekordbox_pdb.ksy index d93ac44a0..6dd87fdf9 100644 --- a/database/rekordbox_pdb.ksy +++ b/database/rekordbox_pdb.ksy @@ -339,8 +339,9 @@ types: doc: | A row that holds an album name and ID. seq: - - id: magic - contents: [0x80, 0x00] + - type: u2 + doc: | + Some kind of magic word? Usually 0x80, 0x00. - id: index_shift type: u2 doc: TODO name from @flesniak, but what does it mean? @@ -375,8 +376,10 @@ types: doc: | A row that holds an artist name and ID. seq: - - id: magic - contents: [0x60, 0x00] + - type: u2 + doc: | + Some kind of magic word? Usually 0x60, 0x00 but have seen + 0x64, 0x00. - id: index_shift type: u2 doc: TODO name from @flesniak, but what does it mean? @@ -538,8 +541,9 @@ types: details about the music, and links to other tables like artists, albums, keys, etc. 
seq: - - id: magic - contents: [0x24, 0x00] + - type: u2 + doc: | + Some kind of magic word? Usually 0x24, 0x00. - id: index_shift type: u2 doc: TODO name from @flesniak, but what does it mean? From 97a5b4989ee1d7197767f0154819ec1b6f7e49bf Mon Sep 17 00:00:00 2001 From: James Elliott <> Date: Tue, 11 Dec 2018 01:22:18 -0600 Subject: [PATCH 11/14] Handle longer-string variant of artist rows. Thanks to @iamtunzor_twitter in Croatia for getting his DJ to share a copy of the problematic database file with me! --- database/rekordbox_pdb.ksy | 52 ++++++++++++++++++++++++++++++++++---- 1 file changed, 47 insertions(+), 5 deletions(-) diff --git a/database/rekordbox_pdb.ksy b/database/rekordbox_pdb.ksy index 6dd87fdf9..ec4887638 100644 --- a/database/rekordbox_pdb.ksy +++ b/database/rekordbox_pdb.ksy @@ -376,10 +376,10 @@ types: doc: | A row that holds an artist name and ID. seq: - - type: u2 + - id: subtype + type: u2 doc: | - Some kind of magic word? Usually 0x60, 0x00 but have seen - 0x64, 0x00. + Usually 0x60, but 0x64 means we have a long name embedded in the row. - id: index_shift type: u2 doc: TODO name from @flesniak, but what does it mean? @@ -390,19 +390,34 @@ types: and linked from other rows (such as tracks). - type: u1 doc: | - @flesniak says: "alwayx 0x03, maybe an unindexed empty string" + @flesniak says: "always 0x03, maybe an unindexed empty string" - id: ofs_name type: u1 doc: | The location of the variable-length name string, relative to - the start of this row. + the start of this row, unless subtype ix 0x64. instances: name: type: device_sql_string + if: subtype == 0x60 pos: _parent.row_base + ofs_name doc: | The name of this artist. -webide-parse-mode: eager + ofs_long_name: + type: u2 + if: subtype == 0x64 + pos: _parent.row_base + 0x0a + doc: | + For Names longer than 0xff bytes, this holds the position + relative to the start of the row. 
+ long_name: + type: device_sql_long_string + if: subtype == 0x64 + pos: _parent.row_base + ofs_long_name + doc: | + Names longer than 0xff bytes will be found here. + artwork_row: doc: | @@ -839,6 +854,24 @@ types: -webide-parse-mode: eager -webide-representation: '{body.text}' + device_sql_long_string: + doc: | + A string longer than 255 bytes that is embedded next to a row. + seq: + - id: kind + type: u1 + doc: | + Indicates whether the encoding is ASCII or UTF16-BE. + - id: body + type: + switch-on: kind + cases: + 0x40: device_sql_long_ascii + 0x90: device_sql_long_utf16be + _: device_sql_unknown + -webide-parse-mode: eager + -webide-representation: '{body.text}' + device_sql_short_ascii: doc: | An ASCII-encoded string up to 127 bytes long. @@ -895,6 +928,15 @@ types: doc: | The content of the string. + device_sql_unknown: + doc: | + A string type we do not yet recognize, but want to avoid crashing. + seq: + - id: length + type: u2 + doc: | + Contains the length of the string in bytes, we think? + enums: page_type: 0: From a2898a03a65db4f1f7e19f7e3bf84d16ba338f65 Mon Sep 17 00:00:00 2001 From: James Elliott <> Date: Tue, 11 Dec 2018 21:17:09 -0600 Subject: [PATCH 12/14] Improve long-offset artist name handling. Now has a much clearer structure in the .ksy *and* provides a single, unified API for the struct user to access the name however it was stored. --- database/rekordbox_pdb.ksy | 59 +++++++++----------------------------- 1 file changed, 13 insertions(+), 46 deletions(-) diff --git a/database/rekordbox_pdb.ksy b/database/rekordbox_pdb.ksy index ec4887638..bf5b67b58 100644 --- a/database/rekordbox_pdb.ksy +++ b/database/rekordbox_pdb.ksy @@ -379,7 +379,8 @@ types: - id: subtype type: u2 doc: | - Usually 0x60, but 0x64 means we have a long name embedded in the row. + Usually 0x60, but 0x64 means we have a long name offset + embedded in the row. - id: index_shift type: u2 doc: TODO name from @flesniak, but what does it mean? 
@@ -391,33 +392,26 @@ types: - type: u1 doc: | @flesniak says: "always 0x03, maybe an unindexed empty string" - - id: ofs_name + - id: ofs_name_near type: u1 doc: | The location of the variable-length name string, relative to - the start of this row, unless subtype ix 0x64. + the start of this row, unless subtype is 0x64. instances: + ofs_name_far: + pos: _parent.row_base + 0x0a + type: u2 + doc: | + For names that might be further than 0xff bytes from the + start of this row, this holds a two-byte offset, and is + signalled by the subtype value. + if: subtype == 0x64 name: + pos: '_parent.row_base + (subtype == 0x64? ofs_name_far : ofs_name_near)' type: device_sql_string - if: subtype == 0x60 - pos: _parent.row_base + ofs_name doc: | The name of this artist. -webide-parse-mode: eager - ofs_long_name: - type: u2 - if: subtype == 0x64 - pos: _parent.row_base + 0x0a - doc: | - For Names longer than 0xff bytes, this holds the position - relative to the start of the row. - long_name: - type: device_sql_long_string - if: subtype == 0x64 - pos: _parent.row_base + ofs_long_name - doc: | - Names longer than 0xff bytes will be found here. - artwork_row: doc: | @@ -854,24 +848,6 @@ types: -webide-parse-mode: eager -webide-representation: '{body.text}' - device_sql_long_string: - doc: | - A string longer than 255 bytes that is embedded next to a row. - seq: - - id: kind - type: u1 - doc: | - Indicates whether the encoding is ASCII or UTF16-BE. - - id: body - type: - switch-on: kind - cases: - 0x40: device_sql_long_ascii - 0x90: device_sql_long_utf16be - _: device_sql_unknown - -webide-parse-mode: eager - -webide-representation: '{body.text}' - device_sql_short_ascii: doc: | An ASCII-encoded string up to 127 bytes long. @@ -928,15 +904,6 @@ types: doc: | The content of the string. - device_sql_unknown: - doc: | - A string type we do not yet recognize, but want to avoid crashing. - seq: - - id: length - type: u2 - doc: | - Contains the length of the string in bytes, we think? 
- enums: page_type: 0: From ed80bf650a22f4a9fc6fc7b2e047f6bf654cdb28 Mon Sep 17 00:00:00 2001 From: James Elliott <> Date: Thu, 7 Mar 2019 23:16:31 -0600 Subject: [PATCH 13/14] Incorporate tweaks made while finishing writeup article. --- database/rekordbox_pdb.ksy | 22 ++++++++++------------ media/rekordbox_anlz.ksy | 23 +++++++++++------------ 2 files changed, 21 insertions(+), 24 deletions(-) diff --git a/database/rekordbox_pdb.ksy b/database/rekordbox_pdb.ksy index bf5b67b58..ba9eca0a2 100644 --- a/database/rekordbox_pdb.ksy +++ b/database/rekordbox_pdb.ksy @@ -198,14 +198,14 @@ types: - id: num_rows_large type: u2 doc: | - Holds the value used for `num_rows` (see below) when that is - too large to fit into `num_rows_small`, and that situation - seems to be indicated when this value is larger than - `num_rows_small`, but not equal to `0x1fff`. This seems like - some strange mechanism to deal with the fact that lots of - tiny entries, such as are found in the `playlist_entries` - table, are too big to count with a single byte. But why not - just always use this value, then? + Holds the value used for `num_rows` (as described above) + when that is too large to fit into `num_rows_small`, and + that situation seems to be indicated when this value is + larger than `num_rows_small`, but not equal to `0x1fff`. + This seems like some strange mechanism to deal with the fact + that lots of tiny entries, such as are found in the + `playlist_entries` table, are too big to count with a single + byte. But why not just always use this value, then? - type: u2 doc: | @flesniak said: "1004 for strange blocks, 0 otherwise" @@ -511,8 +511,8 @@ types: - id: id type: u4 doc: | - The unique identifier by which this playlist can be requested - and linked from other rows (such as tracks). + The unique identifier by which this playlist or folder can + be requested and linked from other rows. 
- id: raw_is_folder type: u4 doc: | @@ -875,8 +875,6 @@ types: device_sql_long_ascii: doc: | An ASCII-encoded string preceded by a two-byte length field. - TODO May need to skip a byte after the length! - Have not found any test data. seq: - id: length type: u2 diff --git a/media/rekordbox_anlz.ksy b/media/rekordbox_anlz.ksy index 137711692..4a88b0dd7 100644 --- a/media/rekordbox_anlz.ksy +++ b/media/rekordbox_anlz.ksy @@ -68,15 +68,15 @@ types: type: switch-on: fourcc cases: - 0x50434f42: cue_tag #'section_tags::cues' - 0x50505448: path_tag #'section_tags::path' - 0x5051545a: beat_grid_tag #'section_tags::beat_grid' - 0x50564252: vbr_tag #'section_tags::vbr' - 0x50574156: wave_preview_tag #'section_tags::wave_preview' - 0x50575632: wave_preview_tag #'section_tags::wave_tiny' - 0x50575633: wave_scroll_tag #'section_tags::wave_scroll' - 0x50575634: wave_color_preview_tag #'section_tags::wave_color_preview' - 0x50575635: wave_color_scroll_tag #'section_tags::wave_color_scroll' + 0x50434f42: cue_tag #'section_tags::cues' (PCOB) + 0x50505448: path_tag #'section_tags::path' (PPTH) + 0x5051545a: beat_grid_tag #'section_tags::beat_grid' (PQTZ) + 0x50564252: vbr_tag #'section_tags::vbr' (PVBR) + 0x50574156: wave_preview_tag #'section_tags::wave_preview' (PWAV) + 0x50575632: wave_preview_tag #'section_tags::wave_tiny' (PWV2) + 0x50575633: wave_scroll_tag #'section_tags::wave_scroll' (PWV3, seen in .EXT) + 0x50575634: wave_color_preview_tag #'section_tags::wave_color_preview' (PWV4, in .EXT) + 0x50575635: wave_color_scroll_tag #'section_tags::wave_color_scroll' (PWV5, in .EXT) _: unknown_tag -webide-representation: '{fourcc}' @@ -128,7 +128,7 @@ types: type: u4 enum: cue_list_type doc: | - Identifies whether this tag stors ordinary or hot cues. + Identifies whether this tag stores ordinary or hot cues. 
- id: len_cues type: u4 doc: | @@ -280,8 +280,7 @@ types: type: u4 doc: | The number of columns of waveform data (this matches the - non-color waveform length), but we do not yet know how to - translate the payload into color columns. + non-color waveform length. - type: u4 - id: entries size: len_entries * len_entry_bytes From 52bd8abfa2ebb4fac205fa4a572f943305fc94c9 Mon Sep 17 00:00:00 2001 From: James Elliott <> Date: Sun, 10 May 2020 14:49:58 -0500 Subject: [PATCH 14/14] Add recent discoveries. --- media/rekordbox_anlz.ksy | 205 ++++++++++++++++++++++++++++++++++++++- 1 file changed, 203 insertions(+), 2 deletions(-) diff --git a/media/rekordbox_anlz.ksy b/media/rekordbox_anlz.ksy index 4a88b0dd7..6a53fef27 100644 --- a/media/rekordbox_anlz.ksy +++ b/media/rekordbox_anlz.ksy @@ -52,7 +52,7 @@ types: seq: - id: fourcc type: s4 - # enum: section_tags Can't use this until enums support default/unmatched value + # enum: section_tags # Can't use this line until KSC supports switching on possibly-null enums in Java. doc: | A tag value indicating what kind of section this is. - id: len_header @@ -68,6 +68,7 @@ types: type: switch-on: fourcc cases: + 0x50434f32: cue_extended_tag #'section_tags::cues_2' (PCO2) 0x50434f42: cue_tag #'section_tags::cues' (PCOB) 0x50505448: path_tag #'section_tags::path' (PPTH) 0x5051545a: beat_grid_tag #'section_tags::beat_grid' (PQTZ) @@ -77,6 +78,7 @@ types: 0x50575633: wave_scroll_tag #'section_tags::wave_scroll' (PWV3, seen in .EXT) 0x50575634: wave_color_preview_tag #'section_tags::wave_color_preview' (PWV4, in .EXT) 0x50575635: wave_color_scroll_tag #'section_tags::wave_color_scroll' (PWV5, in .EXT) + 0x50535349: song_structure_tag #'section_tags::song_structure' (PSSI, in .EXT) _: unknown_tag -webide-representation: '{fourcc}' @@ -129,8 +131,9 @@ types: enum: cue_list_type doc: | Identifies whether this tag stores ordinary or hot cues. + - size: 2 - id: len_cues - type: u4 + type: u2 doc: | The length of the cue list. 
- id: memory_count @@ -188,6 +191,98 @@ types: back to the cue time if this is a loop. - size: 16 + cue_extended_tag: + doc: | + A variation of cue_tag which was introduced with the nxs2 line, + and adds descriptive names. (Still comes in two forms, either + holding memory cues and loop points, or holding hot cues and + loop points.) Also includes hot cues D through H and color assignment. + seq: + - id: type + type: u4 + enum: cue_list_type + doc: | + Identifies whether this tag stores ordinary or hot cues. + - id: len_cues + type: u2 + doc: | + The length of the cue comment list. + - size: 2 + - id: cues + type: cue_extended_entry + repeat: expr + repeat-expr: len_cues + + cue_extended_entry: + doc: | + A cue extended list entry. Can either describe a memory cue or a + loop. + seq: + - contents: "PCP2" + - id: len_header + type: u4 + - id: len_entry + type: u4 + - id: hot_cue + type: u4 + doc: | + If zero, this is an ordinary memory cue, otherwise this a + hot cue with the specified number. + - id: type + type: u1 + enum: cue_entry_type + doc: | + Indicates whether this is a memory cue or a loop. + - size: 3 # seems to always be 1000 + - id: time + type: u4 + doc: | + The position, in milliseconds, at which the cue point lies + in the track. + - id: loop_time + type: u4 + doc: | + The position, in milliseconds, at which the player loops + back to the cue time if this is a loop. + - id: color_id + type: u1 + doc: | + References a row in the colors table if this is a memory cue or loop + and has been assigned a color. + - size: 11 # Loops seem to have some non-zero values in the last four bytes of this. + - id: len_comment + type: u4 + if: len_entry > 43 + - id: comment + type: str + size: len_comment + encoding: utf-16be + doc: | + The comment assigned to this cue by the DJ, if any, with a trailing NUL. + if: len_entry > 43 + - id: color_code + type: u1 + doc: | + A lookup value for a color table? We use this to index to the hot cue colors shown in rekordbox. 
+ if: (len_entry - len_comment) > 44 + - id: color_red + type: u1 + doc: | + The red component of the hot cue color to be displayed. + if: (len_entry - len_comment) > 45 + - id: color_green + type: u1 + doc: | + The green component of the hot cue color to be displayed. + if: (len_entry - len_comment) > 46 + - id: color_blue + type: u1 + doc: | + The blue component of the hot cue color to be displayed. + if: (len_entry - len_comment) > 47 + - size: len_entry - 48 - len_comment # The remainder after the color + if: (len_entry - len_comment) > 48 + path_tag: doc: | Stores the file path of the audio file to which this analysis @@ -228,6 +323,7 @@ types: size: len_preview doc: | The actual bytes of the waveform preview. + if: _parent.len_tag > _parent.len_header wave_scroll_tag: doc: | @@ -285,6 +381,86 @@ types: - id: entries size: len_entries * len_entry_bytes + song_structure_tag: + doc: | + Stores the song structure, also known as phrases (intro, verse, + bridge, chorus, up, down, outro). + seq: + - id: len_entry_bytes + type: u4 + doc: | + The size of each entry, in bytes. Seems to always be 24. + - id: len_entries + type: u2 + doc: | + The number of phrases. + - id: style + type: u2 + # enum: phrase_style Can't use this line until KSC supports switching on possibly-null enums in Java. + doc: | + The phrase style. 1 is the up-down style + (white label text in rekordbox) where the main phrases consist + of up, down, and chorus. 2 is the bridge-verse style + (black label text in rekordbox) where the main phrases consist + of verse, chorus, and bridge. Style 3 is mostly identical to + bridge-verse style except verses 1-3 are labeled VERSE1 and verses + 4-6 are labeled VERSE2 in rekordbox. + - size: 6 + - id: end_beat + type: u2 + doc: | + The beat number at which the last phrase ends. The track may + continue after the last phrase ends. If this is the case, it will + mostly be silence. 
+ - size: 4 + - id: entries + type: song_structure_entry + repeat: expr + repeat-expr: len_entries + + song_structure_entry: + doc: | + A song structure entry, represents a single phrase. + seq: + - id: phrase_number + type: u2 + doc: | + The absolute number of the phrase, starting at one. + - id: beat_number + type: u2 + doc: | + The beat number at which the phrase starts. + - id: phrase_id + type: + switch-on: _parent.style + cases: + 1: phrase_up_down # 'phrase_style::up_down' + 2: phrase_verse_bridge # 'phrase_style::verse_bridge' + _: phrase_verse_bridge + doc: | + Identifier of the phrase label. + - size: _parent.len_entry_bytes - 9 + - id: fill_in + type: u1 + doc: | + If nonzero, fill-in is present. + - id: fill_in_beat_number + type: u2 + doc: | + The beat number at which fill-in starts. + + phrase_up_down: + seq: + - id: id + type: u2 + enum: phrase_up_down_id + + phrase_verse_bridge: + seq: + - id: id + type: u2 + enum: phrase_verse_bridge_id + unknown_tag: {} enums: @@ -299,6 +475,7 @@ enums: 0x50575633: wave_scroll # PWV3 (seen in .EXT) 0x50575634: wave_color_preview # PWV4 (seen in .EXT) 0x50575635: wave_color_scroll # PWV5 (seen in .EXT) + 0x50535349: song_structure # PSSI (seen in .EXT) cue_list_type: 0: memory_cues @@ -311,3 +488,27 @@ enums: cue_entry_status: 0: disabled 1: enabled + + phrase_style: + 1: up_down + 2: verse_bridge + 3: verse_bridge_2 + + phrase_verse_bridge_id: + 1: intro + 2: verse1 + 3: verse2 + 4: verse3 + 5: verse4 + 6: verse5 + 7: verse6 + 8: bridge + 9: chorus + 10: outro + + phrase_up_down_id: + 1: intro + 2: up + 3: down + 5: chorus + 6: outro