![File_Formats](https://private-user-images.githubusercontent.com/148831617/284130493-b440b9e1-8eda-4090-818c-d67ea6e688db.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkxMjUxMTgsIm5iZiI6MTczOTEyNDgxOCwicGF0aCI6Ii8xNDg4MzE2MTcvMjg0MTMwNDkzLWI0NDBiOWUxLThlZGEtNDA5MC04MThjLWQ2N2VhNmU2ODhkYi5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjA5JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIwOVQxODEzMzhaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT0zZDAyYjRhNWM1ODk4OTA0OTMyODA4NmJkODg1ZTYyOThiNWVlNjc1YmZhZmVjZTUwNDhiOWQ5ZDY1YjYzNWExJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.bnoxfpsXvYSwwkTiCUFFTTY6gwzOlKwF_IdTNdOamCo)
The strongest motivation is efficiency, which requires control over how data is organized and stored. Memory is byte-addressable, while disks are block-addressable, so the on-disk layout matters a lot.
Data at rest on disk must be encoded so that it can be read and interpreted again, which is why encoding plays an important role.
Broadly, there are two byte orders in which multi-byte values are stored:
- Big Endian - the most significant byte (MSB) is stored at the lowest address
- Little Endian - the least significant byte (LSB) is stored at the lowest address
TODO - Add Image here
TODO - Add code references here
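
As a rough illustration of the two byte orders, the sketch below encodes the same 4-byte integer both ways with Python's standard `struct` module. The value `0x0A0B0C0D` is an arbitrary example chosen so that each byte is easy to spot.

```python
import struct

value = 0x0A0B0C0D  # arbitrary 4-byte example value

# ">" selects big endian, "<" selects little endian; "I" is a 4-byte unsigned int.
big = struct.pack(">I", value)     # most significant byte first
little = struct.pack("<I", value)  # least significant byte first

print(big.hex())     # 0a0b0c0d
print(little.hex())  # 0d0c0b0a

# Decoding must use the same byte order that was used for encoding.
assert struct.unpack(">I", big)[0] == value
assert struct.unpack("<I", little)[0] == value

# Reading big-endian bytes as little endian silently yields a different number.
misread = struct.unpack("<I", big)[0]
print(hex(misread))  # 0xd0c0b0a
```

This is why a file format has to fix one byte order up front: the bytes themselves carry no hint about how they were written.
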
The following integer types are used:

Type | Storage used |
---|---|
byte | 1 byte |
short | 2 bytes |
int | 4 bytes |
long | 8 bytes |

Floating-point numbers (float and double) are represented using the IEEE 754 standard.
The image is from the Database Internals book.
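
The sizes above can be checked with a short sketch using Python's `struct` module, which uses these standard sizes whenever an explicit byte order is given. The helper `float_bits` is a hypothetical name introduced here; it splits a single-precision IEEE 754 value into its sign, exponent, and fraction fields.

```python
import struct

# With an explicit byte order ("<" or ">"), struct uses fixed standard sizes.
SIZES = {
    "byte": struct.calcsize("<b"),
    "short": struct.calcsize("<h"),
    "int": struct.calcsize("<i"),
    "long": struct.calcsize("<q"),
    "float": struct.calcsize("<f"),
    "double": struct.calcsize("<d"),
}
print(SIZES)


def float_bits(x):
    """Split a 32-bit IEEE 754 float into (sign, biased exponent, fraction)."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    sign = raw >> 31                # 1 bit
    exponent = (raw >> 23) & 0xFF   # 8 bits, biased by 127
    fraction = raw & 0x7FFFFF       # 23 bits
    return sign, exponent, fraction


print(float_bits(1.0))       # (0, 127, 0)
print(float_bits(-0.15625))  # (1, 124, 2097152)
```
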
Strings are stored either as a length followed by the data, or as a null-terminated sequence of bytes.
String => Length (2 bytes) + Data
For simplicity, we shall only consider strings in the length-prefixed format above.

Example:

    String => Database Internals
    ----------------------------
    |0x12 | Database Internals|
    ----------------------------

Here 0x12 is 18, the number of characters in "Database Internals".
Note: for simplicity, the character encoding of the string is not considered here.
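
The length-prefixed format can be sketched as follows. The function names `encode_string` and `decode_string` are hypothetical, and two choices are assumptions made only for this example: the length prefix is big endian, and the text is encoded as UTF-8.

```python
import struct


def encode_string(s):
    """Encode a string as a 2-byte length followed by the raw bytes."""
    data = s.encode("utf-8")  # assumption: UTF-8 text
    if len(data) > 0xFFFF:
        raise ValueError("string too long for a 2-byte length prefix")
    return struct.pack(">H", len(data)) + data  # assumption: big-endian length


def decode_string(buf, offset=0):
    """Return (string, offset of the next field) read from buf at offset."""
    (length,) = struct.unpack_from(">H", buf, offset)
    start = offset + 2
    end = start + length
    return buf[start:end].decode("utf-8"), end


encoded = encode_string("Database Internals")
print(encoded[:2].hex())  # 0012
decoded, next_offset = decode_string(encoded)
print(decoded, next_offset)
```

Returning the offset of the next field makes it easy to read several strings stored back to back, which a null-terminated format can only do by scanning for the terminator.
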
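Boolean - can be represented with a single bit: 0 for false, 1 for true
Enums - integers can be used to represent enum values
Flags - represent non-mutually-exclusive named values, each stored as one bit of an integer. A bitwise AND (&) with the flag's mask tells whether that flag is set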
- null
- Date time
- Binary data
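
The enum and flag encodings described earlier can be sketched with Python's standard `enum` module. `PageType` and `Perm` are hypothetical names invented for this illustration; only the integer values would actually be written to disk.

```python
from enum import IntEnum, IntFlag


class PageType(IntEnum):  # hypothetical enum: one integer per named value
    DATA = 0
    INDEX = 1


class Perm(IntFlag):  # hypothetical flags: one bit per name
    READ = 0b001
    WRITE = 0b010
    EXEC = 0b100


# Enums: store the integer, rebuild the name when reading it back.
stored_type = int(PageType.INDEX)
restored_type = PageType(stored_type)

# Flags: combine with |, store the integer, test with &.
stored_flags = int(Perm.READ | Perm.WRITE)
can_write = bool(stored_flags & Perm.WRITE)
can_exec = bool(stored_flags & Perm.EXEC)

print(stored_type, restored_type.name)
print(stored_flags, can_write, can_exec)
```
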
A database can use single-file or multi-file storage.
Typically, database storage is organized as:
- File(s) - one or more files that store data or indexes
- Page(s) - files are divided into fixed-size pages that store data or index entries

The storage manager:
- Maintains files and pages
- Schedules reads and writes of pages
- Does not deal with redundancy

A page:
- Has a fixed size
- Can hold data or metadata
- Is usually self-contained
- Has a unique ID

A DBMS can organize pages in different ways:
- Heap File Organization
- Tree File Organization
- Sequential/Sorted File Organization (ISAM)
- Hashing File Organization

The storage manager has to support the following operations on pages:
- Create
- Update
- Write
- Delete
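
A minimal sketch of these page operations over a single file is shown below. It is a toy, not how any particular DBMS does it. The class name `StorageManager`, the 4096-byte `PAGE_SIZE`, and the in-memory free list are all assumptions, and `read_page` is added only so the example can check what it wrote. Because pages have a fixed size, page `i` simply lives at byte offset `i * PAGE_SIZE`.

```python
import os
import tempfile

PAGE_SIZE = 4096  # assumption: a common page size, not mandated by the text


class StorageManager:
    """Toy single-file page store; page i lives at byte offset i * PAGE_SIZE."""

    def __init__(self, path):
        if not os.path.exists(path):
            open(path, "wb").close()  # "r+b" requires an existing file
        self.f = open(path, "r+b")
        self.num_pages = os.path.getsize(path) // PAGE_SIZE
        self.free = []  # ids of deleted pages (kept in memory only)

    def create_page(self):
        """Allocate a zero-filled page, reusing a deleted id if possible."""
        if self.free:
            page_id = self.free.pop()
        else:
            page_id = self.num_pages
            self.num_pages += 1
        self.write_page(page_id, bytes(PAGE_SIZE))
        return page_id

    def write_page(self, page_id, data):
        if len(data) != PAGE_SIZE:
            raise ValueError("a page is always written whole")
        self.f.seek(page_id * PAGE_SIZE)
        self.f.write(data)
        self.f.flush()

    def read_page(self, page_id):
        self.f.seek(page_id * PAGE_SIZE)
        return self.f.read(PAGE_SIZE)

    def update_page(self, page_id, offset, payload):
        """Read-modify-write: change some bytes inside an existing page."""
        if offset < 0 or offset + len(payload) > PAGE_SIZE:
            raise ValueError("update does not fit inside the page")
        page = bytearray(self.read_page(page_id))
        page[offset:offset + len(payload)] = payload
        self.write_page(page_id, bytes(page))

    def delete_page(self, page_id):
        self.free.append(page_id)  # space is reused by a later create_page

    def close(self):
        self.f.close()


path = os.path.join(tempfile.mkdtemp(), "toy.db")
sm = StorageManager(path)
pid = sm.create_page()
sm.update_page(pid, 0, b"hello")
print(pid, sm.read_page(pid)[:5])
```

A real storage manager would also persist the free list and track free space per page; both are left out to keep the sketch short.
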
Page layout determines how data is stored inside pages.
- Approach #1: Tuple-Oriented Storage
- Approach #2: Log-Structured Storage
- Approach #3: Index-Organized Storage

The simplest tuple-oriented layout:
- Stores tuples one after another
- Doesn't allow variable-size data
- Wastes a lot of space

To conserve space and support variable-size tuples, slotted pages are used. The idea is to maintain a slot array that holds pointers to the tuples. Each slot entry is either empty or points to a valid tuple within the page. This approach still needs defragmentation to reclaim the space left by deleted tuples.
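
The slotted page idea can be sketched as below. This is one possible layout under stated assumptions, not the book's exact format: a 4-byte header (slot count and start of tuple data), a slot array growing from the front, tuples growing from the back, and a slot offset of 0 meaning "empty". The names `SlottedPage`, `HEADER`, and `SLOT`, and the tiny 128-byte page, are invented for the example.

```python
import struct

PAGE_SIZE = 128                 # assumption: tiny page, for demonstration only
HEADER = struct.Struct("<HH")   # number of slots, offset where tuple data starts
SLOT = struct.Struct("<HH")     # tuple offset, tuple length; offset 0 = empty


class SlottedPage:
    def __init__(self):
        self.buf = bytearray(PAGE_SIZE)
        HEADER.pack_into(self.buf, 0, 0, PAGE_SIZE)  # no slots, no tuple data

    def _slot_pos(self, slot_id):
        return HEADER.size + slot_id * SLOT.size

    def insert(self, data):
        """Append a tuple at the back and a slot at the front; return slot id."""
        num_slots, data_start = HEADER.unpack_from(self.buf, 0)
        slot_array_end = self._slot_pos(num_slots + 1)
        new_start = data_start - len(data)
        if new_start < slot_array_end:
            raise ValueError("page full")
        self.buf[new_start:data_start] = data
        SLOT.pack_into(self.buf, self._slot_pos(num_slots), new_start, len(data))
        HEADER.pack_into(self.buf, 0, num_slots + 1, new_start)
        return num_slots

    def get(self, slot_id):
        num_slots, _ = HEADER.unpack_from(self.buf, 0)
        if not 0 <= slot_id < num_slots:
            raise IndexError("no such slot")
        offset, length = SLOT.unpack_from(self.buf, self._slot_pos(slot_id))
        if offset == 0:
            return None  # empty slot
        return bytes(self.buf[offset:offset + length])

    def delete(self, slot_id):
        # Only the slot is cleared; the tuple bytes remain until defragmentation.
        if self.get(slot_id) is not None:
            SLOT.pack_into(self.buf, self._slot_pos(slot_id), 0, 0)


page = SlottedPage()
a = page.insert(b"alice")
b = page.insert(b"bob, a longer tuple")
page.delete(a)
print(page.get(a), page.get(b))
```

Because other structures refer to a tuple by its slot id rather than by its byte offset, a later defragmentation pass can move tuples around and only needs to rewrite the slot array.
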
Using the page metadata, we can give meaning to the tuples. For example, in a B-Tree the tuples can store node information.
The last section of the book, Cell Layout, is specific to B-Trees, so it makes more sense to cover it along with the B-Tree implementation in the next chapter.