Memory size calculation
A frequently asked question is how much memory Tarantool will use for a given data set. This page gives a short guide to estimating memory usage.
Data memory consists of two parts: the tuple arena and index data. The tuple arena stores tuples, which are shared between all indexes of the same space.
First of all, Tarantool stores each tuple as a msgpack array (http://msgpack.org/). Here are the encoded sizes of some values:
- 42 - 1 byte (positive fixint: integers from 0 to 127 are stored in a single byte with no prefix)
- 100000 - 5 bytes (1-byte prefix + uint32)
- 10000000000 - 9 bytes (1-byte prefix + uint64, the largest size of a msgpacked integer)
- "abc" - 4 bytes (1-byte prefix + 3-byte string)
- {42} - 2 bytes (1-byte array prefix + 1 byte for 42)
Thus the msgpack size (bsize) of the tuple {"abc", 100000, 42} is 1 + 4 + 5 + 1 = 11 bytes (array prefix + "abc" + 100000 + 42).
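The size rules above can be checked with a small calculator. This is a minimal sketch assuming the standard msgpack format with the most compact encoding chosen for each value; `msgpack_size` is a hypothetical helper, not part of Tarantool, and it only handles nil, booleans, integers, strings, and arrays (Python lists stand in for Lua tables; floats and maps are not covered).

```python
def msgpack_size(value) -> int:
    """Return the encoded msgpack size in bytes (hypothetical helper)."""
    # bool must be checked before int, because bool is a subclass of int.
    if value is None or isinstance(value, bool):
        return 1  # nil, true, false are single-byte values
    if isinstance(value, int):
        if 0 <= value <= 0x7F:
            return 1  # positive fixint: value lives in the type byte
        if -32 <= value < 0:
            return 1  # negative fixint
        if 0 <= value <= 0xFF or -0x80 <= value < 0:
            return 2  # uint8 / int8
        if 0 <= value <= 0xFFFF or -0x8000 <= value < 0:
            return 3  # uint16 / int16
        if 0 <= value <= 0xFFFFFFFF or -0x80000000 <= value < 0:
            return 5  # uint32 / int32
        return 9  # uint64 / int64
    if isinstance(value, str):
        n = len(value.encode("utf-8"))  # msgpack counts bytes, not characters
        if n <= 31:
            header = 1  # fixstr
        elif n <= 0xFF:
            header = 2  # str8
        elif n <= 0xFFFF:
            header = 3  # str16
        else:
            header = 5  # str32
        return header + n
    if isinstance(value, (list, tuple)):
        n = len(value)
        # fixarray (up to 15 items), array16, array32
        header = 1 if n <= 15 else 3 if n <= 0xFFFF else 5
        return header + sum(msgpack_size(item) for item in value)
    raise TypeError(f"unsupported type: {type(value).__name__}")


print(msgpack_size(["abc", 100000, 42]))  # 11
```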
Next, every tuple has a 14-byte system header (sizeof(struct tuple)).
Every tuple also stores a 4-byte offset for each indexed field except the first field. Offsets are stored per field, so a field used by several indexes is counted only once. Here are some examples of offset cost:
- One index, parts = {1, 'uint'} - 0
- One index, parts = {2, 'uint'} - 4
- One index, parts = {1, 'uint', 2, 'uint'} - 4
- One index, parts = {2, 'uint', 3, 'uint'} - 8
- Primary index parts = {1, 'uint', 2, 'uint'}, secondary parts = {2, 'uint', 1, 'uint'} - 4
- Primary index parts = {2, 'uint', 3, 'uint'}, secondary parts = {3, 'uint', 2, 'uint'} - 8
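The offset rule can be sketched in a few lines and checked against the examples above. This assumes, as the last two examples imply, that an offset is stored once per distinct indexed field no matter how many indexes use it; `offset_cost` is a hypothetical helper that only looks at the field numbers from `parts` and ignores the types such as 'uint'.

```python
OFFSET_SIZE = 4  # bytes per stored field offset, as stated above


def offset_cost(indexes: list[list[int]]) -> int:
    """Per-tuple offset cost in bytes (hypothetical helper).

    `indexes` holds, for each index, the 1-based field numbers of its parts.
    Each distinct indexed field except field 1 costs one offset.
    """
    indexed_fields = {field for parts in indexes for field in parts}
    indexed_fields.discard(1)  # the first field never needs an offset
    return OFFSET_SIZE * len(indexed_fields)


print(offset_cost([[2, 3], [3, 2]]))  # 8
```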
The total size is then padded up to at least cfg.slab_alloc_minimal (usually 16 bytes, so in practice this rarely matters).
Tuples smaller than 128 bytes are then aligned up to 8 bytes (i.e. rounded up to the nearest multiple of 8). Larger tuples are rounded up by a more complex scheme; the typical loss is about 5%.
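Putting the header, offsets, minimal size, and alignment together gives a per-tuple estimate. This is a rough sketch: `tuple_alloc_size` is hypothetical, it assumes cfg.slab_alloc_minimal = 16, and for tuples of 128 bytes or more it simply adds about 5% instead of modelling the real size classes.

```python
TUPLE_HEADER = 14        # sizeof(struct tuple), per the text above
SLAB_ALLOC_MINIMAL = 16  # assumed value of cfg.slab_alloc_minimal


def tuple_alloc_size(bsize: int, offsets: int) -> int:
    """Estimated tuple-arena bytes for one tuple (hypothetical helper).

    bsize   -- msgpack size of the tuple
    offsets -- offset cost in bytes (4 per indexed field except the first)
    """
    size = TUPLE_HEADER + offsets + bsize
    size = max(size, SLAB_ALLOC_MINIMAL)  # never below the minimal allocation
    if size < 128:
        return (size + 7) // 8 * 8  # align up to a multiple of 8
    # Approximation only: add ~5%, rounding up with integer arithmetic.
    return (size * 105 + 99) // 100


# {"abc", 100000, 42} indexed only on field 1: 14 + 11 = 25, aligned to 32.
print(tuple_alloc_size(11, 0))  # 32
```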
Indexes store pointers to tuples in the shared tuple arena, so their memory cost depends only on the number of records in the index, not on the tuple size.
- Tree index costs about 18 bytes per record (10 bytes in Tarantool <= 2.1)
- Hash index costs about 16 bytes per record
Note that both index types reserve 48 kB on the first insert, but for large data sets that is negligible.
While a snapshot is being written, Tarantool does not free tuples that are still needed by the snapshot read view. To account for this, estimate how many replaces, updates, deletes, etc. can happen during the snapshot process and add the tuple and index cost for that many extra records.
Example: we store about 500 000 000 records in Tarantool as tuples like {ID, email}, where ID usually fits in a uint32 and email is a string of 20 characters on average.
There is a hash index on ID and a tree index on email.
- msgpack: 1 + (1 + 4) + (1 + 20) = 27 bytes on average
- + 14 bytes tuple header = 41
- + 4 bytes for the offset of the second indexed field (email) = 45
- rounded up to a multiple of 8 = 48 bytes
- + 16 bytes for the hash index = 64
- + 18 bytes for the tree index = 82
Total: 82 bytes per record.
The load is about 10k requests per second, so even if the snapshot process lasted 5 minutes, we would have to store up to 5 * 60 * 10 000 = 3 000 000 additional records.
Total cost: 82 * (500 000 000 + 3 000 000) = 41 246 000 000 bytes, roughly 42 GB when rounded up.
It is better to reserve about 10% more and allow Tarantool to use 46 GB.
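The whole worked example can be reproduced as a short script. The constants are the ones used above (27-byte msgpack, 14-byte header, one 4-byte offset, 16 and 18 bytes per record for the hash and tree indexes); the helper names `align8` and `estimate_total` are hypothetical. It prints about 41.2 GB, and 45.4 GB with the 10% reserve; the figures in the text are rounded up from these.

```python
# Per-record figures from the worked example above.
BSIZE = 1 + (1 + 4) + (1 + 20)  # array prefix + uint32 ID + 20-byte email = 27
TUPLE_HEADER = 14               # sizeof(struct tuple)
OFFSETS = 4                     # one offset: email (field 2) is indexed
HASH_INDEX = 16                 # bytes per record, hash index on ID
TREE_INDEX = 18                 # bytes per record, tree index on email


def align8(n: int) -> int:
    """Round up to a multiple of 8 (the rule for tuples below 128 bytes)."""
    return (n + 7) // 8 * 8


def estimate_total(records: int, rps: int, snapshot_seconds: int) -> int:
    """Total bytes for tuples plus indexes, including snapshot overhead."""
    tuple_size = align8(TUPLE_HEADER + OFFSETS + BSIZE)  # 45 -> 48
    per_record = tuple_size + HASH_INDEX + TREE_INDEX    # 82
    extra = rps * snapshot_seconds  # records kept alive by the read view
    return per_record * (records + extra)


total = estimate_total(records=500_000_000, rps=10_000, snapshot_seconds=5 * 60)
with_reserve = total * 110 // 100  # about 10% headroom
print(f"{total / 1e9:.1f} GB, with reserve {with_reserve / 1e9:.1f} GB")
```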