
Memory size calculation

Aleksandr Lyapunov edited this page Jun 1, 2020 · 11 revisions

A frequently asked question is how much memory Tarantool will use for a given data set. On this page I'll try to give a short guide to estimating memory usage.

Data memory consists of two parts: the tuple arena and index data. The tuple arena stores tuples, which are shared between all indexes of the same space.

Tuple arena:

First of all, Tarantool stores the given data as msgpack arrays (http://msgpack.org/). Here are some examples of the memory cost of some values:

  • 42 - 1 byte (prefixless optimization for short integers)
  • 100000 - 5 bytes (1 byte prefix + uint32)
  • 10000000000 - 9 bytes (1 byte prefix + uint64, the largest size of msgpacked integer)
  • "abc" - 4 bytes (1 byte prefix + 3-character string)
  • {42} - 2 bytes (1 byte array prefix + 1 byte 42)

Thus the size of msgpack (bsize) of a tuple {"abc", 100000, 42} will be 1 + 4 + 5 + 1 = 11 bytes.
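
The byte counts above follow from the MessagePack encoding rules. As a rough illustration, the sketch below computes the encoded size of non-negative integers, strings and arrays in Python. The helper name msgpack_size is hypothetical, and only these three types are handled.

```python
def msgpack_size(value):
    """Encoded MessagePack size in bytes (non-negative ints, str, list only)."""
    if isinstance(value, bool):
        raise TypeError("only non-negative ints, str and list are handled here")
    if isinstance(value, int):
        if value < 0 or value > 0xFFFFFFFFFFFFFFFF:
            raise ValueError("only unsigned 64-bit integers are handled here")
        if value <= 0x7F:
            return 1          # positive fixint, no prefix byte
        if value <= 0xFF:
            return 2          # prefix + uint8
        if value <= 0xFFFF:
            return 3          # prefix + uint16
        if value <= 0xFFFFFFFF:
            return 5          # prefix + uint32
        return 9              # prefix + uint64
    if isinstance(value, str):
        n = len(value.encode("utf-8"))
        if n <= 31:
            return 1 + n      # fixstr
        if n <= 0xFF:
            return 2 + n      # str8
        if n <= 0xFFFF:
            return 3 + n      # str16
        return 5 + n          # str32
    if isinstance(value, list):
        n = len(value)
        prefix = 1 if n <= 15 else 3 if n <= 0xFFFF else 5
        return prefix + sum(msgpack_size(v) for v in value)
    raise TypeError("only non-negative ints, str and list are handled here")


print(msgpack_size(["abc", 100000, 42]))  # bsize of the example tuple
```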

Then every tuple has a 14-byte system prefix (that is, sizeof(struct tuple)).

Then every tuple stores a 4-byte offset for every indexed field except the first field. Here are some examples of the offset cost:

  • One index, parts = {1, 'uint'} - 0
  • One index, parts = {2, 'uint'} - 4
  • One index, parts = {1, 'uint', 2, 'uint'} - 4
  • One index, parts = {2, 'uint', 3, 'uint'} - 8
  • Primary index parts = {1, 'uint', 2, 'uint'}, secondary parts = {2, 'uint', 1, 'uint'} - 4
  • Primary index parts = {2, 'uint', 3, 'uint'}, secondary parts = {3, 'uint', 2, 'uint'} - 8
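
The rule behind these examples can be sketched as counting the distinct indexed fields other than field 1, at 4 bytes each. The function name offsets_cost and the representation of an index as a list of 1-based field numbers are assumptions made for this illustration.

```python
OFFSET_SIZE = 4  # bytes per stored field offset, as stated above


def offsets_cost(indexes):
    """indexes: list of indexes, each given as a list of 1-based field numbers."""
    indexed_fields = {field for parts in indexes for field in parts}
    indexed_fields.discard(1)  # the first field needs no offset
    return OFFSET_SIZE * len(indexed_fields)


# Primary {2, 3} and secondary {3, 2} share the same two fields.
print(offsets_cost([[2, 3], [3, 2]]))
```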

Then the total size is rounded up to cfg.slab_alloc_minimal (usually 16, so in practice this rarely matters).

Then tuples smaller than 128 bytes are aligned up to 8 (i.e. rounded up to the closest multiple of 8). Larger tuples are rounded up in a more complex way; the usual loss is about 5%.
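
Putting the arena steps together, a minimal sketch might look like the following. It assumes that cfg.slab_alloc_minimal acts as a lower bound on the size, that the 128-byte threshold applies to the total size, and that larger tuples simply lose a flat 5%; the real allocator is more complex, and the name arena_tuple_size is hypothetical.

```python
TUPLE_HEADER = 14  # sizeof(struct tuple), as stated above


def arena_tuple_size(bsize, offsets, slab_alloc_minimal=16):
    """Approximate arena bytes for one tuple (assumptions in the lead-in)."""
    size = bsize + TUPLE_HEADER + offsets
    size = max(size, slab_alloc_minimal)   # assumed: minimal allocation size
    if size < 128:
        return (size + 7) // 8 * 8         # round up to a multiple of 8
    # Assumed flat 5% overhead for larger tuples, rounded up to a whole byte.
    return size + (size * 5 + 99) // 100


print(arena_tuple_size(bsize=11, offsets=0))  # the {"abc", 100000, 42} tuple
```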

Indexes:

Indexes store pointers to tuples in the shared tuple arena, and thus their memory cost depends only on the number of records in the index.

  • Tree index costs about 18 bytes per record (10 bytes in Tarantool <= 2.1)
  • Hash index costs about 16 bytes per record

Note that both indexes reserve 48kB during the first insert, but asymptotically that is negligible.
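
As an illustration of these per-record figures, the sketch below estimates index memory for a given record count. It assumes that 48kB means 48 * 1024 bytes and that nothing is reserved for an empty index; the name index_memory is hypothetical.

```python
INDEX_COST = {"tree": 18, "hash": 16}  # bytes per record, from the list above
INDEX_RESERVE = 48 * 1024              # assumed: 48kB = 48 * 1024 bytes


def index_memory(kind, records):
    """Approximate bytes used by one index holding `records` entries."""
    if records == 0:
        return 0                       # assumed: nothing reserved before first insert
    return INDEX_RESERVE + INDEX_COST[kind] * records


# With many records the fixed reserve becomes negligible.
print(index_memory("tree", 500_000_000))
```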

Other costs:

During the snapshot process Tarantool does not delete tuples needed for the snapshot read view. One can estimate the number of replaces/updates/etc. that could be done during the snapshot process and add the tuple/index cost for that amount.

Overall example:

We store about 500 000 000 records in Tarantool as tuples like {ID, email}, where ID usually fits in uint32 and email is a string of 20 characters on average.

We have a hash index on ID and a tree index on email.

msgpack will be 1 + (1 + 4) + (1 + 20) = 27 bytes on average

+14 bytes header = 41

+4 bytes as the cost of an offset for the second field (email), which is indexed = 45

rounded up to a multiple of 8 = 48 bytes

+16 bytes for hash index

+18 bytes for tree index

Total 82 bytes

We have about 10 krps (10 000 requests per second), so even if the snapshot process lasted for 5 minutes, we would have to store an additional 5 * 60 * 10000 = 3 000 000 records.

Total cost is: 82 * (500 000 000 + 3 000 000) = 41 246 000 000 bytes, or about 42GB rounded up

It's better to reserve about 10% and allow Tarantool to use 46GB.
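
The whole example can be reproduced with a few lines of arithmetic. The sketch below follows the steps above exactly; the function name estimate_bytes and its parameters are hypothetical. It yields about 41.2 * 10^9 bytes before the reserve and about 45.4 * 10^9 bytes with it, which the text rounds up to 42GB and 46GB.

```python
def estimate_bytes(records=500_000_000, rps=10_000, snapshot_seconds=5 * 60):
    """Return (bytes per record, total bytes, total bytes with 10% reserve)."""
    bsize = 1 + (1 + 4) + (1 + 20)       # array prefix + uint32 ID + 20-char email
    size = bsize + 14 + 4                # tuple header + one field offset
    size = (size + 7) // 8 * 8           # align up to a multiple of 8
    per_record = size + 16 + 18          # hash index + tree index
    extra = rps * snapshot_seconds       # records kept alive for the read view
    total = per_record * (records + extra)
    with_reserve = total * 11 // 10      # add 10% reserve
    return per_record, total, with_reserve


per_record, total, with_reserve = estimate_bytes()
print(per_record, total, with_reserve)
```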
