Server architecture
Three main threads run in the Tarantool process (see the calls to cord_costart()):
- Main transaction thread
- WAL (write-ahead log) thread (wal_writer_f)
- Networking (iproto) thread (net_cord_f)
There are also less important helper threads:
- For replication relays (asynchronous replication), a separate thread is created for each connected client. The thread watches the server's binary log and feeds it to the client, following the client's position in the log. Each client may be at a different position in the log and may consume it at a different speed, so each one gets a dedicated thread.
- A thread pool runs ad-hoc asynchronous tasks such as DNS resolution or fsync(); see etp_init().
- OpenMP internal threads are used for parallel sorting; see the #pragma omp directives.
Threads communicate with each other using the internal cbus API. Each cbus link between two threads consists of two one-way cpipes, one in each direction, over which the threads send cmsg messages to each other. Various message types are defined on top of cmsg, such as wal_request, iproto_msg and iproto_set_listen_msg.
The iproto thread creates net_tx_bus for communication with the main transaction thread. The WAL thread creates writer->tx_wal_bus for communication with the main transaction thread.
The cpipe mechanism is implemented internally as a 3-stage pipeline to reduce lock contention. The sending thread appends a message to the first queue. That queue is owned exclusively by the sender, so no locking is needed. The contents of the first queue are periodically moved to the second queue, which is protected by a mutex. When the receiving thread is ready, it moves all messages from the second queue to the third. Delaying a message's arrival in the final queue increases the latency of individual messages, but moving messages in batches increases throughput.
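The three-stage idea can be sketched for a single producer and a single consumer using POSIX threads. This is not the cpipe code. The names pipe3, pipe3_push, pipe3_flush and pipe3_fetch are hypothetical.

- Stages 1 and 3 are private lists, so they need no lock.
- The mutex is taken only to splice a whole batch into or out of stage 2.
- As a result, one lock acquisition is paid per batch rather than per message.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical message node; not Tarantool's struct cmsg. */
struct msg {
    struct msg *next;
    int payload;
};

/* Singly linked FIFO with head and tail pointers. */
struct queue {
    struct msg *head;
    struct msg *tail;
};

static void queue_init(struct queue *q) { q->head = q->tail = NULL; }

static void queue_push(struct queue *q, struct msg *m)
{
    m->next = NULL;
    if (q->tail != NULL)
        q->tail->next = m;
    else
        q->head = m;
    q->tail = m;
}

/* Append every message of src to dst in O(1) and leave src empty. */
static void queue_splice(struct queue *dst, struct queue *src)
{
    if (src->head == NULL)
        return;
    if (dst->tail != NULL)
        dst->tail->next = src->head;
    else
        dst->head = src->head;
    dst->tail = src->tail;
    queue_init(src);
}

struct pipe3 {
    struct queue input;   /* stage 1: owned by the producer, no lock */
    struct queue shared;  /* stage 2: guarded by mutex */
    struct queue output;  /* stage 3: owned by the consumer, no lock */
    pthread_mutex_t mutex;
};

static void pipe3_init(struct pipe3 *p)
{
    queue_init(&p->input);
    queue_init(&p->shared);
    queue_init(&p->output);
    pthread_mutex_init(&p->mutex, NULL);
}

/* Producer: lock-free append to its private queue. */
static void pipe3_push(struct pipe3 *p, struct msg *m)
{
    assert(m != NULL);
    queue_push(&p->input, m);
}

/* Producer: called periodically to publish the whole batch under one lock. */
static void pipe3_flush(struct pipe3 *p)
{
    pthread_mutex_lock(&p->mutex);
    queue_splice(&p->shared, &p->input);
    pthread_mutex_unlock(&p->mutex);
}

/* Consumer: pop the next message; when stage 3 is empty, refill it from stage 2 under one lock. */
static struct msg *pipe3_fetch(struct pipe3 *p)
{
    if (p->output.head == NULL) {
        pthread_mutex_lock(&p->mutex);
        queue_splice(&p->output, &p->shared);
        pthread_mutex_unlock(&p->mutex);
    }
    struct msg *m = p->output.head;
    if (m != NULL) {
        p->output.head = m->next;
        if (p->output.head == NULL)
            p->output.tail = NULL;
    }
    return m;
}
```

A real receiver also has to be woken up when a batch is flushed, for example through a condition variable or an event-loop notification. This sketch leaves that out and simply returns NULL when nothing has been published yet.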
LIGHT (light.h) stands for Linear Hash Table. It uses an extensible hashing scheme with a variable bitmask over the hash value, which amortizes the cost of index growth over tuple inserts.
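The variable-bitmask trick can be illustrated with a minimal linear-hashing sketch. It is not light.h's code, and cover_mask and bucket_of are hypothetical names.

With count buckets, the hash is masked by the smallest all-ones mask that covers count - 1. A slot past the end falls back to the mask one bit shorter, i.e. to the bucket that has not been split yet. Adding one bucket therefore moves keys out of a single existing bucket only, instead of rehashing the whole table.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest all-ones mask (2^k - 1) that is >= count - 1, so it can address every bucket. */
static uint32_t cover_mask(uint32_t count)
{
    uint32_t mask = 0;
    while (mask < count - 1)
        mask = (mask << 1) | 1;
    return mask;
}

/* Map a hash value to one of `count` buckets (count > 0). */
static uint32_t bucket_of(uint32_t hash, uint32_t count)
{
    assert(count > 0);
    uint32_t mask = cover_mask(count);
    uint32_t slot = hash & mask;
    if (slot >= count)
        slot = hash & (mask >> 1); /* that bucket is not split yet: use the shorter mask */
    return slot;
}
```

Here is an example. With 5 or 6 buckets the mask is 7, and a hash of 6 falls past the end, so it goes to bucket 6 & 3 = 2. Only when the table grows to 7 buckets does that key move to bucket 6. This is why, in this sketch, each growth step costs a single bucket split.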
BPS (bps_tree) is an in-memory B+ tree.
Each index structure has a struct matras allocator as a data member, which it uses to allocate index entries. The Matras IDs returned by the allocator are 32-bit, so each index can hold at most about 2 billion entries.
An index that needs to point a Matras ID at a new location calls matras_touch() (see the calls in bps_tree_touch_block and light.h).
An index iterator that needs a snapshot calls matras_create_read_view(). This function is not called for secondary keys, since they do not need view stability.
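The copy-on-write behaviour behind matras_touch() and matras_create_read_view() can be sketched as below. This is a simplified stand-in, not matras itself. The following are all assumptions for illustration:

- the names idtab_*, read_view and view_get
- the flat block table
- the tiny block size
- the limit of one open read view

Opening a view copies only the block pointers. A later touch copies a block only if the view still shares it, so the view keeps seeing the old values.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_SIZE 4   /* entries per block, tiny for illustration */
#define MAX_BLOCKS 16

/* Frozen copy of the block pointers taken when the view was opened. */
struct read_view {
    int *blocks[MAX_BLOCKS];
    uint32_t size;
};

struct idtab {
    int *blocks[MAX_BLOCKS];
    uint32_t size;            /* number of ids handed out */
    struct read_view *view;   /* at most one open read view in this sketch */
};

static void idtab_init(struct idtab *t) { memset(t, 0, sizeof(*t)); }

/* Hand out the next 32-bit id, allocating a zeroed block when the previous one is full. */
static uint32_t idtab_alloc(struct idtab *t)
{
    assert(t->size < MAX_BLOCKS * BLOCK_SIZE);
    uint32_t id = t->size++;
    if (id % BLOCK_SIZE == 0) {
        t->blocks[id / BLOCK_SIZE] = calloc(BLOCK_SIZE, sizeof(int));
        assert(t->blocks[id / BLOCK_SIZE] != NULL);
    }
    return id;
}

/* Read the current value stored under an id. */
static int idtab_get(const struct idtab *t, uint32_t id)
{
    assert(id < t->size);
    return t->blocks[id / BLOCK_SIZE][id % BLOCK_SIZE];
}

/* Return a writable slot for id, first copying its block if the open view still shares it. */
static int *idtab_touch(struct idtab *t, uint32_t id)
{
    assert(id < t->size);
    uint32_t b = id / BLOCK_SIZE;
    struct read_view *v = t->view;
    if (v != NULL && v->blocks[b] == t->blocks[b]) {
        int *copy = malloc(BLOCK_SIZE * sizeof(int));
        assert(copy != NULL);
        memcpy(copy, t->blocks[b], BLOCK_SIZE * sizeof(int));
        t->blocks[b] = copy;   /* the view keeps the old block */
    }
    return &t->blocks[b][id % BLOCK_SIZE];
}

/* Open a read view: copies block pointers only, no entry data. */
static void idtab_open_view(struct idtab *t, struct read_view *v)
{
    assert(t->view == NULL);
    memcpy(v->blocks, t->blocks, sizeof(v->blocks));
    v->size = t->size;
    t->view = v;
}

static int view_get(const struct read_view *v, uint32_t id)
{
    assert(id < v->size);
    return v->blocks[id / BLOCK_SIZE][id % BLOCK_SIZE];
}

/* Close the view and free the old blocks that only it still referenced. */
static void idtab_close_view(struct idtab *t)
{
    struct read_view *v = t->view;
    assert(v != NULL);
    for (uint32_t b = 0; b < MAX_BLOCKS; b++)
        if (v->blocks[b] != NULL && v->blocks[b] != t->blocks[b])
            free(v->blocks[b]);
    t->view = NULL;
}

static void idtab_destroy(struct idtab *t)
{
    if (t->view != NULL)
        idtab_close_view(t);
    for (uint32_t b = 0; b < MAX_BLOCKS; b++)
        free(t->blocks[b]);
}
```

An index that never opens a view, like the secondary keys mentioned above, never pays for the copies. In that case idtab_touch() always writes in place.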
Class space (represents a user or system table)
{
    handler       // handle to the Engine used for table operations
    tuple_format  // tuple format used in this space
    index list    // dense and sparse
    space_def
    ...
}
Class Index has these derived classes:
- MemtxIndex: further derived classes are MemtxTree, MemtxHash, Btree and Bitset
- SophiaIndex: further derived indexes exist; see the code.
- SysviewIndex: further derived indexes exist; see the code.
Class Engine has these derived classes:
- MemtxEngine
- SophiaEngine
- SysviewEngine, used for schema operations
Class Handler is a handle to the Engine for performing insert/update/delete operations. Its derived classes are MemtxSpace, SophiaSpace and SysviewSpace.
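In C, the same Engine/Handler dispatch is usually expressed with a table of function pointers instead of C++ virtual methods. The sketch below is an illustration, not Tarantool's code.

- The names handler_vtab, mem_handler, the read-only handler and space_replace are hypothetical.
- Tuples are reduced to an integer key and value.

The caller only sees struct handler, so the same space_replace() call works whichever engine backs the space.

```c
#include <assert.h>
#include <stddef.h>

struct handler;

/* Operations an engine-specific handler provides (hypothetical one-entry subset). */
struct handler_vtab {
    int (*execute_replace)(struct handler *h, int key, int value);
};

struct handler {
    const struct handler_vtab *vtab;
};

/* Toy in-memory handler: a fixed array indexed by key. */
struct mem_handler {
    struct handler base;   /* first member, so a struct handler * can be cast back */
    int values[8];
};

static int mem_replace(struct handler *h, int key, int value)
{
    struct mem_handler *m = (struct mem_handler *)h;
    if (key < 0 || key >= 8)
        return -1;
    m->values[key] = value;
    return 0;
}

static const struct handler_vtab mem_vtab = { mem_replace };

/* Toy read-only handler, in the spirit of system views: every write is rejected. */
static int readonly_replace(struct handler *h, int key, int value)
{
    (void)h;
    (void)key;
    (void)value;
    return -1;
}

static const struct handler_vtab readonly_vtab = { readonly_replace };

struct space {
    const char *name;
    struct handler *handler;
};

/* The caller does not know which engine backs the space; it dispatches through the vtable. */
static int space_replace(struct space *s, int key, int value)
{
    assert(s->handler != NULL);
    return s->handler->vtab->execute_replace(s->handler, key, value);
}
```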
Class txn
{
    txn_stmt  // list of statements in the current transaction
    engine    // pointer to the Engine
}
A write request issued from Lua takes this path:
lbox_insert (Lua function)
-> box_insert() / box_replace() / box_update()
-> box_process1()
-> process_rw()
{
    look up the space in the schema
    begin a transaction
    space->handler->executeReplace/Update/Delete()
    commit the statement using txn_commit_stmt()
}
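The process_rw() steps above can be sketched end to end. The sketch uses a toy space and a transaction that records one undo entry per statement, loosely mirroring the txn_stmt list of class txn. The following are assumptions:

- All names (process_replace, execute_replace, txn_rollback and so on) are hypothetical.
- The engine call is a direct function call rather than a handler dispatch.
- Rolling back on error is this sketch's own choice.
- The commit step only marks where the write-ahead log would come in.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_STMTS 4
#define NSLOTS 8

/* One statement of a transaction: enough information to undo it. */
struct stmt {
    int key;
    int old_value;
    int new_value;
};

struct txn {
    struct stmt stmts[MAX_STMTS];
    int n_stmts;
};

struct space {
    const char *name;
    int values[NSLOTS];
};

static void txn_begin(struct txn *txn) { txn->n_stmts = 0; }

/* Engine step: perform the replace and record it in the transaction. */
static int execute_replace(struct space *s, struct txn *txn, int key, int value)
{
    if (key < 0 || key >= NSLOTS || txn->n_stmts == MAX_STMTS)
        return -1;
    struct stmt *st = &txn->stmts[txn->n_stmts++];
    st->key = key;
    st->old_value = s->values[key];
    st->new_value = value;
    s->values[key] = value;
    return 0;
}

/* Undo the recorded statements in reverse order. */
static void txn_rollback(struct space *s, struct txn *txn)
{
    while (txn->n_stmts > 0) {
        struct stmt *st = &txn->stmts[--txn->n_stmts];
        s->values[st->key] = st->old_value;
    }
}

/* In a real server the changes would be written to the WAL here; this sketch just drops the undo entries. */
static int txn_commit(struct txn *txn)
{
    txn->n_stmts = 0;
    return 0;
}

/* Look up the space by name, begin, execute, then commit or roll back. */
static int process_replace(struct space *spaces, size_t n_spaces,
                           const char *name, int key, int value)
{
    struct space *s = NULL;
    for (size_t i = 0; i < n_spaces; i++) {
        if (strcmp(spaces[i].name, name) == 0) {
            s = &spaces[i];
            break;
        }
    }
    if (s == NULL)
        return -1;   /* no such space */
    struct txn txn;
    txn_begin(&txn);
    if (execute_replace(s, &txn, key, value) != 0) {
        txn_rollback(s, &txn);
        return -1;
    }
    return txn_commit(&txn);
}
```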
Each engine implements the executeReplace/Update/Delete methods called from process_rw(). For example, this is what the Memtx update does:
MemtxSpace::executeUpdate()
{
    tuple_update
    tuple_update_execute  // creates a "rope": a short-lived tree that orders field updates before the tuple is replaced
    tuple_bless
}
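The ordering step of tuple_update_execute can be sketched with a sorted array instead of a rope. This is a deliberate simplification: the real update works on MsgPack fields and supports more operations. The following are hypothetical:

- the struct update_op and tuple_update_sketch
- integer-only fields
- the two operations '=' and '+'

Operations are sorted by field number, keeping request order for operations on the same field. A single pass over the old tuple then builds the new one, which is what replaces the old tuple.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical update operation on an integer field. */
struct update_op {
    int field;   /* 0-based field number */
    char op;     /* '=' assigns arg, '+' adds arg */
    int arg;
    int seq;     /* request position, filled in below; keeps ops on one field in order */
};

/* Order by field number, then by position in the request. */
static int op_cmp(const void *a, const void *b)
{
    const struct update_op *x = a, *y = b;
    if (x->field != y->field)
        return x->field < y->field ? -1 : 1;
    return x->seq < y->seq ? -1 : (x->seq > y->seq);
}

/* Build the updated tuple in `out` without modifying `old`; sorts `ops` in place.
 * Returns -1 on a bad field number or unknown operation. */
static int tuple_update_sketch(const int *old, int n_fields,
                               struct update_op *ops, int n_ops, int *out)
{
    for (int i = 0; i < n_ops; i++) {
        if (ops[i].field < 0 || ops[i].field >= n_fields)
            return -1;
        if (ops[i].op != '=' && ops[i].op != '+')
            return -1;
        ops[i].seq = i;
    }
    qsort(ops, (size_t)n_ops, sizeof(ops[0]), op_cmp);
    int k = 0;
    for (int f = 0; f < n_fields; f++) {
        int v = old[f];
        /* after sorting, all ops for field f are contiguous */
        for (; k < n_ops && ops[k].field == f; k++)
            v = ops[k].op == '=' ? ops[k].arg : v + ops[k].arg;
        out[f] = v;
    }
    return 0;
}
```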
And this is how upsert works:
MemtxSpace::executeUpsert()
{
    tuple_upsert
    tuple_upsert_execute
    tuple_bless
}
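Upsert's semantics can be shown in a minimal sketch. If no tuple with the key exists, the supplied tuple is inserted. Otherwise the supplied update operations are applied to the existing tuple. The row layout, the fixed slot-per-key storage and the single add operation are hypothetical simplifications. A typical use is a counter that must be created on first use.

```c
#include <assert.h>

#define NSLOTS 16

/* Hypothetical toy row: a presence flag and one integer field. */
struct row {
    int present;
    int count;
};

/* Insert `tuple_count` if the key is absent, otherwise add `delta` to the stored value. */
static void upsert(struct row *rows, int key, int tuple_count, int delta)
{
    assert(key >= 0 && key < NSLOTS);
    struct row *r = &rows[key];
    if (!r->present) {
        r->present = 1;
        r->count = tuple_count;   /* insert branch: the update ops are ignored */
    } else {
        r->count += delta;        /* update branch: the new tuple is ignored */
    }
}
```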