-
Notifications
You must be signed in to change notification settings - Fork 903
Data Structures
To support the transport of messages from senders to receivers a number of data structures are required. These data structures need to conform to a number of design principles for the common transport path, exceptional cases can be afforded an exceptional path, e.g. large messages:
- Copy-Free: The send and receive buffer is directly used.
- Allocation-Free: The send and receive buffer is pre-allocated so that the transport path is free from the allocation or reclamation of memory.
- Lock-Free: Lock-free techniques are employed to manage concurrent access by multiple producers.
- Wait-Free: Concurrent algorithms will complete within a finite number of steps by multiple producers.
- Persistent/Immutable: Message buffers for send and receive record immutable history to allow replay and resend semantics.
- O(1) Cost: All operations are constant time regardless of buffer length, senders, receivers, contention, etc.
The sender needs to be able to support the efficient retransmission of messages. Messages may need to be retransmitted to cope with loss or late joiners to a communications stream. Each log partition represents a term in the communication history from the source.
Log Buffer - Memory Mapped File +----------------------------+ | Term 0 | +----------------------------+ | Term 1 | +----------------------------+ | Term 2 | +----------------------------+ | Term Meta Data 0 | +----------------------------+ | Term Meta Data 1 | +----------------------------+ | Term Meta Data 2 | +----------------------------+ | Log Meta Data | +----------------------------+
Messages are stored to the buffer in preparation for transmission as a sequential stream with FIFO semantics. The next n bytes of the buffer are claimed by performing an atomic increment of the tail counter stored in the state trailer. The algorithm is designed to take advantage of LOCK XADD on x86 to avoid spinning CAS (LOCK CMPXCHG) loops.
The producers of message call send(bytes)
on a source object passing the bytes to be sent. The send call then follows the following algorithm:
- IF message is bigger than transmission frame length THEN
- Send in chunks
- END IF
- WHILE next capacity claim < message length
- Mark claimed capacity with padding
- Move on to next term
- END WHILE
- Copy in message
- Mark record in log as complete
A sender thread observes the sender archive and repeatedly performs the following algorithm:
- Scan forward for new frames
- Send message on underlying network layer
The state trailer contains the variables describing the current state of the archive for the communications term. Variables are stored with consideration for false sharing issues when concurrent access is required.
- tail: buffer offset at which the next message can be added. An atomic get-and-increment operation is used to advance the tail for claiming capacity in the buffer.
- status: Used to indicate if the term buffer needs cleaning before reuse.
The receiver log buffer is a mirror of the sender log buffer with reciprocal semantics for assembling a sequence of messages from a sender over an unreliable network layer. The receiver is single threaded and thus does not need to maintain the state trailer therefore those fields are all zero.
A receiver thread receives messages from the network layer into the receiver archive.