
performance research

Alexey Khatskevich edited this page May 23, 2018 · 3 revisions

Motivation

Tarantool (memtx) is an in-memory engine, and it does not waste time on locks and synchronization. Moreover, Tarantool can avoid writes to disk.

Tarantool's advantages:

  • no locks
  • no random disk writes

Tarantool's disadvantages:

  • single-threaded
  • any iteration over an index results in random reads from RAM (tuples are allocated in slabs)
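
The random-read point above can be made concrete with a small experiment. The sketch below is not Tarantool code: it models an index scan as a walk over an array of values in either sequential or shuffled order, so that the two can be timed against each other. The function names and the use of `rand()` and `clock()` are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Fill order[0..n-1] with 0..n-1, optionally shuffled (Fisher-Yates). */
void make_order(size_t *order, size_t n, int shuffle, unsigned seed)
{
    for (size_t i = 0; i < n; i++)
        order[i] = i;
    if (!shuffle)
        return;
    srand(seed);
    for (size_t i = n; i > 1; i--) {
        size_t j = (size_t)rand() % i;
        size_t tmp = order[i - 1];
        order[i - 1] = order[j];
        order[j] = tmp;
    }
}

/* Sum values[], visiting the elements in the given order. */
uint64_t sum_in_order(const uint64_t *values, const size_t *order, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += values[order[i]];
    return sum;
}

/* Run one traversal; return CPU seconds and store the sum in *sum_out. */
double timed_sum(const uint64_t *values, const size_t *order, size_t n,
                 uint64_t *sum_out)
{
    assert(sum_out != NULL);
    clock_t start = clock();
    *sum_out = sum_in_order(values, order, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

To use it, allocate an array much larger than the CPU cache, call `make_order` once with `shuffle = 0` and once with `shuffle = 1`, and compare the two `timed_sum` results. The shuffled order is expected to be slower because most accesses miss the cache; the exact ratio depends on the hardware.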

The assumption is that this architecture should help Tarantool outperform conventional SQL databases.

Purposes of research

  • find performance bottlenecks
  • compare the speed of Tarantool SQL with other DBs on different workloads
  • beat other DBs

Plan

  • enable LTO (plan described below); it may reveal the real bottleneck
  • btree bench (plan described below)
  • sql - engine bench
    • compare speed to C implementation of the same query
      • write tests
        • huge select (~20% slowdown)
        • small select (~50% slowdown)
      • process the results and put them here
    • investigate existing benchmarks
      • check whether the benchmark is written correctly for Tarantool
  • io bench
    • select over net.box
    • research fsync influence
      • workload with lots of fibers
      • workload with a single fiber
    • nginx batching bench (possibly the main perf improvement)
    • user-space networking
  • real-world bench
    • steal a workload from production and analyze patterns and bottlenecks
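
For the fsync item in the plan above, a rough starting point is to measure how the number of `fsync()` calls per record changes the cost of appending to a file. The sketch below rests on an assumption that is not stated on this page: that a workload with many fibers lets several writes share one `fsync()`, while a single fiber pays for one `fsync()` per write. The helper names and the scratch-file path are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Create an unlinked scratch file in /tmp; returns an fd or -1. */
int open_scratch_file(void)
{
    char path[] = "/tmp/fsync_bench_XXXXXX";
    int fd = mkstemp(path);
    if (fd >= 0)
        unlink(path); /* the file disappears once fd is closed */
    return fd;
}

/*
 * Append `count` records of `size` bytes to fd, calling fsync() after every
 * `sync_every` records and once more at the end for a partial batch.
 * Returns the number of fsync() calls, or -1 on an I/O error
 * (a short write is treated as an error to keep the sketch small).
 */
long append_with_fsync(int fd, size_t count, size_t size, size_t sync_every)
{
    char record[256];
    assert(size > 0 && size <= sizeof(record) && sync_every > 0);
    memset(record, 'x', size);

    long syncs = 0;
    size_t pending = 0;
    for (size_t i = 0; i < count; i++) {
        if (write(fd, record, size) != (ssize_t)size)
            return -1;
        if (++pending == sync_every) {
            if (fsync(fd) != 0)
                return -1;
            syncs++;
            pending = 0;
        }
    }
    if (pending > 0) {
        if (fsync(fd) != 0)
            return -1;
        syncs++;
    }
    return syncs;
}
```

Timing `append_with_fsync` with `sync_every = 1` against a larger batch gives a first estimate of the fsync cost. For meaningful numbers the file should live on the disk under test, since `/tmp` is often a RAM-backed file system.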

Btree bench plan

  • randomization influence
    • fully random (main consumer: cache misses)
    • sequential (main consumer: msgpuck/btree call stack?)
      • compare to hash index
    • mixed
      • find the points where the main consumer changes
  • workload influence
    • lots of small requests (part of randomization influence)
    • select join
      • huge join
      • join + index
        • w/o index
        • with index
  • hint patch investigation (store data for comparison straight in the btree)
    • btree research
      • block size, block traversal (currently 512b + linear traversal for hints)
        • is binary search faster? No
        • is another block size faster for linear traversal? No
  • prefetch operation influence (async work with memory)
    • write async-memory hello_world.c test
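
The block-traversal question in the plan above compares two ways of finding a position inside one btree block. A minimal sketch of both follows, assuming a 512-byte block holds 64 sorted 8-byte hints. That layout and all names are assumptions for illustration, not the actual Tarantool structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: a 512-byte block filled with 8-byte hints. */
enum { BLOCK_BYTES = 512, HINTS_PER_BLOCK = BLOCK_BYTES / sizeof(uint64_t) };

/* Index of the first hint >= key, scanning left to right; count if none. */
size_t lower_bound_linear(const uint64_t *hints, size_t count, uint64_t key)
{
    size_t i = 0;
    while (i < count && hints[i] < key)
        i++;
    return i;
}

/* Same result via binary search over the sorted hints. */
size_t lower_bound_binary(const uint64_t *hints, size_t count, uint64_t key)
{
    size_t lo = 0, hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (hints[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Check that both searches agree on every key for a block of even numbers. */
void check_block_searches(void)
{
    uint64_t hints[HINTS_PER_BLOCK];
    for (size_t i = 0; i < HINTS_PER_BLOCK; i++)
        hints[i] = 2 * i;
    for (uint64_t key = 0; key <= 2 * HINTS_PER_BLOCK; key++)
        assert(lower_bound_linear(hints, HINTS_PER_BLOCK, key) ==
               lower_bound_binary(hints, HINTS_PER_BLOCK, key));
}
```

The sketch only shows the two variants being compared. The "No" answers recorded in the plan come from the original measurements, which this code does not reproduce.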

Enable LTO plan

Tarantool fails to start when compiled with LTO because the linker ignores the -Wl,--dynamic-list,${exports_file} option.

  • create a small program to reproduce the exports problem
    • see that LTO works
    • see that exports disappear when LTO is enabled
    • try to build with the gold linker (the same result)
    • build with -rdynamic (exports preserved, binary changed)
  • compile with clang (the same result)
  • build with lto and -rdynamic (speed decreased?)
  • check __attribute__((used)) (dynamic-list starts working!) -> the problem is possibly in ld
  • Write a minimal reproducer that shows the export-lto-used problem
  • Fix cmake to build with lto when binutils is new enough
  • Fix lto for mac
  • Find out why gcc slows down?
    • O2 -> O3
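
A reproducer along the lines of the steps above might look like the sketch below. It is an assumption-laden illustration, not the reproducer that was actually written: the symbol name `exported_answer`, the helper `is_exported`, and the file names in the build comment are all hypothetical.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/*
 * Possible build lines (GNU toolchain assumed; add -ldl on older glibc):
 *   cc -O2 -flto repro.c -o repro -Wl,--dynamic-list,exports.list
 *   cc -O2 -flto repro.c -o repro -rdynamic
 * where exports.list contains:  { exported_answer; };
 */

/*
 * Hypothetical exported symbol. The "used" attribute asks the compiler to
 * keep the function even when LTO sees no caller for it.
 */
__attribute__((used)) int exported_answer(void)
{
    return 42;
}

/*
 * Return 1 if `name` can be resolved through the dynamic symbol tables of
 * this program or of the libraries it was loaded with, 0 otherwise.
 */
int is_exported(const char *name)
{
    assert(name != NULL);
    void *self = dlopen(NULL, RTLD_LAZY); /* handle for the main program */
    if (self == NULL)
        return 0;
    int found = dlsym(self, name) != NULL;
    dlclose(self);
    return found;
}
```

Calling `is_exported("exported_answer")` from `main` and printing the result shows whether the symbol survived a given combination of flags. Removing the attribute and rebuilding with `-flto` is the case the plan describes as losing the export.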

Known problems

  • sql
    • lots of memcpy and malloc in SQL engine
      • ephemeral tables work inefficiently
      • extra memcpy calls on inserts into any table
    • no prepared statements (the query is recompiled on each request)
    • query compilation and execution share the same core
  • architecture
    • single-threaded engine
    • single-threaded scan
    • sequential scan -> random reads (cache locality is not used)
      • minor: b-tree indexes are not covering (random reads on any comparison)
  • slow tuple compare
    • huge stack of calls
    • slow msgpuck

Related links
