Optimize `copy_cells`, part 2 #1695

joe-maley · 2020-06-22T15:08:36Z

This patch optimizes the `copy_cells` path for multi-fragment reads. The
following benchmarks are for the multi-fragment read scenario discussed
offline:

// Current
Read time: 4.75082 secs
  * Time to copy result attribute values: 3.54192 secs
    > Time to read attribute tiles: 0.311707 secs
    > Time to unfilter attribute tiles: 0.370434 secs
    > Time to copy fixed-sized attribute values: 0.898421 secs
    > Time to copy var-sized attribute values: 0.954925 secs

// With this patch
Read time: 3.04627 secs
  * Time to copy result attribute values: 1.83972 secs
    > Time to read attribute tiles: 0.274928 secs
    > Time to unfilter attribute tiles: 0.38196 secs
    > Time to copy fixed-sized attribute values: 0.517415 secs
    > Time to copy var-sized attribute values: 0.461847 secs

For context, here are the benchmark results for the single-fragment read. The
stats are similar with and without this patch:

Read time: 1.86883 secs
  * Time to copy result attribute values: 1.19411 secs
    > Time to read attribute tiles: 0.304055 secs
    > Time to unfilter attribute tiles: 0.351332 secs
    > Time to copy fixed-sized attribute values: 0.289661 secs
    > Time to copy var-sized attribute values: 0.142405 secs

This patch does three things:

1. Converts `offset_offsets_per_cs` and `var_offsets_per_cs` in the var-sized
   path from a 2D array (`vector<vector<uint64_t>>`) to a 1D array
   (`vector<uint64_t>`). The big win is avoiding the construction and
   destruction time for the nested vector elements.
2. Partitions cell copying for both the fixed-sized and var-sized paths. The
   motivation is to reduce contention on the TBB threads and minimize time
   spent context switching between the individual cell slab copies.
3. Adds a context cache for the fixed-sized path, similar to the existing
   var-sized context cache.
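
To illustrate the first two changes, here is a minimal sketch, not the code in this patch. `CellSlab`, `slab_starts`, `partition_slabs`, and `fill_flat_offsets` are hypothetical names, and the offset formula is a placeholder. It shows how one flat `vector<uint64_t>` indexed through a prefix sum of slab lengths can stand in for a nested vector, and how the slabs can be split into contiguous ranges so that each task copies a batch of slabs rather than a single one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a result cell slab; only its length matters here.
struct CellSlab {
  uint64_t length;  // number of cells in the slab
};

// Prefix sum over slab lengths: starts[i] is the index of slab i's first
// cell in a flat per-cell array, and starts.back() is the total cell count.
std::vector<uint64_t> slab_starts(const std::vector<CellSlab>& slabs) {
  std::vector<uint64_t> starts(slabs.size() + 1, 0);
  for (std::size_t i = 0; i < slabs.size(); ++i)
    starts[i + 1] = starts[i] + slabs[i].length;
  return starts;
}

// Splits the slab indices [0, num_slabs) into at most `num_partitions`
// contiguous, near-equal [begin, end) ranges.
std::vector<std::pair<std::size_t, std::size_t>> partition_slabs(
    std::size_t num_slabs, std::size_t num_partitions) {
  std::vector<std::pair<std::size_t, std::size_t>> ranges;
  if (num_slabs == 0 || num_partitions == 0)
    return ranges;
  num_partitions = std::min(num_partitions, num_slabs);
  const std::size_t base = num_slabs / num_partitions;
  const std::size_t extra = num_slabs % num_partitions;
  std::size_t begin = 0;
  for (std::size_t p = 0; p < num_partitions; ++p) {
    const std::size_t end = begin + base + (p < extra ? 1 : 0);
    ranges.emplace_back(begin, end);
    begin = end;
  }
  return ranges;
}

// Fills one flat offsets array, one partition at a time. The outer loop is
// sequential in this sketch; each iteration writes a disjoint region of
// `offsets`, so it could be handed to a separate task.
std::vector<uint64_t> fill_flat_offsets(
    const std::vector<CellSlab>& slabs,
    uint64_t cell_size,
    std::size_t num_partitions) {
  const std::vector<uint64_t> starts = slab_starts(slabs);
  std::vector<uint64_t> offsets(starts.back());  // single allocation
  for (const auto& range : partition_slabs(slabs.size(), num_partitions)) {
    for (std::size_t s = range.first; s < range.second; ++s) {
      for (uint64_t c = 0; c < slabs[s].length; ++c) {
        assert(starts[s] + c < offsets.size());
        // Placeholder value: the byte offset of the cell in a dense buffer.
        offsets[starts[s] + c] = (starts[s] + c) * cell_size;
      }
    }
  }
  return offsets;
}
```

With the flat layout there is one allocation and one deallocation regardless of the number of cell slabs, whereas the nested layout pays for one inner vector per slab.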

joe-maley requested a review from stavrospapadopoulos June 22, 2020 15:08

joe-maley force-pushed the jpm/copy-cells-perf-2 branch 2 times, most recently from 1395b81 to 691d1c8 Compare June 22, 2020 18:45

stavrospapadopoulos approved these changes Jun 22, 2020

joe-maley force-pushed the jpm/copy-cells-perf-2 branch from 691d1c8 to e563a5d Compare June 23, 2020 11:11

joe-maley force-pushed the jpm/copy-cells-perf-2 branch from e563a5d to 7b76473 Compare June 23, 2020 13:18

Merge branch 'dev' into jpm/copy-cells-perf-2

4b70ecc

joe-maley merged commit 1597833 into dev Jun 23, 2020

joe-maley deleted the jpm/copy-cells-perf-2 branch June 23, 2020 19:02
