Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize copy_cells, part 2 #1695

Merged
merged 2 commits into from
Jun 23, 2020
Merged

Optimize copy_cells, part 2 #1695

merged 2 commits into from
Jun 23, 2020

Conversation

joe-maley
Copy link
Contributor

This patch optimizes the copy_cells path for multi-fragment reads. The
following benchmarks are for the multi-fragment read scenario discussed
offline:

// Current
Read time: 4.75082 secs
  * Time to copy result attribute values: 3.54192 secs
    > Time to read attribute tiles: 0.311707 secs
    > Time to unfilter attribute tiles: 0.370434 secs
    > Time to copy fixed-sized attribute values: 0.898421 secs
    > Time to copy var-sized attribute values: 0.954925 secs
// With this patch
Read time: 3.04627 secs
  * Time to copy result attribute values: 1.83972 secs
    > Time to read attribute tiles: 0.274928 secs
    > Time to unfilter attribute tiles: 0.38196 secs
    > Time to copy fixed-sized attribute values: 0.517415 secs
    > Time to copy var-sized attribute values: 0.461847 secs

For context, here are the benchmark results for the single-fragment read. The
stats are similar with and without this patch:

Read time: 1.86883 secs
  * Time to copy result attribute values: 1.19411 secs
    > Time to read attribute tiles: 0.304055 secs
    > Time to unfilter attribute tiles: 0.351332 secs
    > Time to copy fixed-sized attribute values: 0.289661 secs
    > Time to copy var-sized attribute values: 0.142405 secs

This patch does three things:

  1. Converts the offset_offsets_per_cs and var_offsets_per_cs in the var-sized
    path from a 2D array (vector<vector<uint64_t>>) to a 1D array (vector<uint64_t).
    The big win is construction and destruction time for the nested vector
    elements.
  2. Partitions cell copying for both fixed and var-sized paths. The motivation
    is to reduce contention on the TBB threads and minimize time spent context
    switching between the individual cell slab copies.
  3. Added a context cache for the fixed size path, similar to the existing
    var-sized context cache.

@joe-maley joe-maley force-pushed the jpm/copy-cells-perf-2 branch 2 times, most recently from 1395b81 to 691d1c8 Compare June 22, 2020 18:45
@joe-maley joe-maley force-pushed the jpm/copy-cells-perf-2 branch from 691d1c8 to e563a5d Compare June 23, 2020 11:11
This patch optimizes the `copy_cells` path for multi-fragment reads. The
following benchmarks are for the multi-fragment read scenario discussed
offline:

```
// Current
Read time: 4.75082 secs
  * Time to copy result attribute values: 3.54192 secs
    > Time to read attribute tiles: 0.311707 secs
    > Time to unfilter attribute tiles: 0.370434 secs
    > Time to copy fixed-sized attribute values: 0.898421 secs
    > Time to copy var-sized attribute values: 0.954925 secs
```

```
// With this patch
Read time: 3.04627 secs
  * Time to copy result attribute values: 1.83972 secs
    > Time to read attribute tiles: 0.274928 secs
    > Time to unfilter attribute tiles: 0.38196 secs
    > Time to copy fixed-sized attribute values: 0.517415 secs
    > Time to copy var-sized attribute values: 0.461847 secs
```

For context, here are the benchmark results for the single-fragment read. The
stats are similar with and without this patch:
```
Read time: 1.86883 secs
  * Time to copy result attribute values: 1.19411 secs
    > Time to read attribute tiles: 0.304055 secs
    > Time to unfilter attribute tiles: 0.351332 secs
    > Time to copy fixed-sized attribute values: 0.289661 secs
    > Time to copy var-sized attribute values: 0.142405 secs
```

This patch does three things:
1. Converts the `offset_offsets_per_cs` and `var_offsets_per_cs` in the var-sized
   path from a 2D array (vector<vector<uint64_t>>) to a 1D array (vector<uint64_t).
   The big win is construction and destruction time for the nested vector
   elements.
2. Partitions cell copying for both fixed and var-sized paths. The motivation
   is to reduce contention on the TBB threads and minimize time spent context
   switching between the individual cell slab copies.
3. Added a context cache for the fixed size path, similar to the existing
   var-sized context cache.
@joe-maley joe-maley force-pushed the jpm/copy-cells-perf-2 branch from e563a5d to 7b76473 Compare June 23, 2020 13:18
@joe-maley joe-maley merged commit 1597833 into dev Jun 23, 2020
@joe-maley joe-maley deleted the jpm/copy-cells-perf-2 branch June 23, 2020 19:02
joe-maley pushed a commit that referenced this pull request Jul 22, 2020
This patch optimizes the `copy_cells` path for multi-fragment reads. The
following benchmarks are for the multi-fragment read scenario discussed
offline:

```
// Current
Read time: 4.75082 secs
  * Time to copy result attribute values: 3.54192 secs
    > Time to read attribute tiles: 0.311707 secs
    > Time to unfilter attribute tiles: 0.370434 secs
    > Time to copy fixed-sized attribute values: 0.898421 secs
    > Time to copy var-sized attribute values: 0.954925 secs
```

```
// With this patch
Read time: 3.04627 secs
  * Time to copy result attribute values: 1.83972 secs
    > Time to read attribute tiles: 0.274928 secs
    > Time to unfilter attribute tiles: 0.38196 secs
    > Time to copy fixed-sized attribute values: 0.517415 secs
    > Time to copy var-sized attribute values: 0.461847 secs
```

For context, here are the benchmark results for the single-fragment read. The
stats are similar with and without this patch:
```
Read time: 1.86883 secs
  * Time to copy result attribute values: 1.19411 secs
    > Time to read attribute tiles: 0.304055 secs
    > Time to unfilter attribute tiles: 0.351332 secs
    > Time to copy fixed-sized attribute values: 0.289661 secs
    > Time to copy var-sized attribute values: 0.142405 secs
```

This patch does three things:
1. Converts the `offset_offsets_per_cs` and `var_offsets_per_cs` in the var-sized
   path from a 2D array (vector<vector<uint64_t>>) to a 1D array (vector<uint64_t).
   The big win is construction and destruction time for the nested vector
   elements.
2. Partitions cell copying for both fixed and var-sized paths. The motivation
   is to reduce contention on the TBB threads and minimize time spent context
   switching between the individual cell slab copies.
3. Added a context cache for the fixed size path, similar to the existing
   var-sized context cache.

Co-authored-by: Joe Maley <[email protected]>
Shelnutt2 pushed a commit that referenced this pull request Jul 22, 2020
This patch optimizes the `copy_cells` path for multi-fragment reads. The
following benchmarks are for the multi-fragment read scenario discussed
offline:

```
// Current
Read time: 4.75082 secs
  * Time to copy result attribute values: 3.54192 secs
    > Time to read attribute tiles: 0.311707 secs
    > Time to unfilter attribute tiles: 0.370434 secs
    > Time to copy fixed-sized attribute values: 0.898421 secs
    > Time to copy var-sized attribute values: 0.954925 secs
```

```
// With this patch
Read time: 3.04627 secs
  * Time to copy result attribute values: 1.83972 secs
    > Time to read attribute tiles: 0.274928 secs
    > Time to unfilter attribute tiles: 0.38196 secs
    > Time to copy fixed-sized attribute values: 0.517415 secs
    > Time to copy var-sized attribute values: 0.461847 secs
```

For context, here are the benchmark results for the single-fragment read. The
stats are similar with and without this patch:
```
Read time: 1.86883 secs
  * Time to copy result attribute values: 1.19411 secs
    > Time to read attribute tiles: 0.304055 secs
    > Time to unfilter attribute tiles: 0.351332 secs
    > Time to copy fixed-sized attribute values: 0.289661 secs
    > Time to copy var-sized attribute values: 0.142405 secs
```

This patch does three things:
1. Converts the `offset_offsets_per_cs` and `var_offsets_per_cs` in the var-sized
   path from a 2D array (vector<vector<uint64_t>>) to a 1D array (vector<uint64_t).
   The big win is construction and destruction time for the nested vector
   elements.
2. Partitions cell copying for both fixed and var-sized paths. The motivation
   is to reduce contention on the TBB threads and minimize time spent context
   switching between the individual cell slab copies.
3. Added a context cache for the fixed size path, similar to the existing
   var-sized context cache.

Co-authored-by: Joe Maley <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants