
Ihn/test mingw #1

Closed
wants to merge 7 commits into from

Conversation

ihnorton
Owner

No description provided.

ihnorton pushed a commit that referenced this pull request Sep 2, 2020
This fixes the following within StorageManager::load_array_metadata():
1. A direct memory leak of a `Buffer` object introduced with the chunked buffers
  patch.
2. A long-standing indirect memory leak of the metadata buffers.

For #1, an auxiliary `Buffer` object is created to store the contents of a
read from a chunked buffer. The `Buffer` object is never deleted.

For #2, the raw buffers inside the `Buffer` object are passed into a
`ConstBuffer`. However, the `ConstBuffer` does not take ownership or free them.
The buffers are never freed while the `ConstBuffer` references them because of
leak #1. For this patch, I've replaced the `ConstBuffer` objects with `Buffer`
objects so that the `OpenArray` instance can take ownership of the metadata
buffers and free them appropriately.

Note that before the chunked buffer patch, we had a similar pattern where the
`Tile` buffer was disowned, but never freed by the `ConstBuffer`. This may be
affecting older core builds.
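
As a minimal sketch of the ownership issue and the fix, with hypothetical simplified `Buffer`/`ConstBuffer` stand-ins (not the actual TileDB classes):
```
#include <cstdlib>
#include <memory>

// Hypothetical, simplified stand-ins for the real TileDB classes.
class Buffer {
 public:
  explicit Buffer(size_t size) : data_(std::malloc(size)), size_(size) {}
  ~Buffer() { std::free(data_); }  // Owning: frees its allocation.
  void* data() const { return data_; }
  size_t size() const { return size_; }
 private:
  void* data_;
  size_t size_;
};

// Non-owning view: never frees the memory it references.
struct ConstBuffer {
  const void* data;
  size_t size;
};

// Before the patch (leaks #1 and #2): an auxiliary Buffer is heap-allocated
// to hold the bytes read from the chunked buffer, a non-owning ConstBuffer
// view of it is stored, and the Buffer itself is never deleted.
ConstBuffer load_metadata_leaky() {
  auto* aux = new Buffer(64);                    // leak #1: never deleted
  return ConstBuffer{aux->data(), aux->size()};  // leak #2: view never frees
}

// After the patch: the owning Buffer itself is handed off, so its new owner
// (the OpenArray instance in the real code) frees it appropriately.
std::unique_ptr<Buffer> load_metadata_fixed() {
  return std::make_unique<Buffer>(64);
}
```
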
ihnorton pushed a commit that referenced this pull request Sep 21, 2020
Currently, reading, unfiltering, and copying tiles are performed serially
across attribute names within Reader::copy_attribute_values():
```
for (const auto& name : names) {
  read_tiles(name, result_tiles);
  unfilter_tiles(name, result_tiles, &cs_ranges);
  if (!array_schema_->var_size(name))
    copy_fixed_cells(name, stride, result_cell_slabs, &fixed_ctx_cache);
  else
    copy_var_cells(name, stride, result_cell_slabs, &var_ctx_cache);
  clear_tiles(name, result_tiles);
}
```

Parallelizing the entire for-loop has a few problems:
1. Each iteration requires a large amount of temporary memory to buffer reads
   into before they are copied into `buffers_` and cleared within clear_tiles().
   This memory cannot be shared among parallel iterations, so memory bloat
   scales linearly with the number of parallel iterations.
2. `read_tiles`, `unfilter_tiles`, `copy_fixed_cells`, `copy_var_cells`, and
   `clear_tiles` are thread-unsafe because of unprotected access to the
    individual `ResultTile` elements within `result_tiles`.
3. `copy_fixed_cells` and `copy_var_cells` are thread-unsafe because the
   `fixed_ctx_cache` and `var_ctx_cache` are thread-unsafe.

Problem #1: This patch restricts the number of parallel iterations to 2. This
is a hard-coded value that I will justify later in this commit message.

Problem #2: We can achieve thread-safety with a mutex for each ResultTile
element. Even if we tolerate the memory overhead associated with allocating a
mutex for each ResultTile, the CPU time spent locking at this granularity
is unacceptably high within the `copy_fixed_cells` and `copy_var_cells` paths.
However, it is experimentally negligible for the other paths. This is enough
of a penalty that I have decided to use a single lock to protect all elements.
This effectively serializes all routines, except for the IO tasks within
`read_tiles`.

Problem #3: I have made the context caches thread-safe, although they are still
accessed serially. This is mostly because I had to write this code before I
was able to test that parallelizing `copy_*_cells` was too expensive with
per-ResultTile mutexes. I have decided to leave it in this patch because it is
zero-harm and required for future improvements.
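
As a rough sketch of the execution strategy described above, using `std::thread` in place of TileDB's `ThreadPool` and placeholder `read_tiles`/`unfilter_and_copy` helpers:
```
#include <algorithm>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

void read_tiles(const std::string&) { /* IO placeholder */ }
void unfilter_and_copy(const std::string&) { /* unfilter/copy placeholder */ }

void copy_attribute_values(const std::vector<std::string>& names) {
  constexpr size_t kParallelIters = 2;  // hard-coded, per the numbers below
  std::mutex result_tile_mtx;           // single lock protecting all elements

  for (size_t i = 0; i < names.size(); i += kParallelIters) {
    const size_t end = std::min(i + kParallelIters, names.size());
    std::vector<std::thread> step;
    for (size_t j = i; j < end; ++j) {
      step.emplace_back([&, j] {
        read_tiles(names[j]);  // IO tasks run concurrently
        std::lock_guard<std::mutex> lg(result_tile_mtx);
        unfilter_and_copy(names[j]);  // effectively serialized by the lock
      });
    }
    // Step-wise execution: both iterations must complete before the next
    // pair starts.
    for (auto& t : step)
      t.join();
  }
}
```
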

I have been testing this in an IO-bound workload (using an S3 backend). Due to
the large variance in S3 request times, I ran my test many times and collected
the best-case numbers.

// Execution times:
Without this patch: 4.5s
With this patch, 1 parallel iteration: 4.5s
With this patch, 2 parallel iterations: 3.4s
With this patch, 3 parallel iterations: 3.4s
With this patch, 4 parallel iterations: 3.2s
With this patch, 6 parallel iterations: 3.2s
With this patch, 8 parallel iterations: 3.3s
With this patch, 12 parallel iterations: 3.7s

Given the above, it does not seem that the number of parallel iterations should
be configurable. For more than 2 parallel iterations, we linearly scale memory
bloat with little to gain in execution time. However, with just 2 parallel
iterations, we receive a 32% speed-up (4.5s -> 3.4s).

Future Improvements:
- Modify ThreadPool::execute() to allow limiting the number of parallel iterations
for an arbitrary number of tasks. This patch uses a step-wise execution of
parallel reads. In other words: both parallel iterations must complete before
the next pair of `read_tiles()` can execute. Ideally, the 3rd iteration should
start immediately after one of the first two complete.

- Increase the granularity of protection on ResultTiles. I have a couple of
ideas, but they would be large and out of scope for this patch.

- Revisit the read path for dimensions. Right now, there is no limit to the
number of dimensions read at one time. If we have a large number of dimensions,
we will have a memory bottleneck.

Co-authored-by: Joe Maley <[email protected]>
ihnorton pushed a commit that referenced this pull request Dec 11, 2020
The offsets in each buffer correspond to the values in its
data buffer. To build a single contiguous buffer, we must
ensure the offset values continue in ascending order from the
previous buffer. For example, consider the following two buffers
storing int32 values.

Buffer #1:
  offsets: [0, 8, 16]
  values:  [1, 2, 3, 4, 5, 6, 7]

Buffer #2:
  offsets: [0, 12]
  values:  [100, 200, 300, 400, 500]

The final, contiguous buffer will be:
  offsets: [0, 8, 16, 28, 40]
  values:  [1, 2, 3, 4, 5, 6, 7, 100, 200, 300, 400, 500]

The last two offsets, `28, 40`, were calculated by adding `28`
to the offsets of Buffer #2: [0, 12]. The `28` is the byte size
of the values in Buffer #1 (7 int32 values × 4 bytes each).
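
A small sketch of this offset adjustment, as a hypothetical helper rather than the actual TileDB code:
```
#include <cstdint>
#include <vector>

// Shift each buffer's offsets by the total byte size of all previous
// buffers' values so the combined offsets continue in ascending order.
std::vector<uint64_t> concat_offsets(
    const std::vector<std::vector<uint64_t>>& offsets_per_buffer,
    const std::vector<uint64_t>& value_bytes_per_buffer) {
  std::vector<uint64_t> out;
  uint64_t shift = 0;
  for (size_t i = 0; i < offsets_per_buffer.size(); ++i) {
    for (uint64_t off : offsets_per_buffer[i])
      out.push_back(off + shift);
    shift += value_bytes_per_buffer[i];  // Buffer #1: 7 int32s = 28 bytes
  }
  return out;
}

// concat_offsets({{0, 8, 16}, {0, 12}}, {28, 20}) == {0, 8, 16, 28, 40}
```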

Co-authored-by: Joe Maley <[email protected]>
ihnorton pushed a commit that referenced this pull request Jun 9, 2021
Fixes TileDB-Inc#2201

This patch introduces a global lock on the `S3Client` constructor to prevent
the following race reported by TSAN:
```
WARNING: ThreadSanitizer: data race (pid=12)
  Write of size 8 at 0x7fe93658a1b0 by thread T16 (mutexes: write M144250874682678808, write M150443169551677688):
    #0 cJSON_ParseWithOpts external/aws/aws-cpp-sdk-core/source/external/cjson/cJSON.cpp:1006 (libexternal_Saws_Slibaws.so+0x299bef)
    #1 Aws::Utils::Json::JsonValue::JsonValue(std::__cxx11::basic_string<char, std::char_traits<char>, Aws::Allocator<char> > const&) external/aws/aws-cpp-sdk-core/source/utils/json/JsonSerializer.cpp:39 (libexternal_Saws_Slibaws.so+0x32d09c)
    #2 Aws::Config::EC2InstanceProfileConfigLoader::LoadInternal() external/aws/aws-cpp-sdk-core/source/config/AWSProfileConfigLoader.cpp:341 (libexternal_Saws_Slibaws.so+0x286d88)
    #3 Aws::Config::AWSProfileConfigLoader::Load() external/aws/aws-cpp-sdk-core/source/config/AWSProfileConfigLoader.cpp:47 (libexternal_Saws_Slibaws.so+0x282b4f)
    #4 Aws::Auth::InstanceProfileCredentialsProvider::Reload() external/aws/aws-cpp-sdk-core/source/auth/AWSCredentialsProvider.cpp:265 (libexternal_Saws_Slibaws.so+0x235987)
    #5 Aws::Auth::InstanceProfileCredentialsProvider::RefreshIfExpired() external/aws/aws-cpp-sdk-core/source/auth/AWSCredentialsProvider.cpp:283 (libexternal_Saws_Slibaws.so+0x23613b)
    #6 Aws::Auth::InstanceProfileCredentialsProvider::GetAWSCredentials() external/aws/aws-cpp-sdk-core/source/auth/AWSCredentialsProvider.cpp:250 (libexternal_Saws_Slibaws.so+0x23be9a)
    #7 Aws::Auth::AWSCredentialsProviderChain::GetAWSCredentials() external/aws/aws-cpp-sdk-core/source/auth/AWSCredentialsProviderChain.cpp:35 (libexternal_Saws_Slibaws.so+0x244550)
    #8 Aws::Client::AWSAuthV4Signer::AWSAuthV4Signer(std::shared_ptr<Aws::Auth::AWSCredentialsProvider> const&, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, Aws::Allocator<char> > const&, Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy, bool) external/aws/aws-cpp-sdk-core/source/auth/AWSAuthSigner.cpp:170 (libexternal_Saws_Slibaws.so+0x2262ed)
    #9 std::shared_ptr<Aws::Client::AWSAuthV4Signer> Aws::MakeShared<Aws::Client::AWSAuthV4Signer, std::shared_ptr<Aws::Auth::DefaultAWSCredentialsProviderChain>, char const*&, std::__cxx11::basic_string<char, std::char_traits<char>, Aws::Allocator<char> > const&, Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy&, bool>(char const*, std::shared_ptr<Aws::Auth::DefaultAWSCredentialsProviderChain>&&, char const*&, std::__cxx11::basic_string<char, std::char_traits<char>, Aws::Allocator<char> > const&, Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy&, bool&&) /usr/include/c++/9/ext/new_allocator.h:147 (libexternal_Saws_Slibaws.so+0x39a78b)
    #10 Aws::S3::S3Client::S3Client(Aws::Client::ClientConfiguration const&, Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy, bool, Aws::S3::US_EAST_1_REGIONAL_ENDPOINT_OPTION) external/aws/aws-cpp-sdk-s3/source/S3Client.cpp:134 (libexternal_Saws_Slibaws.so+0x39a78b)
    #11 std::shared_ptr<Aws::S3::S3Client>::shared_ptr<Aws::Allocator<Aws::S3::S3Client>, Aws::Client::ClientConfiguration&, Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy, bool const&>(std::_Sp_alloc_shared_tag<Aws::Allocator<Aws::S3::S3Client> >, Aws::Client::ClientConfiguration&, Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy&&, bool const&) /usr/include/c++/9/ext/new_allocator.h:147 (libexternal_Scom_Ugithub_Utiledb_Utiledb_Slibtiledb.so+0x2879bd)
    #12 std::shared_ptr<Aws::S3::S3Client> std::allocate_shared<Aws::S3::S3Client, Aws::Allocator<Aws::S3::S3Client>, Aws::Client::ClientConfiguration&, Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy, bool const&>(Aws::Allocator<Aws::S3::S3Client> const&, Aws::Client::ClientConfiguration&, Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy&&, bool const&) /usr/include/c++/9/bits/shared_ptr.h:702 (libexternal_Scom_Ugithub_Utiledb_Utiledb_Slibtiledb.so+0x2879bd)
    #13 std::shared_ptr<Aws::S3::S3Client> Aws::MakeShared<Aws::S3::S3Client, Aws::Client::ClientConfiguration&, Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy, bool const&>(char const*, Aws::Client::ClientConfiguration&, Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy&&, bool const&) external/aws/aws-cpp-sdk-core/include/aws/core/utils/memory/stl/AWSAllocator.h:106 (libexternal_Scom_Ugithub_Utiledb_Utiledb_Slibtiledb.so+0x2879bd)
    #14 tiledb::sm::S3::init_client() const external/com_github_tiledb_tiledb/tiledb/sm/filesystem/s3.cc:1104 (libexternal_Scom_Ugithub_Utiledb_Utiledb_Slibtiledb.so+0x2879bd)
```
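
The fix amounts to serializing `S3Client` construction behind a process-wide mutex; a minimal sketch with a simplified constructor call (the real `init_client` passes a client configuration, signing policy, etc.):
```
#include <memory>
#include <mutex>

#include <aws/s3/S3Client.h>

// Serialize S3Client construction process-wide so the credential/config
// loading inside the AWS SDK cannot race with itself.
std::shared_ptr<Aws::S3::S3Client> make_s3_client() {
  static std::mutex s3client_ctor_mtx;  // global lock on the constructor
  std::lock_guard<std::mutex> lg(s3client_ctor_mtx);
  return std::make_shared<Aws::S3::S3Client>();
}
```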

---

TYPE: BUG
DESC: Fix race within `S3::init_client`

Co-authored-by: Joe Maley <[email protected]>
@ihnorton ihnorton closed this Jun 9, 2021
ihnorton pushed a commit that referenced this pull request Oct 15, 2022