Tile id bug #1
Merged
Conversation
…ts (and, hence, too many space tiles that exceeded int upper bound).
joe-maley
pushed a commit
that referenced
this pull request
Mar 29, 2019
Closes #1068

1. Consolidates all state for a single multipart upload request into a single struct, 'MultiPartUploadState'.
2. Refactored to support #1.
3. Introduced a shared Status that will be set to the last non-OK status from an individual part upload request. Defaults to OK.
4. When disconnecting or flushing an object, we'll invoke an 'Abort' request instead of a 'Complete' request if the shared status from #3 is non-OK.
5. Added a unit test for this failure path.
6. Introduced the 'UnitTestConfig' singleton for mutating the state of classes under test.
7. Introduced 'macros.h' for common preprocessor macros.

Tests run: ./tiledb_unit *S3*
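As a rough illustration of the consolidated per-upload state and the abort-vs-complete decision described above, here is a minimal, hypothetical sketch; the field names and the commented-out backend calls are assumptions, not TileDB's actual implementation:

```
// Sketch only: one struct per multipart upload, plus "abort instead of
// complete when any part failed". Names here are illustrative assumptions.
#include <cstdint>
#include <mutex>
#include <string>
#include <vector>

struct Status {
  bool ok_ = true;
  std::string msg_;
  bool ok() const { return ok_; }
};

struct MultiPartUploadState {
  uint64_t part_number = 1;        // next part number to upload
  std::string upload_id;           // id returned by the backend when the upload began
  std::vector<std::string> etags;  // identifiers of successfully uploaded parts
  Status st;                       // last non-OK status from any part upload; OK by default
  std::mutex mtx;                  // guards concurrent part uploads
};

// Invoked when disconnecting or flushing the object.
void finish_upload(MultiPartUploadState* state) {
  std::lock_guard<std::mutex> lock(state->mtx);
  if (state->st.ok()) {
    // complete_multipart_upload(state->upload_id, state->etags);  // all parts succeeded
  } else {
    // abort_multipart_upload(state->upload_id);  // discard the partial upload
  }
}
```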
joe-maley
pushed a commit
that referenced
this pull request
Nov 26, 2019
This fixes two issues:

Issue #1: "Cannot deserialize; kj::Exception: expected segment != nullptr && segment->checkObject(segment->getStartPtr(), ONE * WORDS); Message did not contain a root pointer."

This is a one-line fix: we must reset the original buffer's offset after writing to it.

Issue #2: Before attempting to deserialize a query, we serialize the current state as a backup. If the deserialization of the new query fails, we deserialize the backup to ensure that the query object leaves 'query_deserialize()' in its original state. Deserializing the backup is failing on a 'memcpy' when we attempt to copy the first incoming attribute buffer into the backup query. This fails because we are only deserializing the query object and do not provide any concatenated data buffers. The easiest fix is to add a path that skips attribute buffer processing entirely in the 'do_query_deserialize' path. This allows us to revert the Query object to its original state, leaving the data buffers untouched. This is OK because we also revert the 'copy_state', so it is acceptable to have extra, incomplete data at the end of the data buffers.
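A minimal sketch of the Issue #1 fix described above, using a toy Buffer stand-in (not tiledb::sm::Buffer): the writer rewinds the buffer's offset after serializing so the reader starts from the beginning.

```
// Sketch of the "reset the offset after writing" fix. 'Buffer' is a toy
// stand-in with a single read/write position, as an illustration only.
#include <cstdint>
#include <cstring>
#include <vector>

struct Buffer {
  std::vector<uint8_t> data;
  uint64_t offset = 0;  // current read/write position

  void write(const void* src, uint64_t n) {
    data.resize(offset + n);
    std::memcpy(data.data() + offset, src, n);
    offset += n;
  }
  void reset_offset() { offset = 0; }
};

// Serialize-then-read: without the reset, the consumer would start reading
// at the position just past what was written and see no root message at all.
void serialize_into(Buffer* buf, const void* msg, uint64_t n) {
  buf->write(msg, n);
  buf->reset_offset();  // the one-line fix described above
}
```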
joe-maley
pushed a commit
that referenced
this pull request
Dec 20, 2019
For early review: ChunkBuffers and associated unit tests.

This patch contains chunk_buffers.cc/h for use in selective decompression. It also contains a passing unit-ChunkBuffers.cc test and a modified unit-Tile.cc test that passes with the associated changes to Tile that are NOT included in this review. I have included a detailed explanation of the new class in its header file (chunk_buffers.h), but for easy reference, I'll paste it here:

The ChunkBuffers class is used to represent a logically contiguous buffer as a vector of individual buffers. These individual buffers are referred to as "chunk buffers". Each chunk buffer may be allocated individually, which saves memory in scenarios where the logically contiguous buffer is sparsely allocated.

After construction, the class must be initialized before performing IO. The initialization determines the following, independent usage paradigms:

#1: Chunk Sizes: Fixed/Variable
The chunk sizes must be either fixed or variable. An instance with fixed chunk sizes ensures that all chunk buffers are of equal size. The size of the last chunk buffer may be equal to or less than the other chunk sizes. Instances with fixed-size chunks have a smaller memory footprint and a smaller algorithmic complexity when performing IO. For variable-sized chunks, each chunk size is independent of the others.

#2: Chunk Buffer Addressing: Discrete/Contiguous
The addresses of the individual chunk buffers may or may not be virtually contiguous. For example, the chunk addresses within a virtually contiguous instance may be allocated at addresses 1024 and 1028, where the first chunk is of size 4. Non-contiguous chunks (referred to as "discrete") may be allocated at any address. The trade-off is that the memory of each discrete chunk is managed individually, whereas contiguous chunk buffers can be managed through the first chunk alone.

#3: Memory Management: Internal/External
The chunk buffers may be allocated and freed internally or externally. Internal memory management is exposed through the alloc_*() and free_*() routines. External memory management is exposed through the set_*() routines. Currently, this only supports external memory management for contiguously addressed buffers and internal memory management for discretely addressed buffers.

Note that the ChunkBuffers class does NOT support any concept of ownership. It is up to the caller to free the instance before destruction.
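The fixed-size addressing paradigm (#1) is the easiest to picture with a short sketch. The class below is a simplified, hypothetical illustration of the idea rather than the ChunkBuffers API itself: a logical offset maps to a chunk index and an in-chunk offset in O(1), and chunks are allocated discretely on first write.

```
// Illustrative sketch of fixed-size chunk addressing; not the real ChunkBuffers.
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <vector>

class FixedChunkBuffers {
 public:
  // Fixed chunk size; chunks are allocated lazily, which is what saves
  // memory when the logical buffer is sparsely written.
  explicit FixedChunkBuffers(uint64_t chunk_size) : chunk_size_(chunk_size) {}

  void write(uint64_t logical_offset, const void* src, uint64_t n) {
    const uint8_t* p = static_cast<const uint8_t*>(src);
    while (n > 0) {
      const uint64_t idx = logical_offset / chunk_size_;  // O(1) for fixed sizes
      const uint64_t off = logical_offset % chunk_size_;
      if (idx >= chunks_.size())
        chunks_.resize(idx + 1);
      if (chunks_[idx].empty())
        chunks_[idx].resize(chunk_size_);                 // discrete allocation
      const uint64_t len = std::min(n, chunk_size_ - off);
      std::memcpy(chunks_[idx].data() + off, p, len);
      logical_offset += len;
      p += len;
      n -= len;
    }
  }

 private:
  uint64_t chunk_size_;
  std::vector<std::vector<uint8_t>> chunks_;  // one individually allocated buffer per chunk
};
```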
ihnorton
referenced
this pull request
in ihnorton/TileDB
Mar 17, 2020
Initialize VFS with context config
joe-maley
pushed a commit
that referenced
this pull request
Mar 17, 2020
Change #1: Fix non-multipart uploads
This fixes an issue where we free the buffer before writing it in the non-multipart upload path, which can be used for quicker small writes. Adds an associated unit test.

Change #2: Missing error return path in blob flush
In the blob flush path for multipart uploads, the last flush of the write cache may silently fail. This ensures the user receives a non-OK status from flush_blob() in this scenario.

Change #3: Multipart uploads fail when they have > 10 chunks
In the multipart upload (aka blocklist upload), each chunk is uploaded as a block with a string-id unique to the overall object (aka blob). There is an undocumented requirement that all string-ids must be of equal length. Currently, the string-ids are generated from an incrementing integer. For example, the id of the first chunk is "0", the second chunk is "1", the tenth chunk is "9", and the eleventh chunk is "10". The eleventh chunk has a string-id size of 2 while all the others have a size of 1. The eleventh chunk (and all future chunks) fail with the following error message: "The specified blob or block content is invalid"

This patch pads the string-ids to ensure they are all 5 characters long. The maximum number of chunks is 50,000, so 5 characters are sufficient to contain all chunk ids. More info here: https://gauravmantri.com/2013/05/18/windows-azure-blob-storage-dealing-with-the-specified-blob-or-block-content-is-invalid-error/
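The padding in Change #3 is simple to sketch. The helper below is illustrative only (the real code may also encode the id differently); it shows nothing more than the zero-padding that gives every block a 5-character id:

```
// Zero-pad block ids so every id string has the same length. Blocklist
// uploads allow at most 50,000 blocks, so five digits cover every chunk id.
#include <iomanip>
#include <sstream>
#include <string>

std::string block_id_str(unsigned int chunk_id) {
  std::ostringstream ss;
  ss << std::setw(5) << std::setfill('0') << chunk_id;
  return ss.str();  // e.g. 0 -> "00000", 10 -> "00010", 49999 -> "49999"
}
```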
joe-maley
pushed a commit
that referenced
this pull request
Jun 16, 2020
This fixes the following within StorageManager::load_array_metadata():

1. A direct memory leak of a `Buffer` object introduced with the chunked buffers patch.
2. A long-standing indirect memory leak of the metadata buffers.

For #1, an auxiliary `Buffer` object is created to store the contents of a read from a chunked buffer. The `Buffer` object is never deleted.

For #2, the raw buffers inside of the `Buffer` object are passed into a `ConstBuffer`. However, the `ConstBuffer` does not take ownership of them or free them. The buffers are never freed while the `ConstBuffer` references them because of leak #1.

For this patch, I've replaced the `ConstBuffer` objects with `Buffer` objects so that the `OpenArray` instance can take ownership of the metadata buffers and free them appropriately.

Note that before the chunked buffer patch, we had a similar pattern where the `Tile` buffer was disowned but never freed by the `ConstBuffer`. This may be affecting older core builds.
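A toy sketch of the ownership pattern behind leaks #1 and #2; the types here are illustrative stand-ins rather than TileDB's `Buffer`/`ConstBuffer`:

```
// Illustration only: a non-owning view never frees the memory it points at,
// so if the owning object is leaked (leak #1), the underlying allocation
// leaks too (leak #2). Handing the owning buffer to the consumer fixes both.
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

struct OwningBuffer {
  std::vector<uint8_t> data;  // freed when the OwningBuffer is destroyed
};

struct ConstView {
  const uint8_t* data = nullptr;  // never owns or frees
  uint64_t size = 0;
};

struct OpenArrayMetadata {
  // Before: stored ConstView-like objects; nothing freed the allocations.
  // After: store the owning buffers so destruction releases the memory.
  std::vector<std::unique_ptr<OwningBuffer>> buffers;

  void add(std::unique_ptr<OwningBuffer> b) {
    buffers.push_back(std::move(b));  // the open array takes ownership
  }
};
```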
joe-maley
pushed a commit
that referenced
this pull request
Aug 7, 2020
This introduces a global thread pool for use when TBB is disabled. The performance has not been exhaustively benchmarked against TBB. However, I did test this on two readily available scenarios that I had recently been performance benchmarking for other reasons. One scenario makes heavy use of the parallel_sort path while the other does not. Surprisingly, disabling TBB performs about 10% quicker with this patch.

// Scenario #1
TBB: 3.4s
TBB disabled, this patch: 3.0s
TBB disabled, on dev: 10.0s

// Scenario #2
TBB: 3.1s
TBB disabled, this patch: 2.7s
TBB disabled, on dev: 9.1s

For now, this patch uses the thread pool at the same scope as the TBB scheduler: it is a global thread pool, shared within a single process, and conditionally compiled. The concurrency level that the thread pool is configured with is determined from the "sm.num_tbb_threads" config. This patch does not disable TBB by default.
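For reference, a minimal sketch of what a process-wide, conditionally compiled thread pool can look like. This is an illustration sized from a configured thread count, not TileDB's actual ThreadPool; the `HAVE_TBB` guard and the pool class are assumptions for the sketch.

```
// Minimal process-wide thread pool used only when TBB is disabled.
#ifndef HAVE_TBB

#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class SimpleThreadPool {
 public:
  explicit SimpleThreadPool(size_t n) {
    for (size_t i = 0; i < n; ++i)
      workers_.emplace_back([this] { worker_loop(); });
  }

  ~SimpleThreadPool() {
    {
      std::lock_guard<std::mutex> lk(m_);
      done_ = true;
    }
    cv_.notify_all();
    for (auto& t : workers_) t.join();
  }

  void execute(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lk(m_);
      tasks_.push(std::move(task));
    }
    cv_.notify_one();
  }

 private:
  void worker_loop() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return done_ || !tasks_.empty(); });
        if (done_ && tasks_.empty()) return;  // drain remaining tasks first
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();
    }
  }

  std::vector<std::thread> workers_;
  std::queue<std::function<void()>> tasks_;
  std::mutex m_;
  std::condition_variable cv_;
  bool done_ = false;
};

// One pool per process, sized from the configured concurrency level
// (the commit reads this from the "sm.num_tbb_threads" config). The size
// passed on the first call wins; later calls return the same pool.
SimpleThreadPool& global_thread_pool(size_t num_threads) {
  static SimpleThreadPool pool(num_threads);  // constructed on first use
  return pool;
}

#endif  // HAVE_TBB
```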
joe-maley
pushed a commit
that referenced
this pull request
Sep 17, 2020
Currently, reading/unfiltering/copying tiles are performed serially among attribute names within Reader::copy_attribute_values():

```
for (const auto& name : names) {
  read_tiles(name, result_tiles);
  unfilter_tiles(name, result_tiles, &cs_ranges);
  if (!array_schema_->var_size(name))
    copy_fixed_cells(name, stride, result_cell_slabs, &fixed_ctx_cache);
  else
    copy_var_cells(name, stride, result_cell_slabs, &var_ctx_cache);
  clear_tiles(name, result_tiles);
}
```

Parallelizing the entire for-loop has a few problems:

1. Each iteration requires a large amount of temporary memory to buffer reads into before they are copied into `buffers_` and cleared within clear_tiles(). This memory cannot be shared among parallel iterations, so memory bloat scales linearly with the number of parallel iterations.
2. `read_tiles`, `unfilter_tiles`, `copy_fixed_cells`, `copy_var_cells`, and `clear_tiles` are thread-unsafe because of unprotected access to the individual `ResultTile` elements within `result_tiles`.
3. `copy_fixed_cells` and `copy_var_cells` are thread-unsafe because the `fixed_ctx_cache` and `var_ctx_cache` are thread-unsafe.

Problem #1: This patch restricts the number of parallel iterations to 2. This is a hard-coded value that I will justify later in this commit message.

Problem #2: We can achieve thread-safety with a mutex for each ResultTile element. Even if we tolerate the memory overhead associated with allocating a mutex for each ResultTile, the CPU time spent locking at this granularity is unacceptably high within the `copy_fixed_cells` and `copy_var_cells` paths. However, it is experimentally negligible for the other paths. This is enough of a penalty that I have decided to use a single lock to protect all elements. This effectively serializes all routines except for the IO tasks within `read_tiles`.

Problem #3: I have made the context caches thread-safe, although they are still accessed serially. This is mostly because I had to write this code before I was able to test that parallelizing `copy_*_cells` was too expensive with per-ResultTile mutexes. I have decided to leave it in this patch because it is zero-harm and required for future improvements.

I have been testing this in an IO-bound workload (using an S3 backend). Due to the large variance in S3 request times, I ran my test many times and collected the best-case numbers.

// Execution times:
Without this patch: 4.5s
With this patch, 1 parallel iteration: 4.5s
With this patch, 2 parallel iterations: 3.4s
With this patch, 3 parallel iterations: 3.4s
With this patch, 4 parallel iterations: 3.2s
With this patch, 6 parallel iterations: 3.2s
With this patch, 8 parallel iterations: 3.3s
With this patch, 12 parallel iterations: 3.7s

Given the above, it does not seem that the number of parallel iterations should be configurable. For more than 2 parallel iterations, we linearly scale memory bloat with little to gain in execution time. However, with just 2 parallel iterations, we receive a 32% speed-up.

Future Improvements:
- Modify ThreadPool::execute() to allow limiting the number of parallel iterations for an arbitrary number of tasks. This patch uses a step-wise execution of parallel reads. In other words, both parallel iterations must complete before the next pair of `read_tiles()` can execute. Ideally, the 3rd iteration should start immediately after one of the first two completes.
- Increase the granularity of protection on ResultTiles. I have a couple of ideas, but they would be large and out of scope for this patch.
- Revisit the read path for dimensions. Right now, there is no limit to the number of dimensions read at one time. If we have a large number of dimensions, we will have a memory bottleneck.
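A rough sketch of the step-wise, two-at-a-time scheme discussed under Problems #1 and #2 above. The helper names follow the commit message, but the use of `std::async` and a single coarse mutex is illustrative rather than the actual Reader code:

```
// Illustrative only: process attribute names in pairs; IO may overlap, the
// copy/unfilter/clear steps are serialized behind one coarse lock.
#include <algorithm>
#include <cstddef>
#include <future>
#include <mutex>
#include <string>
#include <vector>

void process_name(const std::string& name, std::mutex* tiles_mtx) {
  // read_tiles(name, ...);                       // IO, allowed to run in parallel
  std::lock_guard<std::mutex> lock(*tiles_mtx);   // single lock over ResultTile access
  // unfilter_tiles(name, ...);
  // copy_fixed_cells(name, ...) / copy_var_cells(name, ...);
  // clear_tiles(name, ...);
}

void copy_attribute_values(const std::vector<std::string>& names) {
  constexpr size_t kParallelIters = 2;  // hard-coded limit justified above
  std::mutex tiles_mtx;
  for (size_t i = 0; i < names.size(); i += kParallelIters) {
    std::vector<std::future<void>> batch;
    for (size_t j = i; j < std::min(i + kParallelIters, names.size()); ++j)
      batch.emplace_back(std::async(std::launch::async, process_name,
                                    std::cref(names[j]), &tiles_mtx));
    for (auto& f : batch)
      f.get();  // step-wise: the pair must finish before the next pair starts
  }
}
```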
joe-maley
pushed a commit
that referenced
this pull request
Nov 13, 2020
The offsets in each buffer correspond to the values in its data buffer. To build a single contiguous buffer, we must ensure the offset values continue in ascending order from the previous buffer. For example, consider the following buffers storing int32 values.

Buffer #1:
offsets: [0, 8, 16]
values: [1, 2, 3, 4, 5, 6, 7]

Buffer #2:
offsets: [0, 12]
values: [100, 200, 300, 400, 500]

The final, contiguous buffer will be:
offsets: [0, 8, 16, 28, 40]
values: [1, 2, 3, 4, 5, 6, 7, 100, 200, 300, 400, 500]

The last two offsets, `28, 40`, were calculated by adding `28` to the offsets of Buffer #2: [0, 12]. The `28` was calculated as the byte size of the values in Buffer #1.
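The offset rebasing described above can be sketched in a few lines; `VarBuffer` is a hypothetical stand-in used only to show the arithmetic:

```
// Each subsequent buffer's offsets are shifted by the byte size of all
// values that came before it. The numbers match the int32 example above.
#include <cstdint>
#include <vector>

struct VarBuffer {
  std::vector<uint64_t> offsets;  // byte offsets into 'values'
  std::vector<int32_t> values;
};

// Append 'next' onto 'dst', rebasing the incoming offsets.
void concatenate(VarBuffer* dst, const VarBuffer& next) {
  const uint64_t shift = dst->values.size() * sizeof(int32_t);
  for (uint64_t off : next.offsets)
    dst->offsets.push_back(off + shift);
  dst->values.insert(dst->values.end(), next.values.begin(), next.values.end());
}

// Example from above:
//   dst:  offsets [0, 8, 16],  values [1..7]      (28 bytes of values)
//   next: offsets [0, 12],     values [100..500]
//   result: offsets [0, 8, 16, 28, 40], values [1..7, 100..500]
```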
joe-maley
pushed a commit
that referenced
this pull request
Nov 16, 2020
The offsets in each buffer correspond to the values in its data buffer. To build a single contiguous buffer, we must ensure the offset values continue in ascending order from the previous buffer. For example, consider the following buffers storing int32 values.

Buffer #1:
  offsets: [0, 8, 16]
  values:  [1, 2, 3, 4, 5, 6, 7]

Buffer #2:
  offsets: [0, 12]
  values:  [100, 200, 300, 400, 500]

The final, contiguous buffer will be:
  offsets: [0, 8, 16, 28, 40]
  values:  [1, 2, 3, 4, 5, 6, 7, 100, 200, 300, 400, 500]

The last two offsets, `28, 40`, were calculated by adding `28` to the offsets of Buffer #2: [0, 12]. The `28` is the byte size of the values in Buffer #1 (7 int32 values * 4 bytes).

Co-authored-by: Joe Maley <[email protected]>
github-actions bot
pushed a commit
that referenced
this pull request
Nov 16, 2020
The offsets in each buffer correspond to the values in its data buffer. To build a single contiguous buffer, we must ensure the offset values continue in ascending order from the previous buffer. For example, consider the following buffers storing int32 values.

Buffer #1:
  offsets: [0, 8, 16]
  values:  [1, 2, 3, 4, 5, 6, 7]

Buffer #2:
  offsets: [0, 12]
  values:  [100, 200, 300, 400, 500]

The final, contiguous buffer will be:
  offsets: [0, 8, 16, 28, 40]
  values:  [1, 2, 3, 4, 5, 6, 7, 100, 200, 300, 400, 500]

The last two offsets, `28, 40`, were calculated by adding `28` to the offsets of Buffer #2: [0, 12]. The `28` is the byte size of the values in Buffer #1 (7 int32 values * 4 bytes).

Co-authored-by: Joe Maley <[email protected]>
joe-maley
pushed a commit
that referenced
this pull request
Nov 17, 2020
The offsets in each buffer correspond to the values in its data buffer. To build a single contiguous buffer, we must ensure the offset values continue in ascending order from the previous buffer. For example, consider the following buffers storing int32 values.

Buffer #1:
  offsets: [0, 8, 16]
  values:  [1, 2, 3, 4, 5, 6, 7]

Buffer #2:
  offsets: [0, 12]
  values:  [100, 200, 300, 400, 500]

The final, contiguous buffer will be:
  offsets: [0, 8, 16, 28, 40]
  values:  [1, 2, 3, 4, 5, 6, 7, 100, 200, 300, 400, 500]

The last two offsets, `28, 40`, were calculated by adding `28` to the offsets of Buffer #2: [0, 12]. The `28` is the byte size of the values in Buffer #1 (7 int32 values * 4 bytes).

Co-authored-by: Joe Maley <[email protected]>
Co-authored-by: joe maley <[email protected]>
Co-authored-by: Joe Maley <[email protected]>
joe-maley
pushed a commit
that referenced
this pull request
May 3, 2021
Fixes #2201 This patch introduces a global lock on the `S3Client` constructor to prevent the following race reported from TSAN: ``` WARNING: ThreadSanitizer: data race (pid=12) Write of size 8 at 0x7fe93658a1b0 by thread T16 (mutexes: write M144250874682678808, write M150443169551677688): #0 cJSON_ParseWithOpts external/aws/aws-cpp-sdk-core/source/external/cjson/cJSON.cpp:1006 (libexternal_Saws_Slibaws.so+0x299bef) #1 Aws::Utils::Json::JsonValue::JsonValue(std::__cxx11::basic_string<char, std::char_traits<char>, Aws::Allocator<char> > const&) external/aws/aws-cpp-sdk-core/source/utils/json/JsonSerializer.cpp:39 (libexternal_Saws_Slibaws.so+0x32d09c) #2 Aws::Config::EC2InstanceProfileConfigLoader::LoadInternal() external/aws/aws-cpp-sdk-core/source/config/AWSProfileConfigLoader.cpp:341 (libexternal_Saws_Slibaws.so+0x286d88) #3 Aws::Config::AWSProfileConfigLoader::Load() external/aws/aws-cpp-sdk-core/source/config/AWSProfileConfigLoader.cpp:47 (libexternal_Saws_Slibaws.so+0x282b4f) #4 Aws::Auth::InstanceProfileCredentialsProvider::Reload() external/aws/aws-cpp-sdk-core/source/auth/AWSCredentialsProvider.cpp:265 (libexternal_Saws_Slibaws.so+0x235987) #5 Aws::Auth::InstanceProfileCredentialsProvider::RefreshIfExpired() external/aws/aws-cpp-sdk-core/source/auth/AWSCredentialsProvider.cpp:283 (libexternal_Saws_Slibaws.so+0x23613b) #6 Aws::Auth::InstanceProfileCredentialsProvider::GetAWSCredentials() external/aws/aws-cpp-sdk-core/source/auth/AWSCredentialsProvider.cpp:250 (libexternal_Saws_Slibaws.so+0x23be9a) #7 Aws::Auth::AWSCredentialsProviderChain::GetAWSCredentials() external/aws/aws-cpp-sdk-core/source/auth/AWSCredentialsProviderChain.cpp:35 (libexternal_Saws_Slibaws.so+0x244550) #8 Aws::Client::AWSAuthV4Signer::AWSAuthV4Signer(std::shared_ptr<Aws::Auth::AWSCredentialsProvider> const&, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, Aws::Allocator<char> > const&, Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy, bool) external/aws/aws-cpp-sdk-core/source/auth/AWSAuthSigner.cpp:170 (libexternal_Saws_Slibaws.so+0x2262ed) #9 std::shared_ptr<Aws::Client::AWSAuthV4Signer> Aws::MakeShared<Aws::Client::AWSAuthV4Signer, std::shared_ptr<Aws::Auth::DefaultAWSCredentialsProviderChain>, char const*&, std::__cxx11::basic_string<char, std::char_traits<char>, Aws::Allocator<char> > const&, Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy&, bool>(char const*, std::shared_ptr<Aws::Auth::DefaultAWSCredentialsProviderChain>&&, char const*&, std::__cxx11::basic_string<char, std::char_traits<char>, Aws::Allocator<char> > const&, Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy&, bool&&) /usr/include/c++/9/ext/new_allocator.h:147 (libexternal_Saws_Slibaws.so+0x39a78b) #10 Aws::S3::S3Client::S3Client(Aws::Client::ClientConfiguration const&, Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy, bool, Aws::S3::US_EAST_1_REGIONAL_ENDPOINT_OPTION) external/aws/aws-cpp-sdk-s3/source/S3Client.cpp:134 (libexternal_Saws_Slibaws.so+0x39a78b) #11 std::shared_ptr<Aws::S3::S3Client>::shared_ptr<Aws::Allocator<Aws::S3::S3Client>, Aws::Client::ClientConfiguration&, Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy, bool const&>(std::_Sp_alloc_shared_tag<Aws::Allocator<Aws::S3::S3Client> >, Aws::Client::ClientConfiguration&, Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy&&, bool const&) /usr/include/c++/9/ext/new_allocator.h:147 (libexternal_Scom_Ugithub_Utiledb_Utiledb_Slibtiledb.so+0x2879bd) #12 std::shared_ptr<Aws::S3::S3Client> std::allocate_shared<Aws::S3::S3Client, 
Aws::Allocator<Aws::S3::S3Client>, Aws::Client::ClientConfiguration&, Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy, bool const&>(Aws::Allocator<Aws::S3::S3Client> const&, Aws::Client::ClientConfiguration&, Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy&&, bool const&) /usr/include/c++/9/bits/shared_ptr.h:702 (libexternal_Scom_Ugithub_Utiledb_Utiledb_Slibtiledb.so+0x2879bd) #13 std::shared_ptr<Aws::S3::S3Client> Aws::MakeShared<Aws::S3::S3Client, Aws::Client::ClientConfiguration&, Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy, bool const&>(char const*, Aws::Client::ClientConfiguration&, Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy&&, bool const&) external/aws/aws-cpp-sdk-core/include/aws/core/utils/memory/stl/AWSAllocator.h:106 (libexternal_Scom_Ugithub_Utiledb_Utiledb_Slibtiledb.so+0x2879bd) #14 tiledb::sm::S3::init_client() const external/com_github_tiledb_tiledb/tiledb/sm/filesystem/s3.cc:1104 (libexternal_Scom_Ugithub_Utiledb_Utiledb_Slibtiledb.so+0x2879bd) ``` --- TYPE: BUG DESC: Fix race within `S3::init_client`
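The fix pattern described above, serializing construction of a client whose constructor touches unsynchronized global state behind a single process-wide mutex, is sketched below. This is a minimal sketch, assuming a placeholder `ThirdPartyClient` type; it does not reproduce the actual AWS SDK types or the real `S3::init_client` signature.
```
// Sketch: a global lock around client construction so concurrent callers
// construct the client one at a time, removing the constructor-time race.
#include <memory>
#include <mutex>
#include <thread>

struct ThirdPartyClient {  // stand-in for Aws::S3::S3Client
  ThirdPartyClient() { /* reads/writes unsynchronized global config here */ }
};

std::shared_ptr<ThirdPartyClient> make_client() {
  static std::mutex client_init_mtx;  // one lock shared by all callers
  std::lock_guard<std::mutex> lock(client_init_mtx);
  return std::make_shared<ThirdPartyClient>();
}

int main() {
  // Two threads constructing clients concurrently; the constructors run serially.
  std::thread t1([] { make_client(); });
  std::thread t2([] { make_client(); });
  t1.join();
  t2.join();
}
```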
joe-maley
pushed a commit
that referenced
this pull request
May 3, 2021
Fixes #2201 This patch introduces a global lock on the `S3Client` constructor to prevent the following race reported from TSAN: ``` WARNING: ThreadSanitizer: data race (pid=12) Write of size 8 at 0x7fe93658a1b0 by thread T16 (mutexes: write M144250874682678808, write M150443169551677688): #0 cJSON_ParseWithOpts external/aws/aws-cpp-sdk-core/source/external/cjson/cJSON.cpp:1006 (libexternal_Saws_Slibaws.so+0x299bef) #1 Aws::Utils::Json::JsonValue::JsonValue(std::__cxx11::basic_string<char, std::char_traits<char>, Aws::Allocator<char> > const&) external/aws/aws-cpp-sdk-core/source/utils/json/JsonSerializer.cpp:39 (libexternal_Saws_Slibaws.so+0x32d09c) #2 Aws::Config::EC2InstanceProfileConfigLoader::LoadInternal() external/aws/aws-cpp-sdk-core/source/config/AWSProfileConfigLoader.cpp:341 (libexternal_Saws_Slibaws.so+0x286d88) #3 Aws::Config::AWSProfileConfigLoader::Load() external/aws/aws-cpp-sdk-core/source/config/AWSProfileConfigLoader.cpp:47 (libexternal_Saws_Slibaws.so+0x282b4f) #4 Aws::Auth::InstanceProfileCredentialsProvider::Reload() external/aws/aws-cpp-sdk-core/source/auth/AWSCredentialsProvider.cpp:265 (libexternal_Saws_Slibaws.so+0x235987) #5 Aws::Auth::InstanceProfileCredentialsProvider::RefreshIfExpired() external/aws/aws-cpp-sdk-core/source/auth/AWSCredentialsProvider.cpp:283 (libexternal_Saws_Slibaws.so+0x23613b) #6 Aws::Auth::InstanceProfileCredentialsProvider::GetAWSCredentials() external/aws/aws-cpp-sdk-core/source/auth/AWSCredentialsProvider.cpp:250 (libexternal_Saws_Slibaws.so+0x23be9a) #7 Aws::Auth::AWSCredentialsProviderChain::GetAWSCredentials() external/aws/aws-cpp-sdk-core/source/auth/AWSCredentialsProviderChain.cpp:35 (libexternal_Saws_Slibaws.so+0x244550) #8 Aws::Client::AWSAuthV4Signer::AWSAuthV4Signer(std::shared_ptr<Aws::Auth::AWSCredentialsProvider> const&, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, Aws::Allocator<char> > const&, Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy, bool) external/aws/aws-cpp-sdk-core/source/auth/AWSAuthSigner.cpp:170 (libexternal_Saws_Slibaws.so+0x2262ed) #9 std::shared_ptr<Aws::Client::AWSAuthV4Signer> Aws::MakeShared<Aws::Client::AWSAuthV4Signer, std::shared_ptr<Aws::Auth::DefaultAWSCredentialsProviderChain>, char const*&, std::__cxx11::basic_string<char, std::char_traits<char>, Aws::Allocator<char> > const&, Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy&, bool>(char const*, std::shared_ptr<Aws::Auth::DefaultAWSCredentialsProviderChain>&&, char const*&, std::__cxx11::basic_string<char, std::char_traits<char>, Aws::Allocator<char> > const&, Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy&, bool&&) /usr/include/c++/9/ext/new_allocator.h:147 (libexternal_Saws_Slibaws.so+0x39a78b) #10 Aws::S3::S3Client::S3Client(Aws::Client::ClientConfiguration const&, Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy, bool, Aws::S3::US_EAST_1_REGIONAL_ENDPOINT_OPTION) external/aws/aws-cpp-sdk-s3/source/S3Client.cpp:134 (libexternal_Saws_Slibaws.so+0x39a78b) #11 std::shared_ptr<Aws::S3::S3Client>::shared_ptr<Aws::Allocator<Aws::S3::S3Client>, Aws::Client::ClientConfiguration&, Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy, bool const&>(std::_Sp_alloc_shared_tag<Aws::Allocator<Aws::S3::S3Client> >, Aws::Client::ClientConfiguration&, Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy&&, bool const&) /usr/include/c++/9/ext/new_allocator.h:147 (libexternal_Scom_Ugithub_Utiledb_Utiledb_Slibtiledb.so+0x2879bd) #12 std::shared_ptr<Aws::S3::S3Client> std::allocate_shared<Aws::S3::S3Client, 
Aws::Allocator<Aws::S3::S3Client>, Aws::Client::ClientConfiguration&, Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy, bool const&>(Aws::Allocator<Aws::S3::S3Client> const&, Aws::Client::ClientConfiguration&, Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy&&, bool const&) /usr/include/c++/9/bits/shared_ptr.h:702 (libexternal_Scom_Ugithub_Utiledb_Utiledb_Slibtiledb.so+0x2879bd) #13 std::shared_ptr<Aws::S3::S3Client> Aws::MakeShared<Aws::S3::S3Client, Aws::Client::ClientConfiguration&, Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy, bool const&>(char const*, Aws::Client::ClientConfiguration&, Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy&&, bool const&) external/aws/aws-cpp-sdk-core/include/aws/core/utils/memory/stl/AWSAllocator.h:106 (libexternal_Scom_Ugithub_Utiledb_Utiledb_Slibtiledb.so+0x2879bd) #14 tiledb::sm::S3::init_client() const external/com_github_tiledb_tiledb/tiledb/sm/filesystem/s3.cc:1104 (libexternal_Scom_Ugithub_Utiledb_Utiledb_Slibtiledb.so+0x2879bd) ``` --- TYPE: BUG DESC: Fix race within `S3::init_client` Co-authored-by: Joe Maley <[email protected]>
davisp
added a commit
that referenced
this pull request
Mar 8, 2023
Somewhere in the code the tile_id variable was defined as int, whereas it should have been int64_t. This caused problems for very small tile extents in a large domain: the resulting number of space tiles was so large that the tile ids overflowed the int range.
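For illustration, the sketch below shows the class of overflow described above with a simplified tile-id calculation; it is not the actual TileDB code, and the domain/extent values are made up.
```
// With a tiny tile extent over a large domain, the number of space tiles
// can exceed INT_MAX (~2.1 billion), so a 32-bit int tile id wraps to a
// wrong value while an int64_t holds it correctly.
#include <cstdint>
#include <iostream>

int main() {
  const int64_t domain_extent = 10'000'000'000;  // large domain along one dimension
  const int64_t tile_extent = 2;                 // very small tile extent
  const int64_t num_tiles = domain_extent / tile_extent;  // 5,000,000,000 space tiles

  const int tile_id_bad = static_cast<int>(num_tiles - 1);  // truncated, wrong value
  const int64_t tile_id_ok = num_tiles - 1;                 // 4999999999

  std::cout << "int:     " << tile_id_bad << "\n"
            << "int64_t: " << tile_id_ok << "\n";
}
```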