Commit
Add an alternative strategy for copying elements from a transposed tensor into a contiguous buffer, using blocking, and enable it to be used via `Tensor::copy_from`.

The existing naive copy implementation performs well except when the strides of the source view lead to a significant rate of cache conflicts. This typically happens when the last stride is a multiple of the cache line size, and especially when it is a power of 2. To improve this, detect this case and switch to an alternative copying procedure which uses blocking and tiling.

In the `bench_transpose` benchmark in `src/ops/layout.rs`, this avoids the significant increase in overhead, relative to a simple memory copy, that otherwise occurs when the source stride is a power of 2.
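For illustration only, here is a minimal sketch of the idea described in the message, not the code from this commit. The function names (`copy_naive`, `copy_blocked`, `stride_may_conflict`, `copy_into_contiguous`), the 32-element block size, the 64-byte cache line, and the exact detection heuristic are all assumptions made for this example; the real implementation behind `Tensor::copy_from` may differ in every one of them.

```rust
/// Block edge length in elements. Hypothetical value chosen for this sketch.
const BLOCK: usize = 32;

/// Assumed cache line size in bytes (common on x86-64, not universal).
const CACHE_LINE: usize = 64;

/// Naive copy of a 2D strided view into a contiguous row-major buffer.
/// Element (r, c) of the view lives at `src[r * row_stride + c * col_stride]`.
fn copy_naive(
    src: &[f32], rows: usize, cols: usize,
    row_stride: usize, col_stride: usize, dest: &mut [f32],
) {
    assert_eq!(dest.len(), rows * cols);
    for r in 0..rows {
        for c in 0..cols {
            dest[r * cols + c] = src[r * row_stride + c * col_stride];
        }
    }
}

/// Same copy, but visiting the view in BLOCK x BLOCK tiles so that the
/// source cache lines touched by one tile are reused before moving on.
fn copy_blocked(
    src: &[f32], rows: usize, cols: usize,
    row_stride: usize, col_stride: usize, dest: &mut [f32],
) {
    assert_eq!(dest.len(), rows * cols);
    for r0 in (0..rows).step_by(BLOCK) {
        let r1 = (r0 + BLOCK).min(rows);
        for c0 in (0..cols).step_by(BLOCK) {
            let c1 = (c0 + BLOCK).min(cols);
            for r in r0..r1 {
                for c in c0..c1 {
                    dest[r * cols + c] = src[r * row_stride + c * col_stride];
                }
            }
        }
    }
}

/// Assumed heuristic: treat the last stride as conflict-prone when its size
/// in bytes is a non-zero multiple of the cache line size. Powers of 2 at or
/// above the line size are a special case of this.
fn stride_may_conflict(last_stride: usize, elem_size: usize) -> bool {
    let bytes = last_stride * elem_size;
    bytes >= CACHE_LINE && bytes % CACHE_LINE == 0
}

/// Pick a strategy based on the last stride of the source view.
fn copy_into_contiguous(
    src: &[f32], rows: usize, cols: usize,
    row_stride: usize, col_stride: usize, dest: &mut [f32],
) {
    if stride_may_conflict(col_stride, std::mem::size_of::<f32>()) {
        copy_blocked(src, rows, cols, row_stride, col_stride, dest);
    } else {
        copy_naive(src, rows, cols, row_stride, col_stride, dest);
    }
}
```

For a contiguous matrix of shape `[a, b]`, the transposed view has shape `[b, a]` and strides `[1, b]`, so it would be copied with `copy_into_contiguous(&src, b, a, 1, b, &mut dest)`. Both strategies produce the same output; only the order of memory accesses differs.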
1 parent 1aed5a4, commit ea55853
Showing 2 changed files with 199 additions and 49 deletions.