This repository has been archived by the owner on Mar 21, 2024. It is now read-only.
CUB 1.6.0 (previously 1.5.3)
Summary
CUB 1.6.0 changes the scan and reduce interfaces. Exclusive scans now accept an "initial value" instead of an "identity value". Scans and reductions now support differing input and output sequence types. Additionally, many bugs have been fixed.
Breaking Changes
- Device/block/warp-wide exclusive scans now accept an "initial value" (instead of an "identity value") that seeds the computation with an arbitrary prefix.
- Device-wide reductions and scans can now have input sequence types that are different from output sequence types (as long as they are convertible).
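
The two changes above can be illustrated with a small host-side reference. This is only a sketch of the semantics under stated assumptions, not CUB's implementation or its API: the function name `exclusive_scan_reference` and its parameters are hypothetical, and the loop is sequential rather than parallel. It shows how an initial value seeds the first output element, and how an input type (for example `float`) can differ from the output type (for example `double`) as long as it is convertible.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical host-side reference (not part of CUB): a sequential exclusive
// scan seeded with an arbitrary initial value. InT and OutT may differ as
// long as InT is convertible to OutT.
template <typename InT, typename OutT, typename ScanOp>
std::vector<OutT> exclusive_scan_reference(const std::vector<InT>& in,
                                           ScanOp op,
                                           OutT initial_value)
{
    std::vector<OutT> out(in.size());
    OutT running = initial_value;  // the seed becomes out[0]
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        out[i]  = running;         // exclusive: in[i] is not yet included
        running = op(running, static_cast<OutT>(in[i]));
    }
    return out;
}
```

For instance, scanning the `float` inputs `{1.5f, 2.5f, 3.0f}` with `std::plus<double>()` and an initial value of `10.0` produces the `double` outputs `{10.0, 11.5, 14.0}`. For a sum, passing an initial value of `0` gives the same result as the operator's identity would.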
Other Enhancements
- Reduced repository size by moving the doxygen binary to the doc repository.
- Minor reduction in `cub::BlockScan` instruction counts.
Bug Fixes
- Issue #55: Warning in `cub/device/dispatch/dispatch_reduce_by_key.cuh`.
- Issue #59: `cub::DeviceScan::ExclusiveSum` can't compute a prefix sum of float into double.
- Issue #58: Infinite loop in `cub::CachingDeviceAllocator::NearestPowerOf`.
- Issue #47: `cub::CachingDeviceAllocator` needs to clean up CUDA global error state upon successful retry.
- Issue #46: Very high amount of memory needed by `cub::DeviceHistogram::HistogramEven`.
- Issue #45: `cub::CachingDeviceAllocator` fails with debug output enabled.