This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

CUB 1.6.0 (previously 1.5.3)

@brycelelbach brycelelbach released this 19 May 08:44

Summary

CUB 1.6.0 changes the scan and reduce interfaces. Exclusive scans now accept an "initial value" instead of an "identity value". Scans and reductions now support differing input and output sequence types. Additionally, many bugs have been fixed.

Breaking Changes

  • Device-, block-, and warp-wide exclusive scans now accept an "initial value" (instead of an "identity value"), which seeds the computation with an arbitrary prefix.
  • Device-wide reductions and scans can now have input sequence types that are different from output sequence types (as long as they are convertible).
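
To illustrate both changes, here is a minimal host-side sketch of the semantics, not CUB code. It assumes C++17 and uses std::exclusive_scan from the standard library as a stand-in; the helper name exclusive_scan_ref is hypothetical and is not part of CUB. Each output element is the initial value combined with all preceding inputs, and the output element type may differ from the input element type.

```cpp
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical reference helper (not a CUB API): an exclusive scan seeded
// with an arbitrary initial value.
//   out[0] = init
//   out[i] = init (op) in[0] (op) ... (op) in[i-1]
// In and Out may differ; accumulation happens in the type of `init` (Out).
template <typename In, typename Out, typename Op>
std::vector<Out> exclusive_scan_ref(const std::vector<In>& in, Out init, Op op) {
    std::vector<Out> out(in.size());
    std::exclusive_scan(in.begin(), in.end(), out.begin(), init, op);
    return out;
}

// Seeding with a non-identity value: the prefix 10 is carried into every output.
inline std::vector<int> seeded_sum_example() {
    return exclusive_scan_ref(std::vector<int>{1, 2, 3}, 10, std::plus<>());
}

// Differing sequence types: float inputs accumulated into double outputs.
inline std::vector<double> float_to_double_example() {
    return exclusive_scan_ref(std::vector<float>{0.5f, 0.25f, 0.125f}, 0.0,
                              std::plus<>());
}
```

With an initial value of 10, the inputs 1, 2, 3 scan to 10, 11, 13. An "identity value" interface could only express the case where the seed is 0 for addition.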

Other Enhancements

  • Reduce repository size by moving the doxygen binary to the doc repository.
  • Minor reduction in cub::BlockScan instruction counts.

Bug Fixes

  • Issue #55: Warning in cub/device/dispatch/dispatch_reduce_by_key.cuh.
  • Issue #59: cub::DeviceScan::ExclusiveSum can't prefix sum of float into double.
  • Issue #58: Infinite loop in cub::CachingDeviceAllocator::NearestPowerOf.
  • Issue #47: cub::CachingDeviceAllocator needs to clean up CUDA global error state upon successful retry.
  • Issue #46: Very high amount of needed memory from the cub::DeviceHistogram::HistogramEven.
  • Issue #45: cub::CachingDeviceAllocator fails with debug output enabled.