This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

CUB 1.4.0

@brycelelbach brycelelbach released this 19 May 08:32

Summary

CUB 1.4.0 adds cub::DeviceSpmv and cub::DeviceRunLengthEncode::NonTrivialRuns, improves cub::DeviceHistogram, and introduces support for SM5x (Maxwell) GPUs.

New Features

  • cub::DeviceSpmv methods for multiplying sparse matrices by dense vectors, load-balanced using a merge-based parallel decomposition.
  • cub::DeviceRadixSort sorting entry points that always write the sorted output to the specified buffer, unlike the cub::DoubleBuffer entry points, where the sorted output could end up in either buffer.
  • cub::DeviceRunLengthEncode::NonTrivialRuns for finding the starting offsets and lengths of all non-trivial runs (i.e., runs of length > 1) of keys in a given sequence. Useful for top-down partitioning algorithms such as MSD sorting of very large keys.
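
To make the results of two of the features above concrete, here is a minimal host-side C++ sketch of what a sparse matrix-vector product and a non-trivial-runs query compute. It is a sequential reference for checking outputs, not CUB's GPU implementation and not its API: the names `CsrMatrix`, `spmv_reference`, `Runs`, and `non_trivial_runs_reference` are hypothetical, and the CSR (compressed sparse row) layout is an assumption made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CSR container (an assumption for this sketch, not a CUB type).
struct CsrMatrix {
    int num_rows;
    std::vector<int> row_offsets;     // num_rows + 1 entries; row r spans [row_offsets[r], row_offsets[r + 1])
    std::vector<int> column_indices;  // one entry per stored nonzero
    std::vector<double> values;       // one entry per stored nonzero
};

// Sequential reference for y = A * x. A merge-based GPU decomposition would
// split this same work evenly across threads; the result is the same.
std::vector<double> spmv_reference(const CsrMatrix& a, const std::vector<double>& x) {
    std::vector<double> y(a.num_rows, 0.0);
    for (int row = 0; row < a.num_rows; ++row) {
        double sum = 0.0;
        for (int k = a.row_offsets[row]; k < a.row_offsets[row + 1]; ++k) {
            sum += a.values[k] * x[a.column_indices[k]];
        }
        y[row] = sum;
    }
    return y;
}

// Hypothetical result type: starting offset and length of each reported run.
struct Runs {
    std::vector<int> offsets;
    std::vector<int> lengths;
};

// Sequential reference: report only runs of equal consecutive keys whose
// length is greater than 1 (the "non-trivial" runs).
template <typename Key>
Runs non_trivial_runs_reference(const std::vector<Key>& keys) {
    Runs out;
    std::size_t i = 0;
    while (i < keys.size()) {
        std::size_t j = i + 1;
        while (j < keys.size() && keys[j] == keys[i]) {
            ++j;  // extend the current run
        }
        if (j - i > 1) {
            out.offsets.push_back(static_cast<int>(i));
            out.lengths.push_back(static_cast<int>(j - i));
        }
        i = j;  // continue with the next run
    }
    return out;
}
```

For the key sequence 0, 2, 2, 9, 5, 5, 5, 8, this reference reports two runs: one starting at offset 1 with length 2, and one starting at offset 4 with length 3. The singleton keys 0, 9, and 8 are skipped.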

Other Enhancements

  • Support and performance tuning for SM5x (Maxwell) GPUs.
  • Updated cub::DeviceHistogram implementation that provides the same "histogram-even" and "histogram-range" functionality as IPP/NPP. It delivers extremely fast and, perhaps more importantly, very uniform performance across diverse real-world datasets, including pathological (homogeneous) sample distributions.
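
The two histogram modes named above differ only in how bin boundaries are specified. The following host-side C++ sketch illustrates that difference; it is a sequential reference, not CUB's implementation or API. The function names are hypothetical, and the sketch assumes half-open bins (lower bound inclusive, upper bound exclusive) with out-of-range samples ignored.

```cpp
#include <algorithm>
#include <vector>

// "Histogram-even" sketch: num_bins equal-width bins covering [lower, upper).
// Assumes num_bins > 0 and upper > lower.
std::vector<int> histogram_even_reference(const std::vector<float>& samples,
                                          int num_bins, float lower, float upper) {
    std::vector<int> counts(num_bins, 0);
    const float scale = static_cast<float>(num_bins) / (upper - lower);
    for (float s : samples) {
        if (s < lower || s >= upper) {
            continue;  // assumption: out-of-range samples are not counted
        }
        int bin = static_cast<int>((s - lower) * scale);
        if (bin >= num_bins) {
            bin = num_bins - 1;  // guard against floating-point rounding at the top edge
        }
        ++counts[bin];
    }
    return counts;
}

// "Histogram-range" sketch: bin i covers [levels[i], levels[i + 1]).
// Assumes levels is sorted in strictly increasing order.
std::vector<int> histogram_range_reference(const std::vector<float>& samples,
                                           const std::vector<float>& levels) {
    std::vector<int> counts(levels.size() > 1 ? levels.size() - 1 : 0, 0);
    if (counts.empty()) {
        return counts;
    }
    for (float s : samples) {
        if (s < levels.front() || s >= levels.back()) {
            continue;  // assumption: out-of-range samples are not counted
        }
        // First boundary strictly greater than s; the bin is the one before it.
        auto it = std::upper_bound(levels.begin(), levels.end(), s);
        ++counts[(it - levels.begin()) - 1];
    }
    return counts;
}
```

A homogeneous input (every sample equal) puts every count into a single bin. That is the pathological case mentioned above: on a GPU, all threads then contend for the same counter, which is why uniform performance on such inputs is notable.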