v0.4.0
New Features
- Slice optimization to use a built-in tensor function when possible by @luitjens in #360
- Slice support for std::array shapes by @luitjens in #363 (see the slice sketch after this list)
- SVD power iteration example, benchmark, and unit tests by @luitjens in #366
- matmul: support real/complex tensors by @kshitij12345 in #362
- Added sign/index operators by @luitjens in #369
- Optimized cast and conj operators to return a tensor view when possible by @luitjens in #371
- Implemented QR for small batched matrices by @luitjens in #373
- Implemented block power iteration (QR iterations) for SVD by @luitjens in #375
- Added output iterator support for CUB sums, and converted all sum() by @cliffburdick in #380
- Removed inheritance from std::iterator by @cliffburdick in #381
- DLPack support by @cliffburdick in #392
- Added ref-counting for DLPack by @cliffburdick in #394
- Updated CUB optimization selection for CUB >= 2.0 by @tylera-nvidia in #395
- Refactored make_tensor to allow lvalue init by @cliffburdick in #397 (see the make_tensor sketch after this list)
- Updated notebook documentation and refactored some code by @cliffburdick in #398
- Allow 0-stride dimensions for cuBLAS input/output by @tbensonatl in #400
- 16-bit float reductions + updated softmax by @cliffburdick in #399 (see the fp16 reduction sketch after this list)
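
The std::array slice support (#363) can be exercised roughly as follows. This is a minimal sketch, not code from the release: it assumes the free-function `slice(op, starts, ends)` form accepts `std::array<matx::index_t, RANK>` for the start/end coordinates, and the shape and values are hypothetical.

```cpp
// Minimal sketch of std::array-based slicing (assumed shapes/values).
#include <array>
#include <matx.h>

int main() {
  auto t = matx::make_tensor<float>({8, 8});
  (t = 1.0f).run();                              // fill on the default stream

  // Start/end coordinates passed as std::array instead of brace-init lists.
  std::array<matx::index_t, 2> starts{0, 0};
  std::array<matx::index_t, 2> ends{4, 4};
  auto v = matx::slice(t, starts, ends);         // 4x4 view, no copy

  cudaStreamSynchronize(0);
  return 0;
}
```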
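
The make_tensor refactor (#397) lets a tensor be declared first and initialized later. A minimal sketch, assuming a `make_tensor(tensor, shape)` overload that fills in an existing lvalue tensor; the 16x16 shape is just an example:

```cpp
// Minimal sketch of lvalue initialization via make_tensor (assumed overload).
#include <matx.h>

int main() {
  matx::tensor_t<float, 2> t;        // declared as an lvalue, not yet backed by memory
  matx::make_tensor(t, {16, 16});    // initialize the existing tensor in place
  (t = 1.0f).run();                  // fill on the default stream
  cudaStreamSynchronize(0);
  return 0;
}
```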
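
For the 16-bit float reductions (#399), a sum over `matxFp16` data can be sketched as below. This assumes the function-style `sum(dest, src)` reduction API running on the default stream; the sizes and fill value are hypothetical.

```cpp
// Minimal sketch of a 16-bit float reduction (assumed sizes and API form).
#include <matx.h>

int main() {
  auto in   = matx::make_tensor<matx::matxFp16>({4, 128});
  auto sums = matx::make_tensor<matx::matxFp16>({4});   // one sum per row

  (in = matx::matxFp16{0.5f}).run();   // fill with a half-precision constant
  matx::sum(sums, in);                 // reduce over the innermost dimension
  cudaStreamSynchronize(0);
  return 0;
}
```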
Bug Fixes
- Fixed duplicate print and removed member prints by @tylera-nvidia in #364
- cuBLASLt column-major detection fix by @luitjens in #368
- Fixes for 32-bit mode by @cliffburdick in #388
- Fixed a spurious maybe-uninitialized warning/error in release mode by @cliffburdick in #389
- Fixed issue with using const pointers by @cliffburdick in #393
- Generator Printing Patch by @tylera-nvidia in #370
New Contributors
- @kshitij12345 made their first contribution in #362
- @tbensonatl made their first contribution in #400
Full Changelog: v0.3.0...v0.4.0