v0.3.0
v0.3.0 marks a major release with over 100 features and bug fixes. Releases will occur more frequently after this one to support users not living at HEAD.
What's Changed
- Added squeeze operator by @cliffburdick in #163
- Change name of squeeze to flatten by @cliffburdick in #164
- Updated version of cuTENSOR and fixed paths by @cliffburdick in #166
- Added reduction example with einsum by @cliffburdick in #168
- Fixed bug with wrong type on argmin/max by @cliffburdick in #170
- Fixed missing return on operator() for sum by @cliffburdick in #171
- Fixed error with reductions using invalid indices (only shows up on Jetson) by @cliffburdick in #172
- Fixed bug with matmul use-after-free by @cliffburdick in #173
- Added test for batched GEMMs by @cliffburdick in #174
- Throw an exception if using SetVals on non-managed pointer by @cliffburdick in #176
- Added missing assert in release mode by @cliffburdick in #178
- Fixed einsum in release mode by @cliffburdick in #179
- Updates to docs by @cliffburdick in #180
- Added unit test for transpose and fixed bug with grid size by @cliffburdick in #181
- Fix grid dimensions for transpose. by @galv in #182
- Added missing include by @cliffburdick in #184
- Remove CUB from sum reduction while bug is being investigated by @cliffburdick in #186
- Fix for cub reductions by @luitjens in #187
- Reenable CUB tests by @cliffburdick in #188
- Fixing incorrect parameter to CUB sort for 2D tensors by @cliffburdick in #190
- Remove 4D restriction on Clone by @cliffburdick in #191
- Added support for N-D convolutions by @cliffburdick in #189
- Download RAPIDS.cmake only if it does not exist. by @cwharris in #192
- Fix 11.4 compilation issues by @cliffburdick in #195
- Improve FFT batching by @cliffburdick in #196
- Fixed argmax initialization value by @cliffburdick in #198
- Fix issue #199 by @pkestene in #200
- Fix type on concatenate by @cliffburdick in #201
- Fix documentation typo by @dagardner-nv in #202
- Missing host annotation on some generators by @cliffburdick in #203
- Fixed TotalSize on cub operators by @cliffburdick in #204
- Implementing remap operator. by @luitjens in #205
- Update reverse/shift APIs by @luitjens in #207
- batching conv1d across filters. by @luitjens in #208
- Added Print for operators by @cliffburdick in #211
- Complex div by @cliffburdick in #213
- Added lcollapse and rcollapse operator by @luitjens in #212
- Baseops by @luitjens in #214
- Only allow View() on contiguous tensors by @luitjens in #215
- Remove caching on some CUB types temporarily by @cliffburdick in #216
- Fixed convolution mode SAME and added unit tests by @cliffburdick in #217
- Added convolution VALID support by @cliffburdick in #218
- Allow operators on cumsum by @cliffburdick in #219
- Using async allocation in median() by @cliffburdick in #220
- Various CUB fixes -- got rid of offset pointers (async allocation + copy), allowed operators on more types, and fixed caching on sort by @cliffburdick in #222
- Fixed memory leak on CUB cache bypass by @cliffburdick in #223
- Update to pipe type through for scalars on set operation by @tylera-nvidia in #225
- Added complex version of mean and variance by @cliffburdick in #227
- Fixed FFT batching for non-contiguous tensors by @cliffburdick in #228
- Added fmod operator by @cliffburdick in #230
- Fmod by @cliffburdick in #231
- Changing name to fmod by @cliffburdick in #232
- Cloneop by @luitjens in #233
- Making the shift parameter in shift an operator by @luitjens in #234
- Change sign of shift to match python/matlab. by @luitjens in #235
- Changing output operator type to pass-by-value to allow temporary operators to be used as an output type by @luitjens in #236
- Adding slice() operator. by @luitjens in #237
- Fix cuTensorNet workspace size by @leofang in #241
- adding permute operator by @luitjens in #239
- Cleaning up operators/transforms. by @luitjens in #243
- Rapids cmake no fetch by @cliffburdick in #245
- Cleanup of include directory by @luitjens in #246
- Fixed conv SAME mode by @cliffburdick in #248
- Use singleton on GIL interpreter by @cliffburdick in #249
- make owning a runtime parameter by @luitjens in #247
- Fixed bug with batched 1D convolution size by @cliffburdick in #250
- Adding 2d convolution tests by @luitjens in #251
- Properly initialize pybind object by @cliffburdick in #252
- Fixed sum() using wrong iterator type by @cliffburdick in #253
- g++11 fixes by @cliffburdick in #254
- Fixed size on conv and added benchmarks by @cliffburdick in #256
- Adding unit tests for collapse with remap by @luitjens in #255
- Collapse tests by @luitjens in #257
- adding madd function to improve convolution throughput by @luitjens in #258
- Conv opt by @luitjens in #259
- Fixed compiler errors in release mode by @cliffburdick in #261
- Add streaming make_tensor APIs. by @luitjens in #262
- adding random benchmark by @luitjens in #264
- Remove deprecated APIs in make_tensor by @luitjens in #266
- Host unit tests by @luitjens in #267
- Fixed bug with FFT size shorter than length of tensor by @cliffburdick in #270
- removing unused pybind call made before pybind initialize by @tylera-nvidia in #271
- Fixed visualization tests by @cliffburdick in #275
- Fix cmake function check_python_libs. by @pkestene in #274
- Support CubSortSegmented by @tylera-nvidia in #272
- Executor cleanup. by @luitjens in #277
- Transpose operators changes by @luitjens in #278
- Remove Deprecated Shape and add metadata to Print by @tylera-nvidia in #280
- Update Documentation by @tylera-nvidia in #282
- NVTX Macros by @tylera-nvidia in #276
- Adding throw to file reading by @tylera-nvidia in #281
- Adding str() function to generators and operators by @luitjens in #283
- Added reshape op by @luitjens in #287
- Fixed 0D tensor printing, which was broken since 0D tensors don't have a stride, by @cliffburdick in #289
- Allow hermitian to take any rank by @cliffburdick in #292
- Hermitian nd by @cliffburdick in #293
- Fixed batched inverse by @cliffburdick in #294
- Added 4D matmul unit test and fixed batching bug by @cliffburdick in #297
- Fixing batched half precision complex GEMM by @cliffburdick in #298
- Rename simple_pipeline to simple_radar_pipeline for added clarity by @awthomp in #299
- Remove cuda::std::min/max by @cliffburdick in #301
- Fixed chained concatenations by @cliffburdick in #302
- Multiple concat by @cliffburdick in #303
- Meshgrid API Changes by @luitjens in #304
- Fixing slice operator not working with operator inputs by @cliffburdick in #306
- Adding reduce apis that take reduction dims by @luitjens in #308
- Median nd by @luitjens in #309
- added 4d reduction benchmark by @luitjens in #312
- Change Semantics of collapse APIs by @luitjens in #313
- Change from CUDA_CC to CUDA_ARCH by @cliffburdick in #316
- bug fix in get_grid_dims when z dim was > 64 by @luitjens in #318
- Support axis selection and operators in FFTs by @luitjens in #319
- added missing consts by @luitjens in #320
- clone/slice op optimization: Return a tensor when passed in a tensor. by @luitjens in #321
- adding axis parameter to convolution and correlation APIs by @luitjens in #322
- Adding axis support to gemms. by @luitjens in #323
- Added tests for matrix inverse using LU by @cliffburdick in #327
- Added batched inverse tests by @cliffburdick in #328
- Fix calculation for pulses/channel/sec by @awthomp in #329
- Fixed reading complex CSVs by @cliffburdick in #330
- Fixed Python dtypes to use objects instead of string by @cliffburdick in #331
- Adding op and unsupported tensor shape support to matmul. by @luitjens in #326
- Fixed dangling reference found by valgrind by @cliffburdick in #333
- Legendre by @luitjens in #332
- Added tensor swap() by @cliffburdick in #336
- Added sph2cart and cart2sph by @luitjens in #334
- fixes for complex add and sub and fixes for their unit tests by @tylera-nvidia in #339
- Cgsolve by @luitjens in #337
- fix for real - complex by @tylera-nvidia in #341
- fixing some valgrind warnings by @luitjens in #342
- Leg2 by @luitjens in #340
- adding more error checking for operator sizes by @luitjens in #343
- Fix typos in documentation by @Yaraslaut in #344
- Fixed errors related to CUDA 12 update by @cliffburdick in #347
- make concat axis a runtime parameter instead of compile time. by @luitjens in #348
- adding stack operator by @luitjens in #350
- Convolution overhaul. by @luitjens in #351
- Added complex support to SVD by @cliffburdick in #354
- Optimizing conv2d by @luitjens in #352
- Added softmax operator by @cliffburdick in #355
- Fix for expanded dims not working in certain cases by @cliffburdick in #356
- Added test for strided batched gemm and updated docs by @cliffburdick in #357
- Updates for Offline Deployment and General Bug/QoL Fixes by @tylera-nvidia in #349
New Contributors
- @galv made their first contribution in #182
- @luitjens made their first contribution in #187
- @cwharris made their first contribution in #192
- @pkestene made their first contribution in #200
- @dagardner-nv made their first contribution in #202
- @tylera-nvidia made their first contribution in #225
- @leofang made their first contribution in #241
- @Yaraslaut made their first contribution in #344
Full Changelog: v0.2.5...v0.3.0