v0.3.0
v0.3.0 marks a major release with over 100 features and bug fixes. Releases will occur more frequently after this one to support users not living at HEAD.
What's Changed
- Added squeeze operator by @cliffburdick in #163
- Change name of squeeze to flatten by @cliffburdick in #164
- Updated version of cuTENSOR and fixed paths by @cliffburdick in #166
- Added reduction example with einsum by @cliffburdick in #168
- Fixed bug with wrong type on argmin/max by @cliffburdick in #170
- Fixed missing return on operator() for sum by @cliffburdick in #171
- Fixed error with reductions using invalid indices (only shows up on Jetson) by @cliffburdick in #172
- Fixed bug with matmul use-after-free by @cliffburdick in #173
- Added test for batched GEMMs by @cliffburdick in #174
- Throw an exception if using SetVals on non-managed pointer by @cliffburdick in #176
- Added missing assert in release mode by @cliffburdick in #178
- Fixed einsum in release mode by @cliffburdick in #179
- Updates to docs by @cliffburdick in #180
- Added unit test for transpose and fixed bug with grid size by @cliffburdick in #181
- Fix grid dimensions for transpose. by @galv in #182
- Added missing include by @cliffburdick in #184
- Remove CUB from sum reduction while bug is being investigated by @cliffburdick in #186
- Fix for cub reductions by @luitjens in #187
- Reenable CUB tests by @cliffburdick in #188
- Fixing incorrect parameter to CUB sort for 2D tensors by @cliffburdick in #190
- Remove 4D restriction on Clone by @cliffburdick in #191
- Added support for N-D convolutions by @cliffburdick in #189
- Download RAPIDS.cmake only if it does not exist. by @cwharris in #192
- Fix 11.4 compilation issues by @cliffburdick in #195
- Improve FFT batching by @cliffburdick in #196
- Fixed argmax initialization value by @cliffburdick in #198
- Fix issue #199 by @pkestene in #200
- Fix type on concatenate by @cliffburdick in #201
- Fix documentation typo by @dagardner-nv in #202
- Missing host annotation on some generators by @cliffburdick in #203
- Fixed TotalSize on cub operators by @cliffburdick in #204
- Implementing remap operator. by @luitjens in #205
- Update reverse/shift APIs by @luitjens in #207
- batching conv1d across filters. by @luitjens in #208
- Added Print for operators by @cliffburdick in #211
- Complex div by @cliffburdick in #213
- Added lcollapse and rcollapse operator by @luitjens in #212
- Baseops by @luitjens in #214
- Only allow View() on contiguous tensors by @luitjens in #215
- Remove caching on some CUB types temporarily by @cliffburdick in #216
- Fixed convolution mode SAME and added unit tests by @cliffburdick in #217
- Added convolution VALID support by @cliffburdick in #218
- Allow operators on cumsum by @cliffburdick in #219
- Using async allocation in median() by @cliffburdick in #220
- Various CUB fixes -- got rid of offset pointers (async allocation + copy), allowed operators on more types, and fixed caching on sort by @cliffburdick in #222
- Fixed memory leak on CUB cache bypass by @cliffburdick in #223
- Update to pipe type through for scalars on set operation by @tylera-nvidia in #225
- Added complex version of mean and variance by @cliffburdick in #227
- Fixed FFT batching for non-contiguous tensors by @cliffburdick in #228
- Added fmod operator by @cliffburdick in #230
- Fmod by @cliffburdick in #231
- Changing name to fmod by @cliffburdick in #232
- Cloneop by @luitjens in #233
- Making the shift parameter in shift an operator by @luitjens in #234
- Change sign of shift to match python/matlab. by @luitjens in #235
- Changing output operator type to pass-by-value to allow temporary operators to be used as an output type by @luitjens in #236
- Adding slice() operator. by @luitjens in #237
- Fix cuTensorNet workspace size by @leofang in #241
- adding permute operator by @luitjens in #239
- Cleaning up operators/transforms. by @luitjens in #243
- Rapids cmake no fetch by @cliffburdick in #245
- Cleanup of include directory by @luitjens in #246
- Fixed conv SAME mode by @cliffburdick in #248
- Use singleton on GIL interpreter by @cliffburdick in #249
- make owning a runtime parameter by @luitjens in #247
- Fixed bug with batched 1D convolution size by @cliffburdick in #250
- Adding 2d convolution tests by @luitjens in #251
- Properly initialize pybind object by @cliffburdick in #252
- Fixed sum() using wrong iterator type by @cliffburdick in #253
- g++11 fixes by @cliffburdick in #254
- Fixed size on conv and added benchmarks by @cliffburdick in #256
- Adding unit tests for collapse with remap by @luitjens in #255
- Collapse tests by @luitjens in #257
- adding madd function to improve convolution throughput by @luitjens in #258
- Conv opt by @luitjens in #259
- Fixed compiler errors in release mode by @cliffburdick in #261
- Add streaming make_tensor APIs. by @luitjens in #262
- adding random benchmark by @luitjens in #264
- Remove deprecated APIs in make_tensor by @luitjens in #266
- Host unit tests by @luitjens in #267
- Fixed bug with FFT size shorter than length of tensor by @cliffburdick in #270
- removing unused pybind call made before pybind initialize by @tylera-nvidia in #271
- Fixed visualization tests by @cliffburdick in #275
- Fix cmake function check_python_libs. by @pkestene in #274
- Support CubSortSegmented by @tylera-nvidia in #272
- Executor cleanup. by @luitjens in #277
- Transpose operators changes by @luitjens in #278
- Remove Deprecated Shape and add metadata to Print by @tylera-nvidia in #280
- Update Documentation by @tylera-nvidia in #282
- NVTX Macros by @tylera-nvidia in #276
- Adding throw to file reading by @tylera-nvidia in #281
- Adding str() function to generators and operators by @luitjens in #283
- Added reshape op by @luitjens in #287
- Fixed 0D tensor printing, which was broken since 0D tensors don't have a stride, by @cliffburdick in #289
- Allow hermitian to take any rank by @cliffburdick in #292
- Hermitian nd by @cliffburdick in #293
- Fixed batched inverse by @cliffburdick in #294
- Added 4D matmul unit test and fixed batching bug by @cliffburdick in #297
- Fixing batched half precision complex GEMM by @cliffburdick in #298
- Rename simple_pipeline to simple_radar_pipeline for added clarity by @awthomp in #299
- Remove cuda::std::min/max by @cliffburdick in #301
- Fixed chained concatenations by @cliffburdick in #302
- Multiple concat by @cliffburdick in #303
- Meshgrid API Changes by @luitjens in #304
- Fixing slice operator not working with operator inputs by @cliffburdick in #306
- Adding reduce apis that take reduction dims by @luitjens in #308
- Median nd by @luitjens in #309
- added 4d reduction benchmark by @luitjens in #312
- Change Semantics of collapse APIs by @luitjens in #313
- Change from CUDA_CC to CUDA_ARCH by @cliffburdick in #316
- bug fix in get_grid_dims when z dim was > 64 by @luitjens in #318
- Support axis selection and operators in FFTs by @luitjens in #319
- added missing consts by @luitjens in #320
- clone/slice op optimization: Return a tensor when passed in a tensor. by @luitjens in #321
- adding axis parameter to convolution and correlation APIs by @luitjens in #322
- Adding axis support to gemms. by @luitjens in #323
- Added tests for matrix inverse using LU by @cliffburdick in #327
- Added batched inverse tests by @cliffburdick in #328
- Fix calculation for pulses/channel/sec by @awthomp in #329
- Fixed reading complex CSVs by @cliffburdick in #330
- Fixed Python dtypes to use objects instead of string by @cliffburdick in #331
- Adding op and unsupported tensor shape support to matmul. by @luitjens in #326
- Fixed dangling reference found by valgrind by @cliffburdick in #333
- Legendre by @luitjens in #332
- Added tensor swap() by @cliffburdick in #336
- Added sph2cart and cart2sph by @luitjens in #334
- fixes for complex add and sub and fixes for their unit tests by @tylera-nvidia in #339
- Cgsolve by @luitjens in #337
- fix for real - complex by @tylera-nvidia in #341
- fixing some valgrind warnings by @luitjens in #342
- Leg2 by @luitjens in #340
- adding more error checking for operator sizes by @luitjens in #343
- Fix typos in documentation by @Yaraslaut in #344
- Fixed errors related to CUDA 12 update by @cliffburdick in #347
- make concat axis a runtime parameter instead of compile time. by @luitjens in #348
- adding stack operator by @luitjens in #350
- Convolution overhaul. by @luitjens in #351
- Added complex support to SVD by @cliffburdick in #354
- Optimizing conv2d by @luitjens in #352
- Added softmax operator by @cliffburdick in #355
- Fix for expanded dims not working in certain cases by @cliffburdick in #356
- Added test for strided batched gemm and updated docs by @cliffburdick in #357
- Updates for Offline Deployment and General Bug/QoL Fixes by @tylera-nvidia in #349
New Contributors
- @galv made their first contribution in #182
- @luitjens made their first contribution in #187
- @cwharris made their first contribution in #192
- @pkestene made their first contribution in #200
- @dagardner-nv made their first contribution in #202
- @tylera-nvidia made their first contribution in #225
- @leofang made their first contribution in #241
- @Yaraslaut made their first contribution in #344
Full Changelog: v0.2.5...v0.3.0