- ArrayFire now supports threaded applications.
- Added Canny edge detector.
- Added Sparse-Dense arithmetic operations.
- ArrayFire Threading
- \ref af::array can be read by multiple threads
- All ArrayFire functions can be executed concurrently by multiple threads
- Threads can operate on different devices to simplify multi-device workloads
- New Canny edge detector function, \ref af::canny().
- Can automatically calculate high threshold with `AF_CANNY_THRESHOLD_AUTO_OTSU`
- Supports both L1 and L2 Norms to calculate gradients
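
The two gradient norms mentioned for the Canny detector can be sketched in plain C++ without any ArrayFire calls. Given the horizontal and vertical derivatives `gx` and `gy` at a pixel, the L1 norm is `|gx| + |gy|` and the L2 norm is `sqrt(gx^2 + gy^2)`. The function names are illustrative only and are not part of the ArrayFire API.

```cpp
#include <cmath>

// Gradient magnitude using the L1 norm: cheaper to compute,
// and never smaller than the L2 value for the same (gx, gy).
inline float gradientL1(float gx, float gy) {
    return std::fabs(gx) + std::fabs(gy);
}

// Gradient magnitude using the L2 norm: the Euclidean length of (gx, gy).
inline float gradientL2(float gx, float gy) {
    return std::sqrt(gx * gx + gy * gy);
}
```

For `gx = 3` and `gy = -4` the L1 magnitude is 7 and the L2 magnitude is 5, so thresholds tuned for one norm generally need adjusting for the other.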
- New tuned OpenCL BLAS backend, CLBlast.
- Converted CUDA JIT to use NVRTC instead of NVVM.
- Performance improvements in \ref af::reorder().
- Performance improvements in \ref array::scalar().
- Improved unified backend performance.
- ArrayFire now depends on Forge v1.0.
- Can now specify the FFT plan cache size using the \ref af::setFFTPlanCacheSize() function.
- Get the number of physical bytes allocated by the memory manager using \ref af_get_allocated_bytes().
- \ref af::dot() can now return a scalar value to the host.
- Fixed improper release of default Mersenne random engine.
- Fixed \ref af::randu() and \ref af::randn() ranges for floating point types.
- Fixed assignment bug in CPU backend.
- Fixed complex (`c32`, `c64`) multiplication in OpenCL convolution kernels.
- Fixed inconsistent behavior with \ref af::replace() and \ref replace_scalar().
- Fixed memory leak in \ref af_fir().
- Fixed memory leaks in \ref af_cast for sparse arrays.
- Fixed correctness of \ref af_pow for complex numbers by using Cartesian form.
- Corrected \ref af::select() with indexing in CUDA and OpenCL backends.
- Workaround for VS2015 compiler ternary bug.
- Fixed memory corruption in `cuda::findPlan()`.
- Argument checks in \ref af_create_sparse_array avoid inputs of type int64.
- Fixed issue with indexing an array with a step size != 1.
- On OSX, utilize new GLFW package from the brew package manager.
- Fixed CUDA PTX names generated by CMake v3.7.
- Support `gcc` > 5.x for CUDA.
- New genetic algorithm example.
- Updated `README.md` to improve readability and formatting.
- Updated `README.md` to mention Julia and Nim wrappers.
- Improved installation instructions - `docs/pages/install.md`.
- Windows
- The Windows NVIDIA driver version `37x.xx` contains a bug which causes `fftconvolve_opencl` to fail. Upgrade or downgrade to a different version of the driver to avoid this failure.
- The following tests fail on Windows with NVIDIA hardware: `threading_cuda`, `qr_dense_opencl`, `solve_dense_opencl`.
- macOS
- The Accelerate framework, used by the CPU backend on macOS, leverages Intel graphics cards (Iris) when there are no discrete GPUs available. This OpenCL implementation is known to give incorrect results on the following tests: `lu_dense_{cpu,opencl}`, `solve_dense_{cpu,opencl}`, `inverse_dense_{cpu,opencl}`.
- Certain tests intermittently fail on macOS with NVIDIA GPUs apparently due to inconsistent driver behavior: `fft_large_cuda` and `svd_dense_cuda`.
- The following tests are currently failing on macOS with AMD GPUs: `cholesky_dense_opencl` and `scan_by_key_opencl`.
This release supports CUDA 6.5 and higher. The next ArrayFire release will support CUDA 7.0 and higher, dropping support for CUDA 6.5. Reasons for no longer supporting CUDA 6.5 include:
- CUDA 7.0 NVCC supports the C++11 standard (whereas CUDA 6.5 does not), which is used by ArrayFire's CPU and OpenCL backends.
- Very few ArrayFire users still use CUDA 6.5.
As a result, the older Jetson TK1 / Tegra K1 will no longer be supported in the next ArrayFire release. The newer Jetson TX1 / Tegra X1 will continue to have full capability with ArrayFire.
- Implemented sparse storage format conversions between \ref AF_STORAGE_CSR and \ref AF_STORAGE_COO.
- Directly convert between \ref AF_STORAGE_COO <--> \ref AF_STORAGE_CSR using the af::sparseConvertTo() function.
- af::sparseConvertTo() now also supports converting to dense.
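
To make the COO to CSR conversion concrete, here is a minimal host-side sketch in plain C++. It is not ArrayFire's implementation: the `Coo` and `Csr` structs and the `cooToCsr` function are hypothetical, and the counting-sort approach is just one common way to build the row offsets.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-side containers; ArrayFire keeps these arrays on the device.
struct Coo {
    std::size_t rows;
    std::vector<int> rowIdx, colIdx;  // one entry per non-zero
    std::vector<float> values;
};

struct Csr {
    std::vector<int> rowPtr;  // rows + 1 offsets into colIdx/values
    std::vector<int> colIdx;
    std::vector<float> values;
};

// Convert COO to CSR with a counting sort on the row index.
// Entries that share a row keep their original relative order.
Csr cooToCsr(const Coo& in) {
    Csr out;
    out.rowPtr.assign(in.rows + 1, 0);
    out.colIdx.resize(in.values.size());
    out.values.resize(in.values.size());

    // Count non-zeros per row, then prefix-sum the counts into row offsets.
    for (int r : in.rowIdx) ++out.rowPtr[r + 1];
    for (std::size_t r = 0; r < in.rows; ++r) out.rowPtr[r + 1] += out.rowPtr[r];

    // Scatter each entry into the next free slot of its row.
    std::vector<int> next(out.rowPtr.begin(), out.rowPtr.end() - 1);
    for (std::size_t i = 0; i < in.values.size(); ++i) {
        int dst = next[in.rowIdx[i]]++;
        out.colIdx[dst] = in.colIdx[i];
        out.values[dst] = in.values[i];
    }
    return out;
}
```

The reverse direction (CSR to COO) only needs to expand each `rowPtr` range back into explicit row indices.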
- Added cast support for [sparse arrays](\ref sparse_func).
- Casting only changes the values array and the type. The row and column index arrays are not changed.
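
The casting rule for sparse arrays can be illustrated with a small host-side sketch. `CsrMatrix` and `castSparse` are hypothetical names used for illustration; they only show that the index arrays are carried over untouched while the values are converted element by element.

```cpp
#include <vector>

// Hypothetical host-side CSR container, templated on the value type.
template <typename T>
struct CsrMatrix {
    std::vector<int> rowPtr;
    std::vector<int> colIdx;
    std::vector<T> values;
};

// Cast a sparse matrix: only the values are converted;
// the row and column index arrays are copied unchanged.
template <typename To, typename From>
CsrMatrix<To> castSparse(const CsrMatrix<From>& in) {
    CsrMatrix<To> out;
    out.rowPtr = in.rowPtr;
    out.colIdx = in.colIdx;
    out.values.assign(in.values.begin(), in.values.end());  // element-wise conversion
    return out;
}
```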
- Reintroduced automated computation of chart axes limits for graphics functions.
- The axes limits will always be the minimum/maximum of the current and new limit.
- The user can still set limits from API calls. If the user sets a limit from the API call, then the automatic limit setting will be disabled.
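
The automatic limit rule above amounts to a range that can only grow. A minimal sketch, assuming a hypothetical `AxisLimits` type that is not part of ArrayFire or Forge:

```cpp
#include <algorithm>

// Hypothetical axis range; not an ArrayFire or Forge type.
struct AxisLimits {
    float min;
    float max;
};

// Widen the current limits so they also cover the limits of newly drawn data.
// The range can only grow: min takes the smaller value, max the larger one.
inline AxisLimits mergeLimits(AxisLimits current, AxisLimits incoming) {
    return {std::min(current.min, incoming.min), std::max(current.max, incoming.max)};
}
```

Merging a current range of [0, 10] with new data spanning [-2, 5] gives [-2, 10]. Once the user sets limits explicitly, this merging step would simply be skipped.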
- Using `boost::scoped_array` instead of `boost::scoped_ptr` when managing array resources.
- Internal performance improvements to getInfo() by using `const` references to avoid unnecessary copying of `ArrayInfo` objects.
- Added support for scalar af::array inputs for af::convolve() and [set functions](\ref set_mat).
- Performance fixes in af::fftConvolve() kernels.
- Fixes to JIT when tree is large.
- Fixed indexing bug when converting dense to sparse af::array as \ref AF_STORAGE_COO.
- Fixed af::bilateral() OpenCL kernel compilation on OS X.
- Fixed memory leak in af::regions() (CPU) and af::rgb2ycbcr().
- Major OS X installer fixes.
- Fixed installation scripts.
- Fixed installation symlinks for libraries.
- Windows installer now ships with more pre-built examples.
- Added af::choleskyInPlace() calls to `cholesky.cpp` example.
- CUDA 8.0.55 supports Xcode 8.
- Known failures with CUDA 6.5. These include all functions that use sorting. As a result, sparse storage format conversion between \ref AF_STORAGE_COO and \ref AF_STORAGE_CSR has been disabled for CUDA 6.5.
- Installers for Linux, OS X and Windows
- CUDA backend now uses CUDA 8.0.
- Uses Intel MKL 2017.
- CUDA Compute 2.x (Fermi) is no longer compiled into the library.
- Installer for OS X
- The libraries shipping in the OS X Installer are now compiled with Apple Clang v7.3.1 (previously v6.1.0).
- The OS X version used is 10.11.6 (previously 10.10.5).
- Installer for Jetson TX1 / Tegra X1
- Requires JetPack for L4T 2.3 (containing Linux for Tegra r24.2 for TX1).
- CUDA backend now uses CUDA 8.0 64-bit.
- Using CUDA's cusolver instead of CPU fallback.
- Uses OpenBLAS for CPU BLAS.
- All ArrayFire libraries are now 64-bit.
- Add [sparse array](\ref sparse_func) support to \ref af::eval().
- Add OpenCL-CPU fallback support for sparse \ref af::matmul() when running on a unified memory device. Uses MKL Sparse BLAS.
- When using CUDA libdevice, pick the correct compute version based on device.
- OpenCL FFT now also supports prime factors 7, 11 and 13.
- Allow CUDA libdevice to be detected from custom directory.
- Fix `aarch64` detection on Jetson TX1 64-bit OS.
- Add missing definition of `af_set_fft_plan_cache_size` in unified backend.
- Fix initial values for \ref af::min() and \ref af::max() operations.
- Fix distance calculation in \ref af::nearestNeighbour for CUDA and OpenCL backend.
- Fix OpenCL bug where scalars were passed incorrectly to compile options.
- Fix bug in \ref af::Window::surface() with respect to dimensions and ranges.
- Fix possible double free corruption in \ref af_assign_seq().
- Add missing eval for key in \ref af::scanByKey in CPU backend.
- Fixed creation of sparse values array using \ref AF_STORAGE_COO.
- Add a [Conjugate Gradient solver example](\ref benchmarks/cg.cpp) to demonstrate sparse and dense matrix operations.
- When using CUDA 8.0, compute 2.x are no longer in default compute list.
- This follows CUDA 8.0 deprecating computes 2.x.
- Default computes for CUDA 8.0 will be 30, 50, 60.
- When using CUDA pre-8.0, the default selection remains 20, 30, 50.
- CUDA backend now uses `-arch=sm_30` for PTX compilation as default.
- Unless compute 2.0 is enabled.
- \ref af::lu() on CPU is known to give incorrect results when run on OS X 10.11 or 10.12 and compiled with Accelerate Framework.
- Since the OS X Installer libraries use MKL rather than Accelerate Framework, this issue does not affect those libraries.
- [Sparse Matrix and BLAS](\ref sparse_func).
- Faster JIT for CUDA and OpenCL.
- Support for [random number generator engines](\ref af::randomEngine).
- Improvements to graphics.
- [Sparse Matrix and BLAS](\ref sparse_func)
- Support for [CSR](\ref AF_STORAGE_CSR) and [COO](\ref AF_STORAGE_COO) [storage types](\ref af_storage).
- Sparse-Dense Matrix Multiplication and Matrix-Vector Multiplication as a part of af::matmul() using \ref AF_STORAGE_CSR format for sparse.
- Conversion to and from [dense](\ref AF_STORAGE_DENSE) matrix to [CSR](\ref AF_STORAGE_CSR) and [COO](\ref AF_STORAGE_COO) [storage types](\ref af_storage).
- Faster JIT
- Performance improvements for CUDA and OpenCL JIT functions.
- Support for evaluating multiple outputs in a single kernel. See af::array::eval() for more.
- [Random Number Generation](\ref af::randomEngine)
- af::randomEngine(): A random engine class to handle setting the type and seed for random number generator engines.
- Supported engine types are (\ref af_random_engine_type):
- Graphics
- Using Forge v0.9.0
- [Vector Field](\ref af::Window::vectorField) plotting functionality.
- Removed GLEW and replaced with glbinding.
- Removed usage of GLEW after support for MX (multithreaded) was dropped in v2.0.
- Multiple overlays on the same window are now possible.
- Overlays support for same type of object (2D/3D)
- Supported by af::Window::plot, af::Window::hist, af::Window::surface, af::Window::vectorField.
- New API to set axes limits for graphs.
- Draw calls do not automatically compute the limits. This is now under user control.
- af::Window::setAxesLimits can be used to set axes limits automatically or manually.
- af::Window::setAxesTitles can be used to set axes titles.
- New API for plot and scatter:
- af::Window::plot() and af::Window::scatter() now can handle 2D and 3D and determine appropriate order.
- af_draw_plot_nd()
- af_draw_plot_2d()
- af_draw_plot_3d()
- af_draw_scatter_nd()
- af_draw_scatter_2d()
- af_draw_scatter_3d()
- New [interpolation methods](\ref af_interp_type)
- Applies to
- \ref af::resize()
- \ref af::transform()
- \ref af::approx1()
- \ref af::approx2()
- Support for [complex mathematical functions](\ref mathfunc_mat)
- Add complex support for \ref trig_mat, \ref af::sqrt(), \ref af::log().
- af::medfilt1(): Median filter for 1-d signals
- Generalized scan functions: \ref scan_func_scan and \ref scan_func_scanbykey
- Now supports inclusive or exclusive scans
- Supports binary operations defined by \ref af_binary_op.
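
The difference between inclusive and exclusive scans with an arbitrary binary operation can be sketched on the host in plain C++. This is an illustration of the concept only; the `scan` template below is hypothetical and does not reflect the signature of the ArrayFire scan functions.

```cpp
#include <cstddef>
#include <vector>

// Generic scan over `in` with binary operation `op`.
// `identity` is the neutral element of `op` (0 for add, 1 for multiply, ...).
//   inclusive: out[i] = in[0] op ... op in[i]
//   exclusive: out[i] = identity op in[0] op ... op in[i-1]
template <typename T, typename Op>
std::vector<T> scan(const std::vector<T>& in, Op op, T identity, bool inclusive) {
    std::vector<T> out(in.size());
    T acc = identity;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (inclusive) {
            acc = op(acc, in[i]);
            out[i] = acc;
        } else {
            out[i] = acc;
            acc = op(acc, in[i]);
        }
    }
    return out;
}
```

With addition over `{1, 2, 3, 4}`, the inclusive scan yields `{1, 3, 6, 10}` and the exclusive scan yields `{0, 1, 3, 6}`.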
- [Image Moments](\ref moments_mat) functions
- Add af::getSizeOf() function for \ref af_dtype
- Explicitly instantiate \ref af::array::device() for `void *`
- Fixes to edge-cases in \ref morph_mat.
- Makes JIT tree size consistent between devices.
- Delegate higher-dimension in \ref convolve_mat to correct dimensions.
- Indexing fixes with C++11.
- Handle empty arrays as inputs in various functions.
- Fix bug when single element input to af::median.
- Fix bug in calculation of time from af::timeit().
- Fix bug in floating point numbers in af::seq.
- Fixes for OpenCL graphics interop on NVIDIA devices.
- Fix bug when compiling large kernels for AMD devices.
- Fix bug in af::bilateral when shared memory is over the limit.
- Fix bug in kernel header compilation tool `bin2cpp`.
- Fix initial values for \ref morph_mat functions.
- Fix bugs in af::homography() CPU and OpenCL kernels.
- Fix bug in CPU TNJ.
- CUDA 8 and compute 6.x (Pascal) support, current installer ships with CUDA 7.5.
- User controlled FFT plan caching.
- CUDA performance improvements for \ref image_func_wrap, \ref image_func_unwrap and \ref approx_mat.
- Fallback for CUDA-OpenGL interop when no devices support OpenGL.
- Additional forms of batching with the \ref transform_func_transform functions. New behavior defined here.
- Update to OpenCL2 headers.
- Support for integration with external OpenCL contexts.
- Performance improvements to internal copy in CPU Backend.
- Performance improvements to af::select and af::replace CUDA kernels.
- Enable OpenCL-CPU offload by default for devices with Unified Host Memory.
- To disable, use the environment variable `AF_OPENCL_CPU_OFFLOAD=0`.
- Compilation speedups.
- Build fixes with MKL.
- Error message when CMake CUDA Compute Detection fails.
- Several CMake build issues with Xcode generator fixed.
- Fix multiple OpenCL definitions at link time.
- Fix lapacke detection in CMake.
- Update build tags of
- Fix builds with GCC 6.1.1 and GCC 5.3.0.
- All installers now ship with ArrayFire libraries built with MKL 2016.
- All installers now ship with Forge development files and examples included.
- CUDA Compute 2.0 has been removed from the installers. Please contact us directly if you have a special need.
- Added [example simulating gravity](\ref graphics/field.cpp) for demonstration of vector field.
- Improvements to \ref financial/black_scholes_options.cpp example.
- Improvements to \ref graphics/gravity_sim.cpp example.
- Fix graphics examples to use af::Window::setAxesLimits and af::Window::setAxesTitles functions.
- ArrayFire copyright and trademark policy
- Fixed grammar in license.
- Add license information for glbinding.
- Remove license information for GLEW.
- Random123 now applies to all backends.
- Random number functions are now under \ref random_mat.
The following functions have been deprecated and may be modified or removed permanently from future versions of ArrayFire.
- \ref af::Window::plot3(): Use \ref af::Window::plot instead.
- \ref af_draw_plot(): Use \ref af_draw_plot_nd or \ref af_draw_plot_2d instead.
- \ref af_draw_plot3(): Use \ref af_draw_plot_nd or \ref af_draw_plot_3d instead.
- \ref af::Window::scatter3(): Use \ref af::Window::scatter instead.
- \ref af_draw_scatter(): Use \ref af_draw_scatter_nd or \ref af_draw_scatter_2d instead.
- \ref af_draw_scatter3(): Use \ref af_draw_scatter_nd or \ref af_draw_scatter_3d instead.
Certain CUDA functions are known to be broken on Tegra K1. The following ArrayFire tests are currently failing:
- assign_cuda
- harris_cuda
- homography_cuda
- median_cuda
- orb_cuda
- sort_cuda
- sort_by_key_cuda
- sort_index_cuda
- Family of [Sort](\ref sort_mat) functions now support higher order dimensions.
- Improved performance of batched sort on dim 0 for all [Sort](\ref sort_mat) functions.
- [Median](\ref stat_func_median) now also supports higher order dimensions.
- Fixes to error handling in C++ API for binary functions.
- Fixes to external OpenCL context management.
- Fixes to JPEG_GREYSCALE for FreeImage versions <= 3.154.
- Fixes for non-float inputs to \ref af::rgb2gray().
- Disable CPU Async when building with GCC < 4.8.4.
- Add option to disable CPUID from CMake.
- More verbose message when CUDA Compute Detection fails.
- Print message to use CUDA library stub from CUDA Toolkit if CUDA Library is not found from default paths.
- Build Fixes on Windows.
- For compiling tests out of source.
- For compiling ArrayFire with static MKL.
- Exclude <sys/sysctl.h> when building on GNU Hurd.
- Add manual CMake options to build DEB and RPM packages.
- Fixed documentation for \ref af::replace().
- Fixed images in [Using on OSX](\ref using_on_osx) page.
- Linux x64 installers will now be compiled with GCC 4.9.2.
- OSX installer gives better error messages on brew failures and now includes link to [Fixing OS X Installer Failures](https://github.com/arrayfire/arrayfire/wiki/Fixing-Common-OS-X-Installer-Failures) for brew installation failures.
- Fixes to \ref af::array::device()
- CPU Backend: evaluate arrays before returning pointer with asynchronous calls in CPU backend.
- OpenCL Backend: fix segfaults when requested for device pointers on empty arrays.
- Fixed \ref af::array::operator%() from using rem to mod.
- Fixed array destruction when backends are switched in Unified API.
- Fixed indexing after \ref af::moddims() is called.
- Fixes FFT calls for CUDA and OpenCL backends when used on multiple devices.
- Fixed unresolved external for some functions from \ref af::array::array_proxy class.
- CMake compiles files in alphabetical order.
- CMake fixes for BLAS and LAPACK on some Linux distributions.
- Fixed OpenCL FFT performance regression.
- \ref af::array::device() on OpenCL backend returns `cl_mem` instead of `(void*)cl::Buffer*`.
- In Unified backend, load versioned libraries at runtime.
- Reorganized, cleaner README file.
- Replaced non-free lena image in assets with free-to-distribute lena image.
- CPU backend supports asynchronous execution.
- Performance improvements to OpenCL BLAS and FFT functions.
- Improved performance of memory manager.
- Improvements to visualization functions.
- Improved sorted order for OpenCL devices.
- Integration with external OpenCL projects.
- \ref af::getActiveBackend(): Returns the current backend being used.
- Scatter plot added to graphics.
- \ref af::transform() now supports perspective transformation matrices.
- \ref af::infoString(): Returns `af::info()` as a string.
- \ref af::printMemInfo(): Print a table showing information about buffers from the memory manager
- The \ref AF_MEM_INFO macro prints numbers and total sizes of all buffers (requires including af/macros.h)
- \ref af::allocHost(): Allocates memory on host.
- \ref af::freeHost(): Frees host side memory allocated by ArrayFire.
- OpenCL functions can now use CPU implementation.
- Currently limited to Unified Memory devices (CPU and On-board Graphics).
- Functions: af::matmul() and all [LAPACK](\ref linalg_mat) functions.
- Takes advantage of optimized libraries such as MKL without doing memory copies.
- Use the environment variable `AF_OPENCL_CPU_OFFLOAD=1` to take advantage of this feature.
- Functions specific to OpenCL backend.
- \ref afcl::addDevice(): Adds an external device and context to ArrayFire's device manager.
- \ref afcl::deleteDevice(): Removes an external device and context from ArrayFire's device manager.
- \ref afcl::setDevice(): Sets an external device and context from ArrayFire's device manager.
- \ref afcl::getDeviceType(): Gets the device type of the current device.
- \ref afcl::getPlatform(): Gets the platform of the current device.
- \ref af::createStridedArray() allows array creation with user-defined strides and device pointer.
- Expose functions that provide information about memory layout of Arrays.
- \ref af::getStrides(): Gets the strides for each dimension of the array.
- \ref af::getOffset(): Gets the offsets for each dimension of the array.
- \ref af::getRawPtr(): Gets raw pointer to the location of the array on device.
- \ref af::isLinear(): Returns true if all elements in the array are contiguous.
- \ref af::isOwner(): Returns true if the array owns the raw pointer, false if it is a sub-array.
- \ref af::getDeviceId(): Gets the device id on which the array resides.
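
The relationship between dimensions, strides, and linearity can be sketched in plain C++. The sketch assumes a column-major layout with strides counted in elements; `Dim4`, `linearStrides`, and `isLinearLayout` are hypothetical helpers, not ArrayFire functions, and the linearity check is a simplification that ignores the special case of dimensions of size 1.

```cpp
#include <array>
#include <cstddef>

using Dim4 = std::array<std::size_t, 4>;  // hypothetical stand-in for a 4-D shape

// Strides (in elements) of a densely packed column-major array:
// the first dimension is contiguous, each later one skips over all earlier ones.
inline Dim4 linearStrides(const Dim4& dims) {
    Dim4 s{1, dims[0], dims[0] * dims[1], dims[0] * dims[1] * dims[2]};
    return s;
}

// An array is treated as linear when its strides match the packed layout,
// i.e. there are no gaps between consecutive elements.
inline bool isLinearLayout(const Dim4& dims, const Dim4& strides) {
    return strides == linearStrides(dims);
}
```

A 2x3 view taken from the first two rows of a 4x3 parent keeps the parent's column stride of 4, so it is not linear even though the parent is.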
- \ref af::isImageIOAvailable(): Returns true if ArrayFire was compiled with Freeimage enabled
- \ref af::isLAPACKAvailable(): Returns true if ArrayFire was compiled with LAPACK functions enabled
- Fixed errors when using 3D / 4D arrays in select and replace
- Fixed JIT errors on AMD devices for OpenCL backend.
- Fixed imageio bugs for 16 bit images.
- Fixed bugs when loading and storing images natively.
- Fixed bug in FFT for NVIDIA GPUs when using OpenCL backend.
- Fixed bug when using external context with OpenCL backend.
- Fixed memory leak in \ref af_median_all().
- Fixed memory leaks and performance in graphics functions.
- Fixed bugs when indexing followed by moddims.
- \ref af_get_revision() now returns actual commit rather than AF_REVISION.
- Fixed releasing arrays when using different backends.
- OS X OpenCL: [LAPACK functions](\ref linalg_mat) on CPU devices use OpenCL offload (previously threw errors).
- Add support for 32-bit integer image types in Image IO.
- Fixed set operations for row vectors
- Fixed bugs in \ref af::meanShift() and af::orb().
- Optionally offload BLAS and LAPACK functions to CPU implementations to improve performance.
- Performance improvements to the memory manager.
- Error messages are now more detailed.
- Improved sorted order for OpenCL devices.
- JIT heuristics can now be tweaked using environment variables. See [Environment Variables](\ref configuring_environment) tutorial.
- Add `BUILD_<BACKEND>` options to examples and tests to toggle backends when compiling independently.
- New visualization [example simulating gravity](\ref graphics/gravity_sim.cpp).
- Support for Intel `icc` compiler
- Support to compile with Intel MKL as a BLAS and LAPACK provider
- Tests are now available for building as standalone (like examples)
- Tests can now be built as a single file for each backend
- Better handling of NONFREE build options
- Searching for GLEW in CMake default paths
- Fixes for compiling with MKL on OSX.
- Improvements to OSX Installer
- CMake config files are now installed with libraries
- Independent options for installing examples and documentation components
- `af_lock_device_arr` is now deprecated to be removed in v4.0.0. Use \ref af_lock_array() instead.
- `af_unlock_device_arr` is now deprecated to be removed in v4.0.0. Use \ref af_unlock_array() instead.
- Fixes to documentation for \ref matchTemplate().
- Improved documentation for deviceInfo.
- Fixes to documentation for \ref exp().
- Solve OpenCL fails on NVIDIA Maxwell devices for f32 and c32 when M > N and K % 4 is 1 or 2.
- Fixed memory leak in CUDA Random number generators
- Fixed bug in af::select() and af::replace() tests
- Fixed exception thrown when printing empty arrays with af::print()
- Fixed bug in CPU random number generation. Changed the generator to mt19937
- Fixed exception handling (internal)
- Exceptions now show function, short file name and line number
- Added AF_RETURN_ERROR macro to handle returning errors.
- Removed THROW macro, and renamed AF_THROW_MSG to AF_THROW_ERR.
- Fixed bug in \ref af::identity() that may have affected CUDA Compute 5.2 cards
- Added a MIN_BUILD_TIME option to build with minimum optimization compiler flags resulting in faster compile times
- Fixed issue in CBLAS detection by CMake
- Fixed tests failing for builds without optional components FreeImage and LAPACK
- Added a test for unified backend
- Only info and backend tests are now built for unified backend
- Sort tests execution alphabetically
- Fixed compilation flags and errors in tests and examples
- Moved AF_REVISION and AF_COMPILER_STR into src/backend. Because the revision is updated with every commit, the old code forced all of ArrayFire to be rebuilt each time.
- v3.3 will add an af_get_revision() function to get the revision string.
- Clean up examples
- Remove getchar for Windows (this will be handled by the installer)
- Other miscellaneous code cleanup
- Fixed bug in [plot3.cpp](\ref graphics/plot3.cpp) example
- Rename clBLAS/clFFT external project suffix from external -> ext
- Add OpenBLAS as a lapack/lapacke alternative
- Added \ref AF_MEM_INFO macro to print memory info from ArrayFire's memory manager (cross issue)
- Added additional paths for searching for `libaf*` for Unified backend on unix-style OS.
- Note: This still requires dependencies such as forge, CUDA, NVVM etc to be in `LD_LIBRARY_PATH` as described in [Unified Backend](\ref unifiedbackend)
- Create streams for devices only when required in CUDA Backend
- Hide scrollbars appearing for pre and code styles
- Fix documentation for af::replace
- Add code sample for converting the output of af::getAvailableBackends() into bools
- Minor fixes in documentation
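
The kind of conversion that code sample covers can be sketched as follows. To keep the sketch compilable without ArrayFire, the backend flags are redefined locally; the values (CPU = 1, CUDA = 2, OpenCL = 4) are an assumption about the `af_backend` enum, and `AvailableBackends` and `decodeBackends` are hypothetical names.

```cpp
// Backend flags mirrored here so the sketch compiles without ArrayFire.
// Assumption: these match the af_backend enum values (CPU = 1, CUDA = 2, OpenCL = 4).
enum Backend { BACKEND_CPU = 1, BACKEND_CUDA = 2, BACKEND_OPENCL = 4 };

struct AvailableBackends {
    bool cpu;
    bool cuda;
    bool opencl;
};

// Decode a bitmask such as the one returned by af::getAvailableBackends().
inline AvailableBackends decodeBackends(int mask) {
    return {(mask & BACKEND_CPU) != 0,
            (mask & BACKEND_CUDA) != 0,
            (mask & BACKEND_OPENCL) != 0};
}
```

In real code the mask would come from `af::getAvailableBackends()` and the flags from the ArrayFire headers rather than from the local enum.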
- Fixed bug in homography()
- Fixed bug in behavior of af::array::device()
- Fixed bug when indexing with span along trailing dimension
- Fixed bug when indexing in [GFor](\ref gfor)
- Fixed bug in CPU information fetching
- Fixed compilation bug in unified backend caused by missing link library
- Add missing symbol for af_draw_surface()
- Tests can now be used as a standalone project
- Tests can now be built using pre-compiled libraries
- Similar to how the examples are built
- The install target now installs the examples source irrespective of the BUILD_EXAMPLES value
- Examples are not built if BUILD_EXAMPLES is off
- HTML documentation is now built and installed in docs/html
- Added documentation for \ref af::seq class
- Updated [Matrix Manipulation](\ref matrixmanipulation) tutorial
- Examples list is now generated by CMake
- Examples are now listed as dir/example.cpp
- Removed dummy groups used for indexing documentation (affected doxygen < 1.8.9)
- Added Unified backend
- Allows switching backends at runtime
- Read [Unified Backend](\ref unifiedbackend) for more.
- Support for 16-bit integers (\ref s16 and \ref u16)
- All functions that support 32-bit integer types (\ref s32, \ref u32), now also support 16-bit integer types
- Unified Backend
- \ref setBackend() - Sets a backend as active
- \ref getBackendCount() - Gets the number of backends available for use
- \ref getAvailableBackends() - Returns information about available backends
- \ref getBackendId() - Gets the backend enum for an array
- Vision
- \ref homography() - Homography estimation
- \ref gloh() - GLOH Descriptor for SIFT
- Image Processing
- \ref loadImageNative() - Load an image as native data without modification
- \ref saveImageNative() - Save an image without modifying data or type
- Graphics
- \ref af::Window::plot3() - 3-dimensional line plot
- \ref af::Window::surface() - 3-dimensional curve plot
- Indexing
- \ref af_create_indexers()
- \ref af_set_array_indexer()
- \ref af_set_seq_indexer()
- \ref af_set_seq_param_indexer()
- \ref af_release_indexers()
- CUDA Backend Specific
- \ref setNativeId() - Set the CUDA device with given native id as active
- ArrayFire uses a modified order for devices. The native id for a device can be retrieved using `nvidia-smi`
- OpenCL Backend Specific
- \ref setDeviceId() - Set the OpenCL device using the `clDeviceId`
- Added \ref c32 and \ref c64 support for \ref isNaN(), \ref isInf() and \ref iszero()
- Added CPU information for `x86` and `x86_64` architectures in CPU backend's \ref info()
- Batch support for \ref approx1() and \ref approx2()
- Now can be used with gfor as well
- Added \ref s64 and \ref u64 support to:
- \ref sort() (along with sort index and sort by key)
- \ref setUnique(), \ref setUnion(), \ref setIntersect()
- \ref convolve() and \ref fftConvolve()
- \ref histogram() and \ref histEqual()
- \ref lookup()
- \ref mean()
- Added \ref AF_MSG macro
- Submodules update is now automatically called if not cloned recursively
- Fixes for compilation on Visual Studio 2015
- Option to use fallback to CPU LAPACK for linear algebra functions in case of CUDA 6.5 or older versions.
- Fixed memory leak in \ref susan()
- Fixed failing test in \ref lower() and \ref upper() for CUDA compute 53
- Fixed bug in CUDA for indexing out of bounds
- Fixed dims check in \ref iota()
- Fixed out-of-bounds access in \ref sift()
- Fixed memory allocation in \ref fast() OpenCL
- Fixed memory leak in image I/O functions
- \ref dog() now returns floating-point type arrays
- Improved tutorials documentation
- More detailed Using on [Linux](\ref using_on_linux), [OSX](\ref using_on_osx), [Windows](\ref using_on_windows) pages.
- Added return type information for functions that return different type arrays
- Graphics
- [Plot3](\ref graphics/plot3.cpp)
- [Surface](\ref graphics/surface.cpp)
- [Shallow Water Equation](\ref pde/swe.cpp)
- [Basic](\ref unified/basic.cpp) as a Unified backend example
- All installers now include the Unified backend and corresponding CMake files
- Visual Studio projects include Unified in the Platform Configurations
- Added installer for Jetson TX1
- SIFT and GLOH do not ship with the installers as SIFT is protected by patents that do not allow commercial distribution without licensing.
- Fixed bugs in various OpenCL kernels without offset additions
- Remove ARCH_32 and ARCH_64 flags
- Fix missing symbols when freeimage is not found
- Use CUDA driver version for Windows
- Improvements to SIFT
- Fixed memory leak in median
- Fixes for Windows compilation when not using MKL #1047
- Fixes for building without LAPACK
- Documentation: Fixed documentation for select and replace
- Documentation: Fixed documentation for af_isnan
- Fixed bug in assign that was causing test to fail
- Fixed bug in convolve. Frequency condition now depends on kernel size only
- Fixed bug in indexed reductions for complex type in OpenCL backend
- Fixed bug in kernel name generation in ireduce for OpenCL backend
- Fixed non-linear to linear indices in ireduce
- Fixed bug in reductions for small arrays
- Fixed bug in histogram for indexed arrays
- Fixed compiler error CPUID for non-compliant devices
- Fixed failing tests on i386 platforms
- Add missing AFAPI
- Documentation: Added missing examples and other corrections
- Documentation: Fixed warnings in documentation building
- Installers: Send error messages to log file in OSX Installer
- CUDA backend now depends on CUDA 7.5 toolkit
- OpenCL backend now requires OpenCL 1.2 or greater
- `cmake` now includes `PKG_CONFIG` in the search path for CBLAS and LAPACKE libraries
- [heston_model.cpp](\ref financial/heston_model.cpp) example now builds with the default ArrayFire cmake files after installation
- Fixed bug in [image_editing.cpp](\ref image_processing/image_editing.cpp)
- Computer Vision Functions
- \ref nearestNeighbour() - Nearest Neighbour with SAD, SSD and SHD distances
- \ref harris() - Harris Corner Detector
- \ref susan() - Susan Corner Detector
- \ref sift() - Scale Invariant Feature Transform (SIFT)
- "Method and apparatus for identifying scale invariant features in an image and use of same for locating an object in an image," David G. Lowe, US Patent 6,711,293 (March 23, 2004). Provisional application filed March 8, 1999. Assignee: The University of British Columbia. For further details, contact David Lowe ([email protected]) or the University-Industry Liaison Office of the University of British Columbia.
- SIFT is available for compiling but does not ship with ArrayFire hosted installers/pre-built libraries
- \ref dog() - Difference of Gaussians
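
The three distance metrics named for nearest-neighbour matching (SAD, SSD, SHD) can be sketched on the host in plain C++. This is an illustration of the definitions only, assuming both descriptors have the same length; the `Descriptor` alias and function names are hypothetical and not part of the ArrayFire API.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <vector>

using Descriptor = std::vector<std::uint32_t>;  // hypothetical feature descriptor

// Absolute difference of two unsigned values without wrap-around.
inline std::uint64_t absDiff(std::uint32_t a, std::uint32_t b) {
    return a > b ? a - b : b - a;
}

// SAD: Sum of Absolute Differences.
inline std::uint64_t sad(const Descriptor& a, const Descriptor& b) {
    std::uint64_t d = 0;
    for (std::size_t i = 0; i < a.size(); ++i) d += absDiff(a[i], b[i]);
    return d;
}

// SSD: Sum of Squared Differences.
inline std::uint64_t ssd(const Descriptor& a, const Descriptor& b) {
    std::uint64_t d = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        std::uint64_t diff = absDiff(a[i], b[i]);
        d += diff * diff;
    }
    return d;
}

// SHD: Sum of Hamming Distances, i.e. the number of differing bits over all elements.
inline std::uint64_t shd(const Descriptor& a, const Descriptor& b) {
    std::uint64_t d = 0;
    for (std::size_t i = 0; i < a.size(); ++i) d += std::bitset<32>(a[i] ^ b[i]).count();
    return d;
}
```

SHD is the natural choice for binary descriptors, where each bit is an independent test result, while SAD and SSD suit descriptors whose elements are magnitudes.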
-
Image Processing Functions
- \ref ycbcr2rgb() and \ref rgb2ycbcr() - RGB <->YCbCr color space conversion
- \ref wrap() and \ref unwrap() Wrap and Unwrap
- \ref sat() - Summed Area Tables
- \ref loadImageMem() and \ref saveImageMem() - Load and Save images to/from memory
- \ref af_image_format - Added imageFormat (af_image_format) enum
-
Array & Data Handling
- \ref copy() - Copy
- array::lock() and array::unlock() - Lock and Unlock
- \ref select() and \ref replace() - Select and Replace
- Get array reference count (af_get_data_ref_count)
-
Signal Processing
- \ref fftInPlace() - 1D in place FFT
- \ref fft2InPlace() - 2D in place FFT
- \ref fft3InPlace() - 3D in place FFT
- \ref ifftInPlace() - 1D in place Inverse FFT
- \ref ifft2InPlace() - 2D in place Inverse FFT
- \ref ifft3InPlace() - 3D in place Inverse FFT
- \ref fftR2C() - Real to complex FFT
- \ref fftC2R() - Complex to Real FFT
-
Linear Algebra
- \ref svd() and \ref svdInPlace() - Singular Value Decomposition
- Other operations
  - \ref sigmoid() - Sigmoid
  - Sum (with option to replace NaN values)
  - Product (with option to replace NaN values)
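
As a rough illustration of what replacing NaN values during a reduction means, here is a small standalone C++ sketch. The helper names `sum_replacing_nan` and `product_replacing_nan` are hypothetical and do not reflect ArrayFire's signatures or implementation.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper: sums the values, substituting `nan_value`
// for every NaN entry before it is accumulated.
double sum_replacing_nan(const std::vector<double>& values, double nan_value) {
    double total = 0.0;
    for (double v : values) {
        total += std::isnan(v) ? nan_value : v;
    }
    return total;
}

// Hypothetical helper: same substitution, but for a product reduction.
double product_replacing_nan(const std::vector<double>& values, double nan_value) {
    double total = 1.0;
    for (double v : values) {
        total *= std::isnan(v) ? nan_value : v;
    }
    return total;
}
```

Choosing 0 as the replacement for a sum, or 1 for a product, makes NaN entries neutral, so the result matches a reduction over the remaining values.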
- Graphics
  - Window::setSize() - Window resizing using Forge API
- Utility
  - Allow users to set print precision (print, af_print_array_gen)
  - \ref saveArray() and \ref readArray() - Stream arrays to binary files
  - \ref toString() - toString function returns the array and data as a string
- CUDA specific functionality
  - \ref getStream() - Returns default CUDA stream ArrayFire uses for the current device
  - \ref getNativeId() - Returns native id of the CUDA device
- dot
  - Allow complex inputs with conjugate option
- AF_INTERP_LOWER interpolation
  - For resize, rotate and transform based functions
- 64-bit integer support
  - For reductions, random, iota, range, diff1, diff2, accum, join, shift and tile
- convolve
  - Support for non-overlapping batched convolutions
- Complex Arrays
  - Fix binary ops on complex inputs of mixed types
  - Complex type support for exp
- tile
  - Performance improvements by using JIT when possible.
- Add AF_API_VERSION macro
  - Allows disabling of API to maintain consistency with previous versions
- Other Performance Improvements
  - Use reference counting to reduce unnecessary copies
- CPU Backend
  - Device properties for CPU
  - Improved performance when all buffers are indexed linearly
- CUDA Backend
  - Use streams in CUDA (no longer using default stream)
  - Using async cudaMem ops
  - Add 64-bit integer support for JIT functions
  - Performance improvements for CUDA JIT for non-linear 3D and 4D arrays
- OpenCL Backend
  - Improve compilation times for OpenCL backend
  - Performance improvements for non-linear JIT kernels on OpenCL
  - Improved shared memory load/store in many OpenCL kernels (PR 933)
  - Using cl.hpp v1.2.7
- Common
  - Fix compatibility of c32/c64 arrays when operating with scalars
  - Fix median for all values of an array
  - Fix double free issue when indexing (30cbbc7)
  - Fix bug in rank
  - Fix default values for scale throwing exception
  - Fix conjg raising exception on real input
  - Fix bug when using conjugate transpose for vector input
  - Fix issue with const input for array_proxy::get()
- CPU Backend
  - Fix randn generating same sequence for multiple calls
  - Fix setSeed for randu
  - Fix casting to and from complex
  - Check NULL values when allocating memory
  - Fix offset issue for CPU element-wise operations
- Match Template
- Susan
- Heston Model (contributed by Michael Nowotny)
- Fixed bug in automatic detection of ArrayFire when using with CMake in Windows
- The Linux libraries are now compiled with static version of FreeImage
- OpenBLAS can cause issues with QR factorization in CPU backend
- FreeImage older than 3.10 can cause issues with loadImageMem and saveImageMem
- OpenCL backend issues on OSX
- AMD GPUs not supported because of driver issues
- Intel CPUs not supported
- Linear algebra functions do not work on Intel GPUs.
- Stability and correctness issues with open source OpenCL implementations such as Beignet, GalliumCompute.
- Added missing symbols from the compatible API
- Fixed a bug affecting corner rows and elements in \ref grad()
- Fixed linear interpolation bugs affecting large images in the following:
- \ref approx1()
- \ref approx2()
- \ref resize()
- \ref rotate()
- \ref scale()
- \ref skew()
- \ref transform()
- Added missing documentation for \ref constant()
- Added missing documentation for array::scalar()
- Added supported input types for functions in arith.h
- Fixed header to work in Visual Studio 2015
- Fixed a bug in batched mode for FFT based convolutions
- Fixed graphics issues on OSX
- Fixed various bugs in visualization functions
- Improved fractal example
- New OSX installer
- Improved Windows installer
- Default install path has been changed
- Fixed bug in machine learning examples
- ArrayFire is now open source
- Major changes to the visualization library
- Introducing handle based C API
- New backend: CPU fallback available for systems without GPUs
- Dense linear algebra functions available for all backends
- Support for 64 bit integers
- Data generation functions
  - range()
  - iota()
- Computer Vision Algorithms
  - features()
    - A data structure to hold features
  - fast()
    - FAST feature detector
  - orb()
    - ORB: a feature descriptor extractor
- Image Processing
  - convolve1(), convolve2(), convolve3()
    - Specialized versions of convolve() to enable better batch support
  - fftconvolve1(), fftconvolve2(), fftconvolve3()
    - Convolutions in frequency domain to support larger kernel sizes
  - dft(), idft()
    - Unified functions for calling multidimensional FFTs.
  - matchTemplate()
    - Match a kernel in an image
  - sobel()
    - Get Sobel gradients of an image
  - rgb2hsv(), hsv2rgb(), rgb2gray(), gray2rgb()
    - Explicit function calls to colorspace conversions
  - erode3d(), dilate3d()
    - Explicit erode and dilate calls for image morphing
- Linear Algebra
  - matmulNT(), matmulTN(), matmulTT()
    - Specialized versions of matmul() for transposed inputs
  - luInPlace(), choleskyInPlace(), qrInPlace()
    - In place factorizations to improve memory requirements
  - solveLU()
    - Specialized solve routines to improve performance
  - OpenCL backend now supports Linear Algebra functions
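
To make the naming concrete, the sketch below shows what an "NT" product computes: the left operand is used as-is and the right operand is treated as transposed, without ever materializing the transpose. This is plain illustrative C++ under assumed row-major storage, with a hypothetical helper name `matmul_nt`; it says nothing about how ArrayFire implements these functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: computes C = A * B^T for row-major matrices,
// where A is m x k, B is n x k, and the result C is m x n.
std::vector<double> matmul_nt(const std::vector<double>& a,
                              const std::vector<double>& b,
                              std::size_t m, std::size_t n, std::size_t k) {
    assert(a.size() == m * k && b.size() == n * k);
    std::vector<double> c(m * n, 0.0);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double acc = 0.0;
            // Row i of A dotted with row j of B (that is, column j of B^T).
            for (std::size_t p = 0; p < k; ++p) {
                acc += a[i * k + p] * b[j * k + p];
            }
            c[i * n + j] = acc;
        }
    }
    return c;
}
```

Folding the transpose into the indexing avoids a separate transposed copy of the operand, which is the usual motivation for specialized variants of this kind.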
- Other functions
  - lookup() - lookup indices from a table
  - batchFunc() - helper function to perform batch operations
- Visualization functions
  - Support for multiple windows
  - window.hist()
    - Visualize the output of the histogram
- C API
  - Removed old pointer based C API
  - Introducing handle based C API
  - Just In Time compilation available in C API
  - C API has feature parity with C++ API
- bessel functions removed
- cross product functions removed
- Kronecker product functions removed
- Improvements across the board for OpenCL backend
- print is now af_print()
- seq(): The step parameter is now the third input
  - seq(start, step, end) changed to seq(start, end, step)
- gfor(): The iterator now needs to be seq()
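
The reordered seq() arguments are easy to get wrong when porting older code, so here is a small standalone sketch of the new (start, end, step) ordering. `expand_seq` is a hypothetical stand-in written for this note, not an ArrayFire function, and it assumes the end value is inclusive.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the new argument order (start, end, step).
// Assumption: `end` is inclusive when the stepping lands on it.
std::vector<double> expand_seq(double start, double end, double step) {
    assert(step != 0.0);
    std::vector<double> out;
    if (step > 0.0) {
        for (double v = start; v <= end; v += step) out.push_back(v);
    } else {
        for (double v = start; v >= end; v += step) out.push_back(v);
    }
    return out;
}
```

Under this ordering, `expand_seq(0, 9, 2)` yields 0, 2, 4, 6, 8. An unported call written in the old order, `expand_seq(0, 2, 9)`, would be read as "from 0 to 2 in steps of 9" and yield only 0, so every call site using three arguments is worth reviewing.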
Deprecated APIs are in af/compatible.h
- devicecount() changed to getDeviceCount()
- deviceset() changed to setDevice()
- deviceget() changed to getDevice()
- loadimage() changed to loadImage()
- saveimage() changed to saveImage()
- gaussiankernel() changed to gaussianKernel()
- alltrue() changed to allTrue()
- anytrue() changed to anyTrue()
- setunique() changed to setUnique()
- setunion() changed to setUnion()
- setintersect() changed to setIntersect()
- histequal() changed to histEqual()
- colorspace() changed to colorSpace()
- filter() deprecated. Use convolve1() and convolve2()
- mul() changed to product()
- deviceprop() changed to deviceProp()
- OpenCL backend issues on OSX
- AMD GPUs not supported because of driver issues
- Intel CPUs not supported
- Linear algebra functions do not work on Intel GPUs.
- Stability and correctness issues with open source OpenCL implementations such as Beignet, GalliumCompute.