Release v3.4.1 #1623

shehzan10 · 2016-10-14T15:13:31Z

v3.4.1

Installers

Installers for Linux, OS X and Windows
- CUDA backend now uses CUDA 8.0.
- Uses Intel MKL 2017.
- CUDA Compute 2.x (Fermi) is no longer compiled into the library.
Installer for OS X
- The libraries shipping in the OS X Installer are now compiled with Apple
  Clang v7.3.1 (previously v6.1.0).
- The OS X version used is 10.11.6 (previously 10.10.5).
Installer for Jetson TX1 / Tegra X1
- Requires JetPack for L4T 2.3
  (containing Linux for Tegra r24.2 for TX1).
- CUDA backend now uses CUDA 8.0 64-bit.
- Uses CUDA's cuSOLVER instead of the CPU fallback.
- Uses OpenBLAS for CPU BLAS.
- All ArrayFire libraries are now 64-bit.

Improvements

Add [sparse array](ref sparse_func) support to \ref af::eval().
Add OpenCL-CPU fallback support for sparse \ref af::matmul() when running on
a unified memory device. Uses MKL Sparse BLAS.
When using CUDA libdevice, pick the correct compute version based on device.
OpenCL FFT now also supports prime factors 7, 11 and 13.
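As a rough illustration of the FFT item above: a transform length is covered by the listed primes when it reduces to 1 after dividing them out. The sketch below is plain C++ with no ArrayFire or OpenCL calls. The helper name `hasOnlySupportedFactors` is hypothetical, and the baseline radices 2, 3 and 5 are an assumption that these notes do not state. What the backend does for other lengths is not described here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of ArrayFire.
// Returns true when n factors completely into the primes listed below.
// Assumption: 2, 3 and 5 were already handled before this release;
// 7, 11 and 13 are the factors these notes add.
bool hasOnlySupportedFactors(std::size_t n) {
    assert(n > 0);  // a transform length of zero is not meaningful
    const std::size_t primes[] = {2, 3, 5, 7, 11, 13};
    for (std::size_t p : primes) {
        // Strip every power of p from n.
        while (n % p == 0) n /= p;
    }
    // Anything left over is a prime factor outside the list.
    return n == 1;
}
```

For example, 1001 = 7 x 11 x 13 passes this check, while 17 does not.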

Bug Fixes

Allow CUDA libdevice to be detected from custom directory.
Fix aarch64 detection on Jetson TX1 64-bit OS.
Add missing definition of af_set_fft_plan_cache_size in unified backend.
Fix initial values for \ref af::min() and \ref af::max() operations.
Fix distance calculation in \ref af::nearestNeighbour for CUDA and OpenCL backends.
Fix OpenCL bug where scalars were passed incorrectly to compile options.
Fix bug in \ref af::Window::surface() with respect to dimensions and ranges.
Fix possible double free corruption in \ref af_assign_seq().
Add missing eval for key in \ref af::scanByKey in CPU backend.
Fix creation of sparse values array using \ref AF_STORAGE_COO.
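The min/max item in this list concerns the starting value of a reduction. As a general illustration only (this is not ArrayFire's code, and the notes do not say which values were wrong): a max reduction has to start from the lowest representable value and a min reduction from the highest. A common C++ pitfall is `std::numeric_limits<float>::min()`, which is the smallest positive normal value rather than the most negative one. All function names below are hypothetical.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// Max reduction seeded with the lowest finite float.
float reduceMax(const std::vector<float>& v) {
    float acc = std::numeric_limits<float>::lowest();
    for (float x : v) if (x > acc) acc = x;
    return acc;
}

// Min reduction seeded with the highest finite float.
float reduceMin(const std::vector<float>& v) {
    float acc = std::numeric_limits<float>::max();
    for (float x : v) if (x < acc) acc = x;
    return acc;
}

// Deliberately wrong seed: numeric_limits<float>::min() is a tiny
// positive number, so all-negative input never replaces it.
float reduceMaxWrongInit(const std::vector<float>& v) {
    float acc = std::numeric_limits<float>::min();
    for (float x : v) if (x > acc) acc = x;
    return acc;
}

// Usage sketch showing the difference on all-negative input.
void demoReductions() {
    assert(reduceMax({-3.0f, -1.0f, -2.0f}) == -1.0f);
    assert(reduceMin({4.0f, 2.0f, 9.0f}) == 2.0f);
    assert(reduceMaxWrongInit({-3.0f, -1.0f, -2.0f}) > 0.0f);  // wrong answer
}
```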

Examples

Add a [Conjugate Gradient solver example](ref benchmarks/cg.cpp)
to demonstrate sparse and dense matrix operations.
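For readers unfamiliar with the method: conjugate gradient solves A x = b for a symmetric positive-definite A using only matrix-vector products, which is why it pairs well with sparse storage. The sketch below is plain C++ with a hand-rolled CSR matrix. It is not the code in benchmarks/cg.cpp and makes no ArrayFire calls; the names `Csr`, `spmv`, `dot` and `conjugateGradient` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Square n x n matrix in compressed sparse row form.
struct Csr {
    std::size_t n;
    std::vector<std::size_t> rowPtr;  // size n + 1
    std::vector<std::size_t> colIdx;  // size nnz
    std::vector<double> values;       // size nnz
};

// Sparse matrix-vector product y = A * x.
std::vector<double> spmv(const Csr& A, const std::vector<double>& x) {
    assert(x.size() == A.n);
    std::vector<double> y(A.n, 0.0);
    for (std::size_t i = 0; i < A.n; ++i)
        for (std::size_t k = A.rowPtr[i]; k < A.rowPtr[i + 1]; ++k)
            y[i] += A.values[k] * x[A.colIdx[k]];
    return y;
}

double dot(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Solves A x = b for symmetric positive-definite A, starting from x = 0.
std::vector<double> conjugateGradient(const Csr& A, const std::vector<double>& b,
                                      double tol = 1e-10,
                                      std::size_t maxIter = 1000) {
    assert(b.size() == A.n);
    std::vector<double> x(A.n, 0.0);
    std::vector<double> r = b;  // residual b - A*x, with x = 0
    std::vector<double> p = r;  // current search direction
    double rsOld = dot(r, r);
    for (std::size_t it = 0; it < maxIter && std::sqrt(rsOld) > tol; ++it) {
        const std::vector<double> Ap = spmv(A, p);
        const double alpha = rsOld / dot(p, Ap);  // step length along p
        for (std::size_t i = 0; i < A.n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        const double rsNew = dot(r, r);
        const double beta = rsNew / rsOld;  // keeps directions A-conjugate
        for (std::size_t i = 0; i < A.n; ++i) p[i] = r[i] + beta * p[i];
        rsOld = rsNew;
    }
    return x;
}
```

With A = [[4, 1], [1, 3]] and b = [1, 2], the exact solution is x = [1/11, 7/11], which this sketch reaches to within the tolerance.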

CUDA Backend

When using CUDA 8.0, compute 2.x is no longer in the default compute list.
- This follows CUDA 8.0 deprecating compute 2.x.
- Default computes for CUDA 8.0 are 30, 50 and 60.
When using a CUDA version older than 8.0, the default selection remains 20, 30, 50.
The CUDA backend now uses -arch=sm_30 for PTX compilation by default.
- Unless compute 2.0 is enabled.
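The selection rules above can be summarised in a few lines. This is a sketch in plain C++ for readability only; the commits tag these changes as build changes, so the real logic presumably lives in the build scripts. The function names are hypothetical, and the sm_20 value returned when compute 2.0 is enabled is an assumption, since the notes only state the sm_30 default.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: default compute list for a given CUDA major version.
std::vector<int> defaultComputes(int cudaMajor) {
    assert(cudaMajor > 0);
    if (cudaMajor >= 8) return {30, 50, 60};  // compute 2.x dropped
    return {20, 30, 50};                      // pre-8.0 defaults unchanged
}

// Hypothetical helper: PTX compilation flag.
// Assumption: enabling compute 2.0 lowers the target to sm_20.
std::string ptxArchFlag(bool compute20Enabled) {
    return compute20Enabled ? "-arch=sm_20" : "-arch=sm_30";
}
```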

Known Issues

\ref af::lu() on CPU is known to give incorrect results when built and run on
OS X 10.11 or 10.12 and compiled with Accelerate Framework.
- Since the OS X Installer libraries use MKL rather than Accelerate
  Framework, this issue does not affect those libraries.

[skip arrayfire ci]

shehzan10 and others added 30 commits September 13, 2016 16:32

Force update CUDA_LIBDEVICE_DIR when CUDA directories are updated

75d7133

BUGFIX: Add missing unified call for set_fft_plan_cache_size

76ab83f

BUGFIX: AARCH64 (TX1 64-bit OS) does not define __arm__ - Requires __…

11b7bd3

…aarch64__

DOCS: Move INSTALL.md to install.md

333a31a

DOCS: Remove unused packages from installation documentation

00a8a03

BUGFIX: Change the initial values for min and max operations

b448612

- Fixes issues with erode and dilate at corner cases

BUGFIX: Fixing bug in nearest neighbour for CUDA backend

dfbfca5

BUGFIX: fixing bug in nearest neighbour for opencl backend

f94aceb

TEST: Adding additional test for nearest neighbour

7372ef2

OPENCL: add inf as an alias to INFINITY

e656f15

Float values that are inf are outputted as inf to stringstream. However inf isn't available in opencl, causing min and max to fail. This alias is the easiest work around for now.

BUGFIX: Fixing regions after bug caused by the change in maxval

61bc24c

OPENCL BUGFIX: properly pass scalars as compile time options

663410e

Fix variable names in getInfo declaration

d7f7eb0

The function definition and all calls to the getInfo functions are correct.

Add support for sparse arrays to eval

7e1bb1a

Add Conjugate Gradient example to benchmarks - Includes sparse matrix

6a4f57b

Compare the performance and memory usage of sparse vs dense using conjugate gradient example

Merge pull request #1591 from 9prady9/fft_set_plan_fix

9ddf116

BUGFIX: Add missing unified call for set_fft_plan_cache_size

Merge pull request #1595 from pavanky/minmaxfix

ef70f35

Bugfixes to image morph and nearest neighbors

Merge pull request #1598 from shehzan10/sparse-eval

e5f5471

Sparse eval

Merge pull request #1599 from shehzan10/cg_example

72ff8dd

Add Conjugate Gradient example to benchmarks

Merge pull request #1600 from shehzan10/hotfix-3.4.1

b2831c6

Fixes for TX1 64-bit OS + CUDA CMake Fix

BUGFIX: Window::surface rendering function

6526c5e

Use function version of modDims function to properly update metadata before JIT evaluation is carried out.

More fixes related to modDims function

c17537f

Fix double free corruption when release arrays in af_assign_seq

1b18226

Fix syncthreads in cuda nearest neighbour

0ed6ccc

Remove rogue printf from bilateral kernel

ab61f45

Add key.eval to CPU scan by key

38cba47

use range function in Window::surface method

6100583

Merge pull request #1604 from 9prady9/surface_fixes

147a83b

fix for surface rendering function

Merge pull request #1605 from shehzan10/hotfix-3.4.1

b7bff62

Minor Bug Fixes

BUILD: Default CUDA Computes for CUDA 8 are 30, 50, 60

67f9522

shehzan10 and others added 19 commits October 7, 2016 12:57

BUILD: CUDA Use -arch=sm_30 when using CUDA 8 and not COMPUTE_20

baac61f

Fixed picking the right libdevice for device compute

264d1f5

Fix else case in compute to libdevice table

2c69075

Merge pull request #1612 from shehzan10/cuda8-fixes

c8a64f8

Preparing for release with CUDA 8

FIX CUDA 6.5 or older does not have libdevice compute_50

6494448

Properly check for libdevice header files and provide fallbacks

8ec78f1

Increment version to 3.4.1

4497241

Enable OpenCL FFT prime factors 7, 11, 13

2889109

Add tests for fft with multiples of 7, 11, 13

9e24c1c

BUGFIX Fix the dimensions of values array in sparse COO

f942987

Remove Tegra K1 build status from README.md

d0cb8d5

Updated release notes for v3.4.1

9476a9e

Add tests to verify dimensions and values of sparse arrays

fb8305a

Merge pull request #1613 from shehzan10/hotfix-3.4.1

04f3c78

Properly check for libdevice header files and provide fallbacks

Merge pull request #1615 from shehzan10/version

2c157ea

Increment version to 3.4.1 and Release Notes for v3.4.1

FEAT: Add CPU offload to Sparse matmul

6bedacc

* Available when using MKL and not
* Uses CPU fallback code when not using MKL

Merge pull request #1614 from shehzan10/sparse-blas-cpu-offload

323a9ed

FEAT: Add CPU offload to Sparse matmul

Merge pull request #1621 from shehzan10/sparse-coo

4684dee

Fix the creation of values array in sparse COO

Merge pull request #1619 from shehzan10/opencl-fft-7-11-13

63df7ca

Add support for 7, 11, 13 as factors for OpenCL FFT

shehzan10 added the release label Oct 14, 2016

shehzan10 added this to the v3.4.1 milestone Oct 14, 2016

shehzan10 self-assigned this Oct 14, 2016

shehzan10 added the ready to merge label Oct 14, 2016

umar456 merged commit b9055b1 into master Oct 14, 2016

shehzan10 deleted the hotfix-3.4.1 branch October 15, 2016 20:02
