Release v3.4.1 #1623

Merged: 49 commits merged into master from hotfix-3.4.1 on Oct 14, 2016


@shehzan10 (Member) commented Oct 14, 2016

v3.4.1

Installers

  • Installers for Linux, OS X and Windows
    • CUDA backend now uses CUDA 8.0.
    • Uses Intel MKL 2017.
    • CUDA Compute 2.x (Fermi) is no longer compiled into the library.
  • Installer for OS X
    • The libraries shipping in the OS X Installer are now compiled with Apple
      Clang v7.3.1 (previously v6.1.0).
    • The OS X version used is 10.11.6 (previously 10.10.5).
  • Installer for Jetson TX1 / Tegra X1
    • Requires JetPack for L4T 2.3
      (containing Linux for Tegra r24.2 for TX1).
    • CUDA backend now uses CUDA 8.0 64-bit.
    • Uses CUDA's cuSOLVER instead of the CPU fallback.
    • Uses OpenBLAS for CPU BLAS.
    • All ArrayFire libraries are now 64-bit.

Improvements

  • Add [sparse array](ref sparse_func) support to \ref af::eval().
  • Add OpenCL-CPU fallback support for sparse \ref af::matmul() when running on
    a unified memory device. Uses MKL Sparse BLAS.
  • When using CUDA libdevice, pick the correct compute version based on device.
  • OpenCL FFT now also supports prime factors 7, 11 and 13.
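
To make the FFT note concrete, here is a minimal sketch in plain C++ (it does not call ArrayFire or the OpenCL FFT library) that checks whether a transform length factors entirely into small primes. The function name `hasOnlySupportedFactors` is hypothetical, and the full radix set {2, 3, 5, 7, 11, 13} is an assumption: the note only states that 7, 11 and 13 were added, so 2, 3 and 5 are presumed to have been supported already.

```cpp
#include <cstddef>

// Returns true when n factors completely into the listed small primes.
// Assumption: 2, 3 and 5 were already supported; 7, 11 and 13 are the
// factors this release adds. This is an illustration, not library code.
bool hasOnlySupportedFactors(std::size_t n) {
    if (n == 0) return false;  // zero-length transforms are not meaningful
    static const std::size_t radices[] = {2, 3, 5, 7, 11, 13};
    for (std::size_t r : radices) {
        while (n % r == 0) n /= r;  // strip every power of this prime
    }
    return n == 1;  // anything left over is an unsupported prime factor
}
```

For example, a length of 1001 (7 × 11 × 13) passes this check, while a length of 17 does not.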

Bug Fixes

  • Allow CUDA libdevice to be detected from custom directory.
  • Fix aarch64 detection on Jetson TX1 64-bit OS.
  • Add missing definition of af_set_fft_plan_cache_size in unified backend.
  • Fix initial values for \ref af::min() and \ref af::max() operations.
  • Fix distance calculation in \ref af::nearestNeighbour for CUDA and OpenCL backends.
  • Fix OpenCL bug where scalars were passed incorrectly to compile options.
  • Fix bug in \ref af::Window::surface() with respect to dimensions and ranges.
  • Fix possible double free corruption in \ref af_assign_seq().
  • Add missing eval for key in \ref af::scanByKey in CPU backend.
  • Fix creation of the sparse values array when using \ref AF_STORAGE_COO.
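
The min/max and compile-option fixes above relate to how scalar values are turned into text for generated OpenCL source: a commit message in this PR notes that infinite floats are written to a stringstream as `inf`, which OpenCL does not recognise. The sketch below is plain C++ and only illustrates the problem. The helper name `toKernelLiteral` is hypothetical, and mapping infinities to the OpenCL C `INFINITY` macro is one possible approach, not necessarily the alias-based workaround the commit describes.

```cpp
#include <cmath>
#include <sstream>
#include <string>

// Formats a float as text suitable for pasting into generated kernel source.
// Streaming an infinite value yields text such as "inf", which is not a
// valid OpenCL C token, so infinities are handled explicitly here.
// Assumption: emitting the INFINITY macro is acceptable in the target code.
// NaN is not handled in this sketch.
std::string toKernelLiteral(float v) {
    if (std::isinf(v)) return v > 0 ? "INFINITY" : "-INFINITY";
    std::ostringstream os;
    os << v;  // finite values use default stream formatting
    return os.str();
}
```

Finite values go through the stream unchanged, so only the two infinite cases take the special path.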

Examples

  • Add a [Conjugate Gradient solver example](ref benchmarks/cg.cpp)
    to demonstrate sparse and dense matrix operations.
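
For readers unfamiliar with the method, here is a minimal conjugate gradient sketch in plain C++ over a CSR sparse matrix. It is not the contents of `benchmarks/cg.cpp` and does not use ArrayFire; the `CsrMatrix` struct, the function names, and the default tolerance and iteration limit are all hypothetical choices made for illustration. It assumes the matrix is symmetric positive-definite.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Minimal CSR (compressed sparse row) matrix; names are hypothetical.
struct CsrMatrix {
    std::size_t rows;
    std::vector<std::size_t> rowPtr;  // size rows + 1, offsets into values
    std::vector<std::size_t> colIdx;  // column index of each stored value
    std::vector<double> values;       // non-zero entries, row by row
};

// Sparse matrix-vector product y = A * x.
std::vector<double> multiply(const CsrMatrix& A, const std::vector<double>& x) {
    std::vector<double> y(A.rows, 0.0);
    for (std::size_t i = 0; i < A.rows; ++i)
        for (std::size_t k = A.rowPtr[i]; k < A.rowPtr[i + 1]; ++k)
            y[i] += A.values[k] * x[A.colIdx[k]];
    return y;
}

double dot(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Solves A x = b, assuming A is symmetric positive-definite.
std::vector<double> conjugateGradient(const CsrMatrix& A,
                                      const std::vector<double>& b,
                                      std::size_t maxIter = 1000,
                                      double tol = 1e-10) {
    assert(b.size() == A.rows);
    std::vector<double> x(b.size(), 0.0);
    std::vector<double> r = b;  // residual b - A*x for the initial guess x = 0
    std::vector<double> p = r;  // first search direction
    double rsOld = dot(r, r);
    for (std::size_t it = 0; it < maxIter && std::sqrt(rsOld) > tol; ++it) {
        const std::vector<double> Ap = multiply(A, p);
        const double alpha = rsOld / dot(p, Ap);  // step length along p
        for (std::size_t i = 0; i < x.size(); ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        const double rsNew = dot(r, r);
        for (std::size_t i = 0; i < p.size(); ++i)
            p[i] = r[i] + (rsNew / rsOld) * p[i];  // next conjugate direction
        rsOld = rsNew;
    }
    return x;
}
```

The only matrix operation in the loop is the matrix-vector product, which is why the method is a convenient way to compare sparse and dense storage: swapping `multiply` for a dense version leaves the rest of the solver unchanged.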

CUDA Backend

  • When using CUDA 8.0, compute 2.x is no longer in the default compute list.
    • This follows CUDA 8.0 deprecating compute 2.x.
    • Default computes for CUDA 8.0 will be 30, 50, 60.
  • When using CUDA pre-8.0, the default selection remains 20, 30, 50.
  • CUDA backend now uses -arch=sm_30 for PTX compilation as default.
    • Unless compute 2.0 is enabled.
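
The selection rules above can be summarised in a small plain C++ sketch. This is not ArrayFire's build logic (which lives in its CMake scripts); the function names are hypothetical. Two points are assumptions: treating every CUDA major version of 8 or later like 8.0, and falling back to `-arch=sm_20` when compute 2.0 is enabled, since the notes only say that the `sm_30` default does not apply in that case.

```cpp
#include <string>
#include <vector>

// Default compute capabilities for a given CUDA toolkit major version.
// The notes cover CUDA 8.0 and pre-8.0 only; applying the 8.0 list to
// later versions is an assumption.
std::vector<int> defaultComputes(int cudaMajorVersion) {
    if (cudaMajorVersion >= 8) return {30, 50, 60};
    return {20, 30, 50};
}

// Architecture flag used for PTX compilation.
// Assumption: sm_20 is used when compute 2.0 is in the enabled list.
std::string ptxArchFlag(const std::vector<int>& computes) {
    for (int c : computes)
        if (c == 20) return "-arch=sm_20";
    return "-arch=sm_30";
}
```

With these definitions, a CUDA 8.0 build yields the list 30, 50, 60 and the flag `-arch=sm_30`, while a pre-8.0 build keeps 20 in the list and so does not use the `sm_30` default.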

Known Issues

  • \ref af::lu() on CPU is known to give incorrect results when built and run on
    OS X 10.11 or 10.12 and compiled with Accelerate Framework.
    • Since the OS X Installer libraries use MKL rather than Accelerate
      Framework, this issue does not affect those libraries.

[skip arrayfire ci]

shehzan10 and others added 30 commits September 13, 2016 16:32
- Fixes issues with erode and dilate at corner cases
Float values that are inf are outputted as inf to stringstream.
However inf isn't available in opencl, causing min and max to fail.

This alias is the easiest work around for now.
The function definition and all calls to the getInfo functions are
correct.
Compare the performance and memory usage of sparse vs dense using conjugate
gradient example
BUGFIX: Add missing unified call for set_fft_plan_cache_size
Bugfixes to image morph and nearest neighbors
Add Conjugate Gradient example to benchmarks
Fixes for TX1 64-bit OS + CUDA CMake Fix
Use function version of modDims function to properly
update metadata before JIT evaluation is carried out.
fix for surface rendering function
shehzan10 and others added 19 commits October 7, 2016 12:57
Preparing for release with CUDA 8
Properly check for libdevice header files and provide fallbacks
Increment version to 3.4.1 and Release Notes for v3.4.1
* Available when using MKL and not
* Uses CPU fallback code when not using MKL
FEAT: Add CPU offload to Sparse matmul
Fix the creation of values array in sparse COO
Add support for 7, 11, 13 as factors for OpenCL FFT
@shehzan10 shehzan10 added this to the v3.4.1 milestone Oct 14, 2016
@shehzan10 shehzan10 self-assigned this Oct 14, 2016
@umar456 umar456 merged commit b9055b1 into master Oct 14, 2016
@shehzan10 shehzan10 deleted the hotfix-3.4.1 branch October 15, 2016 20:02