
petalisp-cuda

A CUDA backend for Petalisp. The project is in an early experimental phase.

Usage

(quicklisp:quickload :petalisp-cuda)

;; use petalisp-cuda as petalisp:*backend*
(petalisp-cuda:use-cuda-backend)

;; when you want to use the CUDA backend only for a specific code section,
;; with potential re-use of already allocated CUDA resources
(petalisp-cuda:with-cuda-backend
    ...)

;; with-cuda-backend-raii will free all CUDA resources
;; and destroy the backend after the calculations
(petalisp-cuda:with-cuda-backend-raii
    ...)
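A minimal sketch of what a computation on the CUDA backend could look like. The `lazy`/`compute` names follow the current Petalisp API (older Petalisp versions used `α`/`β` instead of `lazy`); the arrays here are only illustrative.

```lisp
;; Select petalisp-cuda as the active backend.
(petalisp-cuda:use-cuda-backend)

;; Build a lazy element-wise sum of two arrays and evaluate it.
;; Evaluation is what actually compiles and runs a CUDA kernel.
(let ((a #2A((1.0 2.0) (3.0 4.0)))
      (b #2A((5.0 6.0) (7.0 8.0))))
  (petalisp:compute (petalisp:lazy #'+ a b)))
```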

Requirements

  • CUDA toolkit
  • CUDNN (optional)

Installation

With quicklisp installed, clone this repository to your local projects folder:

git clone git@github.com:theHamsta/petalisp-cuda.git

It is recommended to use my fork of cl-cuda, which performs disk caching so that the same kernel is not compiled more than once.

git clone git@github.com:theHamsta/cl-cuda.git
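Putting both clones together, one possible layout (assuming a default Quicklisp installation, where `~/quicklisp/local-projects` is scanned automatically) would be:

```shell
# Clone into Quicklisp's local-projects folder so that
# (quicklisp:quickload :petalisp-cuda) can find both systems.
cd ~/quicklisp/local-projects
git clone git@github.com:theHamsta/petalisp-cuda.git
git clone git@github.com:theHamsta/cl-cuda.git
```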

TODOs

  • [x] sort indices for fastest dimensions
  • [ ] hash CUDA array strides to recompile correctly for arrays with alternative memory layouts (i.e. not allocated in C layout)
  • [x] fix super-slow lisp->native, native->lisp calls (for Lisp arrays of element-type single-float/double-float)
  • [x] compile kernels not only for fixed iteration spaces
  • [ ] implement reductions with CUDNN
  • [ ] implement convolutions with CUDNN
  • [ ] __restrict__ kernel parameters
  • [ ] infer function parameters for generated __device__ functions
  • [x] load scalars over __constant__ memory when loading from GPU RAM instead of host RAM (transferred as kernel arguments)
  • [ ] fast interpolation using textures (?)
  • [ ] half float, bfloat support
  • [ ] optimized transposed memory accesses via cub.h
  • [ ] optimized memory accesses via stride tricks
  • [ ] static scheduler
  • [ ] use the NVRTC library for faster JIT compilation? (the current solution has the advantage of disk caching but might take longer for the first compilation)

Debugging

You can activate logging in CUDNN by setting

export CUDNN_LOGDEST_DBG=stdout

(or stderr, or a file name such as filename.txt).

cl-cuda prints a lot of compilation messages to stdout. This behavior can be controlled with cl-cuda:*show-messages*.
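For example, to silence those messages from the REPL:

```lisp
;; Disable cl-cuda's compilation messages (set back to t to re-enable).
(setf cl-cuda:*show-messages* nil)
```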

Author

Copyright

Copyright (c) 2020 Stephan Seitz ([email protected])

License

Licensed under the GPLv3 License.
