This repository has been archived by the owner on Jan 22, 2024. It is now read-only.

image nvidia/caffe not working with gpu #208

Closed
zlz0414 opened this issue Sep 27, 2016 · 13 comments

Comments

@zlz0414

zlz0414 commented Sep 27, 2016

I tried to use the image nvidia/caffe but it didn't work.

Here is what I did and the result.

sudo nvidia-docker run --rm -t -i nvidia/caffe bash

root@9f3478fe9a6c:/# python
Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> import caffe
libdc1394 error: Failed to initialize libdc1394
>>> caffe.set_device(0)
modprobe: ERROR: ../libkmod/libkmod.c:556 kmod_search_moddep() could not open moddep file '/lib/modules/3.13.0-32-generic/modules.dep.bin'
WARNING: Logging before InitGoogleLogging() is written to STDERR
F0927 20:58:25.936079 16 common.cpp:163] Check failed: error == cudaSuccess (30 vs. 0) unknown error

I also tried building an image from caffe/docker/standalone/gpu/Dockerfile and got the same result.

>>> import caffe
libdc1394 error: Failed to initialize libdc1394
/usr/local/lib/python2.7/dist-packages/matplotlib/font_manager.py:273: UserWarning: Matplotlib is building the font cache using fc-list. This may take a moment.
warnings.warn('Matplotlib is building the font cache using fc-list. This may take a moment.')
>>> caffe.set_device(0)
modprobe: ERROR: ../libkmod/libkmod.c:556 kmod_search_moddep() could not open moddep file '/lib/modules/3.13.0-32-generic/modules.dep.bin'
WARNING: Logging before InitGoogleLogging() is written to STDERR
F0927 20:54:24.738459 813 common.cpp:151] Check failed: error == cudaSuccess (30 vs. 0) unknown error

@3XX0
Member

3XX0 commented Sep 27, 2016

Can you paste the output of:

lsmod | grep nvidia
ls /dev/nvidia*

@zlz0414
Author

zlz0414 commented Sep 27, 2016

lsmod | grep nvidia
nvidia 10219305 30
drm 303102 4 ttm,drm_kms_helper,nvidia,nouveau

ls /dev/nvidia*
/dev/nvidia0 /dev/nvidiactl /dev/nvidia-uvm /dev/nvidia-uvm-tools

@3XX0
Member

3XX0 commented Sep 27, 2016

UVM is not loaded on your system, but somehow you have the device files for it; this is the problem. Running sudo nvidia-modprobe -u -c=0 will fix it.
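
The diagnosis above compares two things: whether a UVM module appears in the lsmod output, and whether /dev/nvidia-uvm exists. A minimal Python sketch of that comparison follows. The function name uvm_mismatch is hypothetical, and the sketch assumes the UVM module is listed by lsmod as nvidia_uvm; check the name on your own system.

```python
def uvm_mismatch(lsmod_output, device_files):
    """Return True when /dev/nvidia-uvm exists but no UVM module is loaded."""
    # The first column of each lsmod line is the module name.
    loaded = {line.split()[0] for line in lsmod_output.splitlines() if line.strip()}
    # Assumption: the UVM module shows up in lsmod as "nvidia_uvm".
    uvm_loaded = "nvidia_uvm" in loaded
    uvm_device_present = "/dev/nvidia-uvm" in device_files
    return uvm_device_present and not uvm_loaded


# Values copied from the output posted earlier in this thread.
lsmod_output = """nvidia 10219305 30
drm 303102 4 ttm,drm_kms_helper,nvidia,nouveau"""
device_files = [
    "/dev/nvidia0",
    "/dev/nvidiactl",
    "/dev/nvidia-uvm",
    "/dev/nvidia-uvm-tools",
]

print(uvm_mismatch(lsmod_output, device_files))
```

With the values from this thread, the device file is present while no UVM module is listed, which is the situation the nvidia-modprobe command is meant to correct.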

@zlz0414
Author

zlz0414 commented Sep 27, 2016

It's now working. Thank you for your help!

@zlz0414
Author

zlz0414 commented Sep 28, 2016

When I try to train a net with Caffe, I get the error below:

F0928 01:20:23.271881 60 math_functions.cu:396] Check failed: status == CURAND_STATUS_SUCCESS (201 vs. 0) CURAND_STATUS_LAUNCH_FAILURE

What causes this?

@flx42
Member

flx42 commented Sep 28, 2016

@zlz0414 what's your GPU?

@zlz0414
Author

zlz0414 commented Sep 28, 2016

@flx42 P100

@flx42
Member

flx42 commented Sep 28, 2016

@zlz0414 ok, that's why. Our current caffe/DIGITS images are based on CUDA 7.5 and this version of CUDA doesn't support the Pascal architecture.
I hope we will be able to release a new caffe based on CUDA 8.0 soon.

In the meantime, you need to build a custom image of Caffe, sorry about that. You can take a look at the official Dockerfile for caffe: https://github.com/BVLC/caffe/blob/master/docker/standalone/gpu/Dockerfile
Only minor changes should be needed: change the FROM, git clone NVIDIA/caffe instead of BVLC/caffe, maybe add a few extra packages.
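
As an illustration of the two substitutions described above (a new FROM line and a different repository to clone), here is a small Python sketch that applies them to Dockerfile text. Everything in it is an assumption made for illustration: the helper name patch_dockerfile, the sample Dockerfile excerpt, and the CUDA 8.0 base image tag are not taken from the official Dockerfile, so substitute whatever your real file and registry contain.

```python
import re


def patch_dockerfile(text, base_image, repo="NVIDIA/caffe"):
    """Swap the base image and the Caffe repository in Dockerfile text."""
    # Replace the image named on the first FROM line.
    text = re.sub(
        r"^FROM\s+\S+",
        lambda match: "FROM " + base_image,
        text,
        count=1,
        flags=re.MULTILINE,
    )
    # Clone the requested fork instead of BVLC/caffe.
    return text.replace("github.com/BVLC/caffe", "github.com/" + repo)


# Hypothetical excerpt, not the contents of the official Dockerfile.
example = (
    "FROM nvidia/cuda:7.5-cudnn5-devel-ubuntu14.04\n"
    "RUN git clone https://github.com/BVLC/caffe.git .\n"
)

# The CUDA 8.0 tag below is an assumed example of a Pascal-capable base image.
print(patch_dockerfile(example, "nvidia/cuda:8.0-cudnn5-devel-ubuntu14.04"))
```

Editing the file by hand works just as well; the sketch only makes the two changes explicit. Any extra packages the NVIDIA fork needs would still have to be added separately.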

@zlz0414
Author

zlz0414 commented Sep 28, 2016

@flx42 Actually, I have already tried building a custom Caffe image with CUDA 8.0, but still using BVLC/caffe.
I got:
Check failed: error == cudaSuccess (8 vs. 0) invalid device function

I will try with NVIDIA/caffe right now. Hopefully it will work.

@flx42
Member

flx42 commented Sep 28, 2016

You probably aren't compiling for Pascal. Modify the cmake line and add something like this:

-DCUDA_ARCH_BIN="52 60" -DCUDA_ARCH_PTX="60"

@zlz0414
Author

zlz0414 commented Sep 28, 2016

It's still not working.
I got

CMake Warning:
Manually-specified variables were not used by the project:
CUDA_ARCH_BIN

@zlz0414
Author

zlz0414 commented Sep 28, 2016

@flx42 I checked the Cuda.cmake file and found that CUDA_ARCH_NAME must be set to "Manual" for CUDA_ARCH_BIN and CUDA_ARCH_PTX to take effect.
I added this:
-DCUDA_ARCH_NAME="Manual" -DCUDA_ARCH_BIN="52 60" -DCUDA_ARCH_PTX="60"

Finally it's working now. Thank you very much!
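
For anyone scripting the build, the three flags above can be assembled programmatically. The Python sketch below only builds the argument list and does not run cmake; the helper name cuda_arch_flags is hypothetical.

```python
def cuda_arch_flags(bin_archs, ptx_archs):
    """Build the cmake -D flags that select CUDA architectures manually."""
    return [
        # Without "Manual", CUDA_ARCH_BIN and CUDA_ARCH_PTX are ignored,
        # as the CMake warning earlier in this thread showed.
        "-DCUDA_ARCH_NAME=Manual",
        "-DCUDA_ARCH_BIN=" + " ".join(bin_archs),
        "-DCUDA_ARCH_PTX=" + " ".join(ptx_archs),
    ]


# 52 and 60 are the values used in this thread; 60 covers the P100.
flags = cuda_arch_flags(["52", "60"], ["60"])
print(flags)
```

Passing the list as separate arguments, for example subprocess.run(["cmake", ".."] + flags) where ".." is an assumed source directory, avoids the shell quoting that the space in "52 60" would otherwise need.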

@flx42
Member

flx42 commented Sep 28, 2016

Ah yes, sorry, I missed this part! :)
I'm glad it works now, closing.
