GPU example fails. #235
Can you run it in gdb and give us the stack trace?
Sorry, I only know how to run a pure C/C++ project in gdb (and I know how to set CMAKE_BUILD_TYPE when compiling), but I have never used gdb to debug C/C++ functions called from Python.
@sth1997 Sorry for the late reply. You can run gdb on Python like this:
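The command from this reply is not preserved in the thread. As a rough sketch of one common approach (not necessarily what the maintainer posted), gdb can launch the Python interpreter itself and print a native backtrace once the process stops on the segfault. The helper name `gdb_backtrace_command` and the script path are illustrative assumptions; `-batch`, `-ex`, and `--args` are standard gdb options.

```python
import shlex
import sys


def gdb_backtrace_command(script, python=sys.executable):
    """Build a gdb command line that runs `script` under gdb and prints a
    backtrace after the program stops (hypothetical helper, for illustration)."""
    return [
        "gdb",
        "-batch",        # run the commands below non-interactively, then exit
        "-ex", "run",    # start the Python interpreter under gdb
        "-ex", "bt",     # print the native stack trace where the program stopped
        "--args", python, script,
    ]


if __name__ == "__main__":
    # The script path is only an example; substitute the script that crashes.
    cmd = gdb_backtrace_command("examples/apps/quickstart/main.py")
    # Print a copy-pasteable shell command instead of running it here.
    print(" ".join(shlex.quote(part) for part in cmd))
```

Running the printed command in a shell should show the C/C++ frames at the point of the crash. If the script exits normally, gdb reports that there is no stack. A build with debug symbols (for example via CMAKE_BUILD_TYPE) makes the frames more readable.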
My CUDA version is 9.0 and my cuDNN version is 3.7.5.
I can successfully run the Walkthrough.ipynb code on the CPU, on a single node or on multiple nodes. But if I set device=DeviceType.GPU for db.ops.Histogram and run it on a single node or multiple nodes, it fails. This is the output:
5%|██████████▊ | 1/19 [00:02<00:36, 2.01s/it, jobs=1, tasks=18, workers=1]
Segmentation fault
I checked the logs. There are no WARNING logs, just one INFO log:
Log file created at: 2018/12/06 16:30:00
Running on machine: gorgon4
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I1206 16:30:00.866015 107280 ingest.cpp:936] Writing database metadata
I1206 16:30:00.869004 107280 ingest.cpp:940] Writing table megafile
I1206 16:30:00.889670 107194 worker.cpp:480] Creating worker
I1206 16:30:00.889878 107194 worker.cpp:497] Create master stub
I1206 16:30:00.889976 107194 worker.cpp:500] Finish master stub
I1206 16:30:00.890017 107194 worker.cpp:507] Worker created.
I1206 16:30:00.890188 107194 worker.cpp:666] Worker try to register with master
I1206 16:30:00.891312 107194 worker.cpp:693] Worker registered with master with id 0
I1206 16:30:00.902165 107327 worker.cpp:548] Worker 0 received NewJob
I1206 16:30:00.902737 107326 worker.cpp:722] Worker 0 loading Op library: /home/sth/.local/lib/python3.6/site-packages/scannerpy/lib/libscanner_stdlib.so
I1206 16:30:00.905745 107326 worker.cpp:1254] Initial pipeline instances per node: -1
I1206 16:30:00.905762 107326 worker.cpp:1280] Kernel Group 0 Pipeline instances per node: 1
I1206 16:30:00.905768 107326 worker.cpp:1294] Pipeline instances per node: 1
After that, I also tried using the GPU in examples/apps/quickstart/main.py. I set device=DeviceType.GPU for db.ops.Resize, and it also failed. This is the output:
0%| | 0/7 [00:02<?, ?it/s, jobs=1, tasks=7, workers=1]
Segmentation fault
I also checked the log. The INFO log is a little different from the previous one:
Log file created at: 2018/12/06 16:35:14
Running on machine: gorgon4
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I1206 16:35:14.334425 107522 ingest.cpp:936] Writing database metadata
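Since both logs stop without any WARNING or ERROR line, one low-effort way to gather more information might be Python's standard `faulthandler` module, which prints the Python-level traceback of each thread when the process receives SIGSEGV. This is a sketch that was not tried in this thread; the helper name `enable_crash_traceback` is an assumption, and it shows only which Python call was active, not the C/C++ frames.

```python
import faulthandler
import sys


def enable_crash_traceback(stream=None):
    """Install handlers that dump Python tracebacks on fatal signals such as SIGSEGV.

    `stream` must be a file object with a real file descriptor; by default the
    interpreter's original stderr is used.
    """
    if stream is None:
        stream = sys.__stderr__
    faulthandler.enable(file=stream, all_threads=True)
    return faulthandler.is_enabled()


if __name__ == "__main__":
    # Call this at the very top of the failing script, before the pipeline runs,
    # so the handler is already installed when native code crashes.
    enable_crash_traceback()
```

The same effect is available without editing the script by starting the interpreter with `python3 -X faulthandler`.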
Have you run into this problem before?
Let me know if you need more information.
@apoms @willcrichton