GFX803: could not create miopen handle: miopenStatusUnknownError #132

tsteinholz · 2020-03-07T16:57:09Z

Hello,

I am attempting to run TensorFlow using ROCm, but I consistently see this error when I run any test, demo, or benchmark.

MIOpen Error: /root/driver/MLOpen/src/ocl/handleocl.cpp:332: Error: Creating Handle. Cannot Initialize Handle from Queue Invalid Command Queue
2020-03-07 11:30:50.337921: E tensorflow/stream_executor/rocm/rocm_dnn.cc:508] could not create miopen handle: miopenStatusUnknownError
2020-03-07 11:30:50.337932: W ./tensorflow/stream_executor/stream.h:2039] attempting to perform DNN operation using StreamExecutor without DNN support
2020-03-07 11:30:50.337954: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Not found: Failed to find conv algorithm!
	 [[{{node mnist/conv2d/Conv2D}}]]
2020-03-07 11:30:50.355892: I tensorflow/stream_executor/stream.cc:1990] [stream=0x7f71e78ed100,impl=0x7f71e78ed1a0] did not wait for [stream=0x7f71e7859260,impl=0x7f71e78ea110]
2020-03-07 11:30:50.355915: I tensorflow/stream_executor/stream.cc:4938] [stream=0x7f71e78ed100,impl=0x7f71e78ed1a0] did not memcpy host-to-device; source: 0x7f708c09d4c0
2020-03-07 11:30:50.355966: F tensorflow/core/common_runtime/gpu/gpu_util.cc:340] CPU->GPU Memcpy failed
Fatal Python error: Aborted

I am running a brand new installation of Ubuntu 18.04.4 LTS with ROCm v3.1 fully installed as specified in the getting-started documents. I have two Radeon RX 580 Series (POLARIS10, DRM 3.36.0, 5.3.0-40-generic, LLVM 9.0.0) and an Intel® Core™ i7-4770 CPU @ 3.40GHz × 8.

Here is the full output from the run that produced the error:

tsteinholz@Yuri:~/Development/radeon/models/official/vision/image_classification$ python3 mnist_main.py   --model_dir=$MODEL_DIR   --data_dir=$DATA_DIR   --train_epochs=10   --distribution_strategy=one_device   --num_gpus=1   --download
I0307 11:30:39.541374 140127420430144 dataset_builder.py:202] Load pre-computed datasetinfo (eg: splits) from bucket.
I0307 11:30:39.717896 140127420430144 dataset_info.py:431] Loading info from GCS for mnist/3.0.0
I0307 11:30:39.806468 140127420430144 dataset_info.py:403] Field info.location from disk and from code do not match. Keeping the one from code.
I0307 11:30:39.807102 140127420430144 dataset_builder.py:310] Generating dataset mnist (/home/tsteinholz/tensorflow_datasets/mnist/3.0.0)
Downloading and preparing dataset mnist/3.0.0 (download: 11.06 MiB, generated: Unknown size, total: 11.06 MiB) to /home/tsteinholz/tensorflow_datasets/mnist/3.0.0...
W0307 11:30:39.927343 140127420430144 dataset_builder.py:334] Dataset mnist is hosted on GCS. It will automatically be downloaded to your
local data directory. If you'd instead prefer to read directly from our public
GCS bucket (recommended if you're running on GCP), you can instead set
data_dir=gs://tfds-data/datasets.

Dl Completed...: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:02<00:00,  1.46 file/s]
Dl Completed...: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:02<00:00,  1.65 file/s]
I0307 11:30:42.453349 140127420430144 dataset_info.py:403] Field info.location from disk and from code do not match. Keeping the one from code.
Dataset mnist downloaded and prepared to /home/tsteinholz/tensorflow_datasets/mnist/3.0.0. Subsequent calls will reuse this data.
I0307 11:30:42.453937 140127420430144 dataset_builder.py:458] Constructing tf.data.Dataset for split ['train', 'test'], from /home/tsteinholz/tensorflow_datasets/mnist/3.0.0
2020-03-07 11:30:42.458782: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libhip_hcc.so
2020-03-07 11:30:42.516803: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1573] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: Ellesmere [Radeon RX 470/480/570/570X/580/580X]     ROCm AMD GPU ISA: gfx803
coreClock: 1.355GHz coreCount: 36 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: -1B/s
2020-03-07 11:30:42.516882: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1573] Found device 1 with properties: 
pciBusID: 0000:02:00.0 name: Ellesmere [Radeon RX 470/480/570/570X/580/580X]     ROCm AMD GPU ISA: gfx803
coreClock: 1.2GHz coreCount: 36 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: -1B/s
2020-03-07 11:30:42.553324: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocblas.so
2020-03-07 11:30:42.555041: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libMIOpen.so
2020-03-07 11:30:42.555937: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocfft.so
2020-03-07 11:30:42.556098: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocrand.so
2020-03-07 11:30:42.556252: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0, 1
2020-03-07 11:30:42.556502: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
2020-03-07 11:30:42.560552: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3405825000 Hz
2020-03-07 11:30:42.560767: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x58a6280 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-03-07 11:30:42.560778: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-03-07 11:30:42.561748: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1573] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: Ellesmere [Radeon RX 470/480/570/570X/580/580X]     ROCm AMD GPU ISA: gfx803
coreClock: 1.355GHz coreCount: 36 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: -1B/s
2020-03-07 11:30:42.561793: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1573] Found device 1 with properties: 
pciBusID: 0000:02:00.0 name: Ellesmere [Radeon RX 470/480/570/570X/580/580X]     ROCm AMD GPU ISA: gfx803
coreClock: 1.2GHz coreCount: 36 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: -1B/s
2020-03-07 11:30:42.561819: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocblas.so
2020-03-07 11:30:42.561830: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libMIOpen.so
2020-03-07 11:30:42.561840: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocfft.so
2020-03-07 11:30:42.561850: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocrand.so
2020-03-07 11:30:42.561962: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0, 1
2020-03-07 11:30:42.561973: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-07 11:30:42.561978: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0 1 
2020-03-07 11:30:42.561983: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N N 
2020-03-07 11:30:42.561987: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 1:   N N 
2020-03-07 11:30:42.562130: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7539 MB memory) -> physical GPU (device: 0, name: Ellesmere [Radeon RX 470/480/570/570X/580/580X], pci bus id: 0000:01:00.0)
2020-03-07 11:30:42.573514: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 7539 MB memory) -> physical GPU (device: 1, name: Ellesmere [Radeon RX 470/480/570/570X/580/580X], pci bus id: 0000:02:00.0)
Train for 58 steps, validate for 9 steps
Epoch 1/10
2020-03-07 11:30:48.891256: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocblas.so
2020-03-07 11:30:50.337691: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libMIOpen.so
MIOpen Error: /root/driver/MLOpen/src/ocl/handleocl.cpp:332: Error: Creating Handle. Cannot Initialize Handle from Queue Invalid Command Queue
2020-03-07 11:30:50.337921: E tensorflow/stream_executor/rocm/rocm_dnn.cc:508] could not create miopen handle: miopenStatusUnknownError
2020-03-07 11:30:50.337932: W ./tensorflow/stream_executor/stream.h:2039] attempting to perform DNN operation using StreamExecutor without DNN support
2020-03-07 11:30:50.337954: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Not found: Failed to find conv algorithm!
	 [[{{node mnist/conv2d/Conv2D}}]]
2020-03-07 11:30:50.355892: I tensorflow/stream_executor/stream.cc:1990] [stream=0x7f71e78ed100,impl=0x7f71e78ed1a0] did not wait for [stream=0x7f71e7859260,impl=0x7f71e78ea110]
2020-03-07 11:30:50.355915: I tensorflow/stream_executor/stream.cc:4938] [stream=0x7f71e78ed100,impl=0x7f71e78ed1a0] did not memcpy host-to-device; source: 0x7f708c09d4c0
2020-03-07 11:30:50.355966: F tensorflow/core/common_runtime/gpu/gpu_util.cc:340] CPU->GPU Memcpy failed
Fatal Python error: Aborted

Thread 0x00007f71f51e3740 (most recent call first):
  File "/home/tsteinholz/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/gen_resource_variable_ops.py", line 44 in assign_add_variable_op
  File "/home/tsteinholz/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/resource_variable_ops.py", line 786 in assign_add
  File "/home/tsteinholz/.local/lib/python3.6/site-packages/tensorflow_core/python/keras/callbacks.py", line 1660 in _increment_step
  File "/home/tsteinholz/.local/lib/python3.6/site-packages/tensorflow_core/python/keras/callbacks.py", line 1691 in on_train_batch_end
  File "/home/tsteinholz/.local/lib/python3.6/site-packages/tensorflow_core/python/keras/callbacks.py", line 239 in _call_batch_hook
  File "/home/tsteinholz/.local/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 788 in on_batch
  File "/usr/lib/python3.6/contextlib.py", line 99 in __exit__
  File "/home/tsteinholz/.local/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 181 in run_one_epoch
  File "/home/tsteinholz/.local/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 342 in fit
  File "/home/tsteinholz/.local/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training.py", line 819 in fit
  File "mnist_main.py", line 135 in run
  File "mnist_main.py", line 164 in main
  File "/home/tsteinholz/.local/lib/python3.6/site-packages/absl/app.py", line 250 in _run_main
  File "/home/tsteinholz/.local/lib/python3.6/site-packages/absl/app.py", line 299 in run
  File "mnist_main.py", line 171 in <module>

ROCm Info

tsteinholz@Yuri:~/Development/radeon/models/official/vision/image_classification$  /opt/rocm/bin/rocminfo
ROCk module is loaded
tsteinholz is member of video group
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
  Marketing Name:          Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   3900                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            8                                  
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    16334176(0xf93d60) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Acessible by all:        TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    16334176(0xf93d60) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Acessible by all:        TRUE                               
  ISA Info:                
    N/A                      
*******                  
Agent 2                  
*******                  
  Name:                    gfx803                             
  Marketing Name:          Ellesmere [Radeon RX 470/480/570/570X/580/580X]
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          4096(0x1000)                       
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
  Chip ID:                 26591(0x67df)                      
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   1355                               
  BDFID:                   256                                
  Internal Node ID:        1                                  
  Compute Unit:            36                                 
  SIMDs per CU:            4                                  
  Shader Engines:          4                                  
  Shader Arrs. per Eng.:   1                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      FALSE                              
  Wavefront Size:          64(0x40)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        40(0x28)                           
  Max Work-item Per CU:    2560(0xa00)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    8388608(0x800000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Acessible by all:        FALSE                              
    Pool 2                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Acessible by all:        FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx803          
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*******                  
Agent 3                  
*******                  
  Name:                    gfx803                             
  Marketing Name:          Ellesmere [Radeon RX 470/480/570/570X/580/580X]
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          4096(0x1000)                       
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    2                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
  Chip ID:                 26591(0x67df)                      
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   1200                               
  BDFID:                   512                                
  Internal Node ID:        2                                  
  Compute Unit:            36                                 
  SIMDs per CU:            4                                  
  Shader Engines:          4                                  
  Shader Arrs. per Eng.:   1                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      FALSE                              
  Wavefront Size:          64(0x40)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        40(0x28)                           
  Max Work-item Per CU:    2560(0xa00)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    8388608(0x800000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Acessible by all:        FALSE                              
    Pool 2                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Acessible by all:        FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx803          
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***


tsteinholz · 2020-03-17T19:43:16Z

UPDATE: I updated all of my ROCm tools a couple of days ago, and the problem still persists.

Is this the correct place to report this issue?

Thank you,
Thomas Steinholz

daniellowell · 2020-04-06T17:06:58Z

From the logs, it seems that an invalid command queue is being created.
It looks like you have the miopen-opencl version of MIOpen installed, but you are using TensorFlow. Please uninstall miopen-opencl and install miopen-hip, as ROCm-based TensorFlow only works with miopen-hip.
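
For anyone who wants to check which MIOpen backend package is actually installed before swapping it, here is a minimal sketch. It assumes a Debian/Ubuntu system where ROCm was installed from apt packages, and it relies on `dpkg-query -W` with a `${Package} ${Status}` format string. The helper names `installed_miopen_backends` and `query_dpkg` are made up for this example and are not part of ROCm or MIOpen.

```python
import shutil
import subprocess

# Package names as mentioned in this thread.
MIOPEN_PACKAGES = ("miopen-opencl", "miopen-hip")


def installed_miopen_backends(dpkg_output):
    """Return the MIOpen backend packages that dpkg reports as installed.

    Expects one package per line in the form
    '<package> <want> <error> <status>', which is what
    dpkg-query -W -f='${Package} ${Status}\\n' prints.
    """
    found = []
    for line in dpkg_output.splitlines():
        fields = line.split()
        # A fully installed package has the status word "installed";
        # removed packages with leftover config show "config-files".
        if (
            len(fields) == 4
            and fields[0] in MIOPEN_PACKAGES
            and fields[3] == "installed"
        ):
            found.append(fields[0])
    return found


def query_dpkg():
    """Ask dpkg-query for all miopen-* packages; return '' if unavailable."""
    if shutil.which("dpkg-query") is None:
        return ""  # not a dpkg-based system
    result = subprocess.run(
        ["dpkg-query", "-W", "-f=${Package} ${Status}\\n", "miopen-*"],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # hides "no packages found" messages
        universal_newlines=True,
    )
    return result.stdout


if __name__ == "__main__":
    backends = installed_miopen_backends(query_dpkg())
    print("Installed MIOpen backends:", backends or "none found")
```

If this reports `miopen-opencl`, that would be consistent with the `src/ocl/handleocl.cpp` path in the error message. Removing that package and installing `miopen-hip` through apt is the change suggested above; the exact apt commands depend on how ROCm was set up on your machine.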

aserio · 2020-04-22T14:06:06Z

Hi @tsteinholz, did Daniel's comment help resolve the problem?

joseprupi · 2020-07-20T13:58:50Z

It did work for me.

tsteinholz · 2021-02-08T01:49:44Z

I have reinstalled a new OS since I opened this ticket, so I can't verify it anymore. I will close the ticket since @joseprupi was able to get this to work.

tsteinholz changed the title ~~GFX803: MIOPen Error:/root/driver/MLOpen/src/ocl/handleocl.cpp:332: Error: Creating Handle.~~ GFX803: could not create miopen handle: miopenStatusUnknownError Mar 7, 2020

aserio added help wanted pending_feedback labels Apr 22, 2020

tsteinholz closed this as completed Feb 8, 2021

xinlipn mentioned this issue Feb 2, 2023

[tests] Fix bug in weights tensor layout in solver test #1950

Closed
