[Bug] illegal memory access encountered #1467
Closed · wonderisland opened this issue Sep 19, 2024 · 3 comments

wonderisland commented Sep 19, 2024

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
  • 5. Please use English, otherwise it will be closed.

Describe the bug

Error log (the server starts, serves a request, then crashes during decoding):
dlccdefifz69nk44-master-0:32478:34896 [2] NCCL INFO comm 0x55b978e00250 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId 30 commId 0xc9c72df61c4f59e1 - Init COMPLETE
INFO: 127.0.0.1:50812 - "POST /generate HTTP/1.1" 200 OK
[15:57:49] The server is fired up and ready to roll!
[15:58:39 TP0] Prefill batch. #new-seq: 1, #new-token: 9, #cached-token: 1, cache hit rate: 5.88%, #running-req: 0, #queue-req: 0
[15:58:51 TP0] Decode batch. #running-req: 1, #token: 43, token usage: 0.00, gen throughput (token/s): 0.53, #queue-req: 0
[15:58:58 TP0] Decode batch. #running-req: 1, #token: 83, token usage: 0.00, gen throughput (token/s): 6.20, #queue-req: 0
[15:59:04 TP0] Decode batch. #running-req: 1, #token: 123, token usage: 0.00, gen throughput (token/s): 6.18, #queue-req: 0
[15:59:11 TP0] Decode batch. #running-req: 1, #token: 163, token usage: 0.00, gen throughput (token/s): 5.83, #queue-req: 0
[rank6]:[E919 15:59:14.635045572 ProcessGroupNCCL.cpp:1515] [PG 3 Rank 6] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7f3fe4177f86 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f3fe4126d10 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7f4004055f08 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7f3f7617f3e6 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0xa0 (0x7f3f76184600 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x1da (0x7f3f7618b2ba in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x10c (0x7f3f7618d6fc in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0xdc253 (0x7f4018529253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #8: <unknown function> + 0x94ac3 (0x7f40725b8ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #9: clone + 0x44 (0x7f4072649bf4 in /usr/lib/x86_64-linux-gnu/libc.so.6)

terminate called after throwing an instance of 'c10::DistBackendError'
[rank3]:[E919 15:59:14.635224142 ProcessGroupNCCL.cpp:1515] [PG 3 Rank 3] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7f3fe4177f86 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f3fe4126d10 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7f4004055f08 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7f3f7617f3e6 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0xa0 (0x7f3f76184600 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x1da (0x7f3f7618b2ba in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x10c (0x7f3f7618d6fc in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0xdc253 (0x7f4018529253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #8: <unknown function> + 0x94ac3 (0x7f40725b8ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #9: clone + 0x44 (0x7f4072649bf4 in /usr/lib/x86_64-linux-gnu/libc.so.6)

[rank0]:[E919 15:59:14.635248625 ProcessGroupNCCL.cpp:1515] [PG 3 Rank 0] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

With CUDA graph and MLA enabled, the failure instead happens during CUDA graph capture:
[17:39:38 TP3] Load weight begin. avail mem=77.08 GB
[2024-09-19 17:39:43,201] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-09-19 17:39:43,207] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-09-19 17:39:43,274] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-09-19 17:39:43,287] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-09-19 17:39:43,291] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-09-19 17:39:43,314] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-09-19 17:39:43,317] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-09-19 17:39:43,317] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Cache shape torch.Size([163840, 64])
Cache shape torch.Size([163840, 64])
Cache shape torch.Size([163840, 64])
Cache shape torch.Size([163840, 64])
Cache shape torch.Size([163840, 64])
Cache shape torch.Size([163840, 64])
Cache shape torch.Size([163840, 64])
Cache shape torch.Size([163840, 64])
Loading safetensors checkpoint shards: 0% Completed | 0/55 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 2% Completed | 1/55 [00:06<06:05, 6.77s/it]
Loading safetensors checkpoint shards: 4% Completed | 2/55 [00:11<04:46, 5.40s/it]
Loading safetensors checkpoint shards: 5% Completed | 3/55 [00:15<04:18, 4.97s/it]
Loading safetensors checkpoint shards: 7% Completed | 4/55 [00:20<04:10, 4.91s/it]
Loading safetensors checkpoint shards: 9% Completed | 5/55 [00:25<04:00, 4.82s/it]
Loading safetensors checkpoint shards: 11% Completed | 6/55 [00:29<03:50, 4.71s/it]
Loading safetensors checkpoint shards: 13% Completed | 7/55 [00:33<03:38, 4.56s/it]
Loading safetensors checkpoint shards: 15% Completed | 8/55 [00:38<03:41, 4.71s/it]
Loading safetensors checkpoint shards: 16% Completed | 9/55 [00:43<03:36, 4.71s/it]
Loading safetensors checkpoint shards: 18% Completed | 10/55 [00:47<03:25, 4.57s/it]
Loading safetensors checkpoint shards: 20% Completed | 11/55 [00:52<03:16, 4.47s/it]
Loading safetensors checkpoint shards: 22% Completed | 12/55 [00:56<03:11, 4.46s/it]
Loading safetensors checkpoint shards: 24% Completed | 13/55 [01:01<03:10, 4.53s/it]
Loading safetensors checkpoint shards: 25% Completed | 14/55 [01:05<03:05, 4.52s/it]
Loading safetensors checkpoint shards: 27% Completed | 15/55 [01:10<02:58, 4.47s/it]
Loading safetensors checkpoint shards: 29% Completed | 16/55 [01:14<02:53, 4.44s/it]
Loading safetensors checkpoint shards: 31% Completed | 17/55 [01:19<02:55, 4.61s/it]
Loading safetensors checkpoint shards: 33% Completed | 18/55 [01:23<02:49, 4.57s/it]
Loading safetensors checkpoint shards: 35% Completed | 19/55 [01:28<02:44, 4.57s/it]
Loading safetensors checkpoint shards: 36% Completed | 20/55 [01:33<02:39, 4.56s/it]
Loading safetensors checkpoint shards: 38% Completed | 21/55 [01:37<02:35, 4.56s/it]
Loading safetensors checkpoint shards: 40% Completed | 22/55 [01:41<02:28, 4.50s/it]
Loading safetensors checkpoint shards: 42% Completed | 23/55 [01:46<02:24, 4.50s/it]
Loading safetensors checkpoint shards: 44% Completed | 24/55 [01:51<02:20, 4.52s/it]
Loading safetensors checkpoint shards: 45% Completed | 25/55 [01:55<02:13, 4.46s/it]
Loading safetensors checkpoint shards: 47% Completed | 26/55 [01:59<02:07, 4.41s/it]
Loading safetensors checkpoint shards: 49% Completed | 27/55 [02:03<01:57, 4.20s/it]
Loading safetensors checkpoint shards: 51% Completed | 28/55 [02:07<01:55, 4.29s/it]
Loading safetensors checkpoint shards: 53% Completed | 29/55 [02:11<01:47, 4.13s/it]
Loading safetensors checkpoint shards: 55% Completed | 30/55 [02:15<01:39, 3.98s/it]
Loading safetensors checkpoint shards: 56% Completed | 31/55 [02:18<01:33, 3.91s/it]
Loading safetensors checkpoint shards: 58% Completed | 32/55 [02:22<01:29, 3.87s/it]
Loading safetensors checkpoint shards: 60% Completed | 33/55 [02:26<01:22, 3.75s/it]
Loading safetensors checkpoint shards: 62% Completed | 34/55 [02:29<01:17, 3.69s/it]
Loading safetensors checkpoint shards: 64% Completed | 35/55 [02:33<01:13, 3.67s/it]
Loading safetensors checkpoint shards: 65% Completed | 36/55 [02:37<01:09, 3.65s/it]
Loading safetensors checkpoint shards: 67% Completed | 37/55 [02:40<01:05, 3.62s/it]
Loading safetensors checkpoint shards: 69% Completed | 38/55 [02:43<00:59, 3.50s/it]
Loading safetensors checkpoint shards: 71% Completed | 39/55 [02:47<00:56, 3.52s/it]
Loading safetensors checkpoint shards: 73% Completed | 40/55 [02:50<00:52, 3.50s/it]
Loading safetensors checkpoint shards: 75% Completed | 41/55 [02:54<00:49, 3.51s/it]
Loading safetensors checkpoint shards: 76% Completed | 42/55 [02:57<00:45, 3.52s/it]
Loading safetensors checkpoint shards: 78% Completed | 43/55 [03:01<00:42, 3.54s/it]
Loading safetensors checkpoint shards: 80% Completed | 44/55 [03:04<00:38, 3.50s/it]
Loading safetensors checkpoint shards: 82% Completed | 45/55 [03:08<00:34, 3.47s/it]
Loading safetensors checkpoint shards: 84% Completed | 46/55 [03:11<00:31, 3.46s/it]
Loading safetensors checkpoint shards: 85% Completed | 47/55 [03:15<00:27, 3.48s/it]
Loading safetensors checkpoint shards: 87% Completed | 48/55 [03:18<00:24, 3.52s/it]
Loading safetensors checkpoint shards: 89% Completed | 49/55 [03:22<00:21, 3.54s/it]
Loading safetensors checkpoint shards: 91% Completed | 50/55 [03:25<00:17, 3.53s/it]
Loading safetensors checkpoint shards: 93% Completed | 51/55 [03:29<00:14, 3.51s/it]
Loading safetensors checkpoint shards: 95% Completed | 52/55 [03:32<00:09, 3.32s/it]
Loading safetensors checkpoint shards: 96% Completed | 53/55 [03:35<00:06, 3.18s/it]
[17:43:23 TP6] Load weight end. type=DeepseekV2ForCausalLM, dtype=torch.bfloat16, avail mem=20.62 GB
[17:43:23 TP7] Load weight end. type=DeepseekV2ForCausalLM, dtype=torch.bfloat16, avail mem=20.91 GB
Loading safetensors checkpoint shards: 98% Completed | 54/55 [03:38<00:03, 3.20s/it]
[17:43:23 TP4] Load weight end. type=DeepseekV2ForCausalLM, dtype=torch.bfloat16, avail mem=20.62 GB
Loading safetensors checkpoint shards: 100% Completed | 55/55 [03:41<00:00, 3.24s/it]
Loading safetensors checkpoint shards: 100% Completed | 55/55 [03:41<00:00, 4.03s/it]

[17:43:27 TP1] Load weight end. type=DeepseekV2ForCausalLM, dtype=torch.bfloat16, avail mem=20.62 GB
[17:43:27 TP2] Load weight end. type=DeepseekV2ForCausalLM, dtype=torch.bfloat16, avail mem=20.62 GB
[17:43:27 TP0] Load weight end. type=DeepseekV2ForCausalLM, dtype=torch.bfloat16, avail mem=20.91 GB
[17:43:27 TP3] Load weight end. type=DeepseekV2ForCausalLM, dtype=torch.bfloat16, avail mem=20.62 GB
[17:43:28 TP5] Load weight end. type=DeepseekV2ForCausalLM, dtype=torch.bfloat16, avail mem=20.62 GB
[17:43:28 TP1] Memory pool end. avail mem=18.20 GB
[17:43:28 TP5] Memory pool end. avail mem=18.20 GB
[17:43:28 TP4] Memory pool end. avail mem=18.20 GB
[17:43:28 TP6] Memory pool end. avail mem=18.20 GB
[17:43:28 TP2] Memory pool end. avail mem=18.20 GB
[17:43:28 TP3] Memory pool end. avail mem=18.20 GB
[17:43:28 TP0] Memory pool end. avail mem=18.48 GB
[17:43:28 TP7] Memory pool end. avail mem=18.48 GB
[17:43:28 TP4] Capture cuda graph begin. This can take up to several minutes.
[17:43:28 TP1] Capture cuda graph begin. This can take up to several minutes.
[17:43:28 TP3] Capture cuda graph begin. This can take up to several minutes.
[17:43:28 TP5] Capture cuda graph begin. This can take up to several minutes.
[17:43:28 TP2] Capture cuda graph begin. This can take up to several minutes.
[17:43:28 TP6] Capture cuda graph begin. This can take up to several minutes.
[17:43:28 TP0] Capture cuda graph begin. This can take up to several minutes.
[17:43:28 TP7] Capture cuda graph begin. This can take up to several minutes.
[17:43:35 TP3] Exception in run_tp_server:
Traceback (most recent call last):
File "/mnt/data/hectorgao/workspace/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 146, in init
self.capture()
File "/mnt/data/hectorgao/workspace/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 175, in capture
) = self.capture_one_batch_size(bs, forward)
File "/mnt/data/hectorgao/workspace/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 216, in capture_one_batch_size
run_once()
File "/mnt/data/hectorgao/workspace/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 210, in run_once
return forward(input_ids, input_metadata.positions, input_metadata)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/mnt/data/hectorgao/workspace/sglang/python/sglang/srt/models/deepseek_v2.py", line 663, in forward
hidden_states = self.model(input_ids, positions, input_metadata)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/mnt/data/hectorgao/workspace/sglang/python/sglang/srt/models/deepseek_v2.py", line 632, in forward
hidden_states, residual = layer(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/mnt/data/hectorgao/workspace/sglang/python/sglang/srt/models/deepseek_v2.py", line 579, in forward
hidden_states = self.self_attn(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/mnt/data/hectorgao/workspace/sglang/python/sglang/srt/models/deepseek_v2.py", line 474, in forward
attn_output = self.attn(q_input, k_input, v_input, input_metadata)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/mnt/data/hectorgao/workspace/sglang/python/sglang/srt/layers/radix_attention.py", line 58, in forward
return input_metadata.attn_backend.forward(q, k, v, self, input_metadata)
File "/mnt/data/hectorgao/workspace/sglang/python/sglang/srt/layers/attention_backend.py", line 69, in forward
return self.forward_decode(q, k, v, layer, input_metadata)
File "/mnt/data/hectorgao/workspace/sglang/python/sglang/srt/layers/attention_backend.py", line 466, in forward_decode
self.decode_attention_fwd(
File "/mnt/data/hectorgao/workspace/sglang/python/sglang/srt/layers/triton_attention/decode_attention.py", line 623, in decode_attention_fwd
_decode_grouped_softmax_reducev_fwd(
File "/mnt/data/hectorgao/workspace/sglang/python/sglang/srt/layers/triton_attention/decode_attention.py", line 545, in _decode_grouped_softmax_reducev_fwd
_fwd_grouped_kernel_stage2[grid](
File "/usr/local/lib/python3.10/dist-packages/triton/runtime/jit.py", line 345, in
return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/triton/runtime/jit.py", line 691, in run
kernel.run(grid_0, grid_1, grid_2, stream, kernel.function, kernel.packed_metadata, launch_metadata,
File "/usr/local/lib/python3.10/dist-packages/triton/backends/nvidia/driver.py", line 365, in call
self.launch(*args, **kwargs)
RuntimeError: Triton Error [CUDA]: an illegal memory access was encountered

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/mnt/data/hectorgao/workspace/sglang/python/sglang/srt/managers/tp_worker.py", line 959, in run_tp_server
model_server = ModelTpServer(
File "/mnt/data/hectorgao/workspace/sglang/python/sglang/srt/managers/tp_worker.py", line 100, in init
self.model_runner = ModelRunner(
File "/mnt/data/hectorgao/workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 128, in init
self.init_cuda_graphs()
File "/mnt/data/hectorgao/workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 468, in init_cuda_graphs
self.cuda_graph_runner = CudaGraphRunner(self)
File "/mnt/data/hectorgao/workspace/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 148, in init
raise Exception(
Exception: Capture cuda graph failed: Triton Error [CUDA]: an illegal memory access was encountered
Possible solutions:

  1. disable cuda graph by --disable-cuda-graph
  2. set --mem-fraction-static to a smaller value
  3. disable torch compile by not using --enable-torch-compile
  Open an issue on GitHub https://github.com/sgl-project/sglang/issues/new/choose

Reproduction

HTTP server:
export CUDA_LAUNCH_BLOCKING=1
python -m sglang.launch_server --model-path /mnt/data/models/DeepSeek-V2.5 --tp 8 --mem-fraction-static 0.8 --enable-mla --trust-remote-code --port 30000 --disable-cuda-graph
Because CUDA graph needs more GPU memory, I disabled it; I only need to run inference with longer output sequences and do not care about latency.
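
Presumably the CUDA-graph-capture failure shown in the second log above comes from the same launch command without --disable-cuda-graph, i.e. something like:
export CUDA_LAUNCH_BLOCKING=1
python -m sglang.launch_server --model-path /mnt/data/models/DeepSeek-V2.5 --tp 8 --mem-fraction-static 0.8 --enable-mla --trust-remote-code --port 30000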

HTTP request:
curl http://localhost:30000/generate \
  -H "Content-Type: application/json" \
  -d '{
    "text": "please repeat output word ok, do not stop",
    "sampling_params": {
      "max_new_tokens": 16000,
      "temperature": 0.8,
      "repetition_penalty": 0.1
    }
  }'

Environment

env:
sglang version: https://github.com/sgl-project/sglang/releases/tag/v0.3.1.post1
GPU: A800

root@dlccdefifz69nk44-master-0:~# nvidia-smi
Thu Sep 19 16:08:29 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.3 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA A800-SXM4-80GB On | 00000000:00:01.0 Off | 0 |
| N/A 28C P0 70W / 400W | 73618MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA A800-SXM4-80GB On | 00000000:00:02.0 Off | 0 |
| N/A 31C P0 71W / 400W | 74050MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 2 NVIDIA A800-SXM4-80GB On | 00000000:00:03.0 Off | 0 |
| N/A 30C P0 72W / 400W | 74050MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 3 NVIDIA A800-SXM4-80GB On | 00000000:00:04.0 Off | 0 |
| N/A 29C P0 71W / 400W | 74050MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 4 NVIDIA A800-SXM4-80GB On | 00000000:00:05.0 Off | 0 |
| N/A 28C P0 70W / 400W | 74050MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 5 NVIDIA A800-SXM4-80GB On | 00000000:00:06.0 Off | 0 |
| N/A 30C P0 70W / 400W | 74050MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 6 NVIDIA A800-SXM4-80GB On | 00000000:00:07.0 Off | 0 |
| N/A 30C P0 70W / 400W | 74050MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 7 NVIDIA A800-SXM4-80GB On | 00000000:00:08.0 Off | 0 |
| N/A 27C P0 69W / 400W | 73618MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
+---------------------------------------------------------------------------------------+
OS: Linux dlccdefifz69nk44-master-0 4.19.91-014.15-kangaroo.alios7.x86_64 #1 SMP Wed Jul 10 15:22:10 CST 2024 x86_64 x86_64 x86_64 GNU/Linux

@York-Cheung

Same issue here, did you solve it?

@fengyang95

@merrymercy
Contributor

Thanks for reporting this. It has been fixed by a68cb20.
We also released a new version for this fix. Can you try v0.3.1.post3?
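
For reference, picking up the fix should just be a matter of upgrading to the pinned release (assuming a standard pip-based install; adjust for your environment):
pip install --upgrade "sglang[all]==0.3.1.post3"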
