[Bug]: [V1][SpecDec] RuntimeError: CUDA error: an illegal memory access was encountered #13673
Labels: bug
Your current environment
I'm using the `vllm/vllm-openai:v0.7.3` Docker image.

🐛 Describe the bug
I'm trying to run `[ngram]` speculative decoding on vLLM v1 with a fine-tuned Llama-3.2-3B, using the following parameters:

The server starts up correctly, but after sending a few concurrent requests (~5 RPS), I receive the `RuntimeError: CUDA error: an illegal memory access was encountered` error from the title.
Full Logs
You can see the initial successful requests and the error occurring afterwards. The worker process then terminates, and no further requests can be processed.