
Deepseek2 does not support K-shift Denial-of-Service vulnerability #10380

Closed
99991 opened this issue Nov 18, 2024 Discussed in #9092 · 4 comments · Fixed by #10401

Comments


99991 commented Nov 18, 2024

Long prompts/responses crash llama-server with "Deepseek2 does not support K-shift". For long prompts/responses, llama-server should return an error message or truncate the response, but instead GGML_ABORT is called, which crashes the server. I believe this is a denial-of-service vulnerability: a client should never be able to trigger GGML_ABORT.

The relevant line in the code is here:

https://github.com/ggerganov/llama.cpp/blob/9b75f03cd2ec9cc482084049d87a0f08f9f01517/src/llama.cpp#L18032

I reported this security vulnerability almost three months ago here (link only visible to maintainers), but received no response. It is public knowledge now anyway, so I also opened this issue to increase visibility.

Discussed in #9092

Originally posted by 99991 August 19, 2024
It is my understanding that llama.cpp shifts the key-value cache when generating more tokens than fit into the context window, which is not supported for DeepSeek Coder V2. To reproduce, start a server with this model

./llama-server -m DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M.gguf -c 32 -ngl 999 --port 8080

and then request a prompt completion:

curl -H "Content-Type: application/json" --request POST --data '{"prompt": "Mergesort in Python:", "n_predict": 32}' http://127.0.0.1:8080/completion

This should trigger the error

src/llama.cpp:15646: Deepseek2 does not support K-shift
Aborted

with llama.cpp release b3600.

The corresponding code in llama.cpp is here:

https://github.com/ggerganov/llama.cpp/blob/cfac111e2b3953cdb6b0126e67a2487687646971/src/llama.cpp#L15643C31-L15648C1

I believe a saner approach would be to simply stop generating tokens instead of crashing the server. Is there an option that can be set to prevent clients from crashing the server?


ngxson (Collaborator) commented Nov 18, 2024

You can also disable K-shift by disabling context shifting, via this argument: --no-context-shift
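For example, reusing the reproduction command from above with the flag appended (a sketch; it assumes the flag is available in your llama.cpp build):

./llama-server -m DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M.gguf -c 32 -ngl 999 --port 8080 --no-context-shift

With context shifting disabled, the server should stop generating when the context window fills up rather than attempting the unsupported K-shift.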

FireAngelx commented

@ggerganov Hi! I also ran into this problem in ollama: when I send a long prompt to deepseekV2, it triggers the K-shift error. How can I set this parameter in ollama? In any case, I don't think the model server should crash.

SuperJunier666 commented

@ggerganov Hi! I also ran into this problem in ollama: when I send a long prompt to deepseekV2, it triggers the K-shift error. How can I set this parameter in ollama? In any case, I don't think the model server should crash.

I had this problem too, have you solved it yet?
