Prerequisites

Please answer the following questions for yourself before submitting an issue.

Expected Behavior

The Mistral model (https://mistral.ai/news/announcing-mistral-7b/) is fully supported by llama.cpp:

🟢 Uses Grouped-query attention (GQA) for faster inference
❌ Uses Sliding Window Attention (SWA) to handle longer sequences at smaller cost (see the sketch after this list)

"Outperforms Llama 2 13B on all benchmarks"
Current Behavior

Only 4K of context is available to use.

Environment and Context

llama.cpp ac43576 (Sep 27)
https://github.com/mistralai/mistral-src
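For what it's worth, the reference implementation linked above pairs SWA with a rolling-buffer KV cache, so only the most recent `sliding_window` positions are kept in memory. A rough Python sketch of that idea (simplified, not the actual mistral-src code; the class and method names here are made up for illustration):

```python
import numpy as np

class RollingKVCache:
    """Keeps only the most recent `window` key/value vectors.
    A new entry overwrites the slot at position i % window."""

    def __init__(self, window: int, head_dim: int):
        self.window = window
        self.keys = np.zeros((window, head_dim), dtype=np.float32)
        self.values = np.zeros((window, head_dim), dtype=np.float32)
        self.next_pos = 0  # absolute position of the next token

    def append(self, k: np.ndarray, v: np.ndarray) -> None:
        slot = self.next_pos % self.window  # rolling overwrite
        self.keys[slot] = k
        self.values[slot] = v
        self.next_pos += 1

    def visible(self) -> tuple[np.ndarray, np.ndarray]:
        """Return the cached entries the current token may attend to
        (the last min(next_pos, window) tokens). A real implementation
        also tracks which absolute position each slot holds for RoPE."""
        n = min(self.next_pos, self.window)
        return self.keys[:n], self.values[:n]
```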
I am surprised it works at all, given it was trained with a partly different attention architecture.
Did you run these models?
https://huggingface.co/TheBloke/Mistral-7B-v0.1-GGUF
https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF
Superseded by #3377
@choltha yes, and they work