[FR] Mistral-7B Sliding Window Attention support #3371

Closed
4 tasks done
wizzard0 opened this issue Sep 27, 2023 · 4 comments
Labels
enhancement (New feature or request), model (Model specific)

Comments

@wizzard0
Contributor

wizzard0 commented Sep 27, 2023

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

The Mistral-7B model (https://mistral.ai/news/announcing-mistral-7b/) is fully supported by llama.cpp.

🟢 Uses Grouped-query attention (GQA) for faster inference

❌ Uses Sliding Window Attention (SWA) to handle longer sequences at smaller cost

"Outperforms Llama 2 13B on all benchmarks"

Current Behavior

Only 4K tokens of context are usable.
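
For readers unfamiliar with SWA: the idea is that each token attends only to a fixed number of the most recent tokens instead of the whole prefix, so attention cost grows with the window size rather than with the full sequence length. Below is a minimal sketch of such an attention mask in plain Python. It is not llama.cpp or Mistral code; the function names, the toy sizes, and the exact window boundary convention (the current token counts as part of the window) are assumptions made for illustration.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Build a causal sliding-window mask (illustrative helper, hypothetical name).

    mask[i][j] is True when query position i may attend to key position j,
    i.e. when j is not in the future and lies within the last `window`
    positions, counting i itself.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    return [
        [0 <= i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


def attended_pairs(seq_len: int, window: int) -> int:
    """Count the (query, key) pairs the mask allows (hypothetical helper)."""
    return sum(sum(row) for row in sliding_window_mask(seq_len, window))


if __name__ == "__main__":
    # Toy sizes only: 5 tokens, window of 3.
    for row in sliding_window_mask(5, 3):
        print("".join("x" if allowed else "." for allowed in row))
    print(attended_pairs(5, 3))  # 12 allowed pairs
    print(attended_pairs(5, 5))  # 15, same as plain causal attention
```

With a window at least as long as the sequence, the mask reduces to ordinary causal attention, which is presumably why the model still runs without explicit SWA support as long as the prompt stays inside the window.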

Environment and Context

llama.cpp ac43576 (Sep 27)

@wizzard0 changed the title from "[FR] Mistral-7B model support" to "[FR] Mistral-7B Sliding Window Attention support" on Sep 27, 2023

@BarfingLemurs
Contributor

https://github.com/mistralai/mistral-src

@choltha

choltha commented Sep 27, 2023

I am surprised it works at all given it was trained on a partly different attention architecture.

Did you run these models?
https://huggingface.co/TheBloke/Mistral-7B-v0.1-GGUF
https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF

@Green-Sky added the enhancement (New feature or request) and model (Model specific) labels on Sep 27, 2023

@ggerganov
Member

Superseded by #3377

@wizzard0
Contributor Author

@choltha yes, and they work

5 participants