Prerequisites

Please answer the following questions for yourself before submitting an issue.

Expected Behavior

The Mistral model (https://mistral.ai/news/announcing-mistral-7b/) is fully supported by llama.cpp:

🟢 Uses Grouped-query attention (GQA) for faster inference
❌ Uses Sliding Window Attention (SWA) to handle longer sequences at smaller cost (see the sketch after this list)

"Outperforms Llama 2 13B on all benchmarks"
Current Behavior

Only 4K of context is available to use.

Environment and Context

llama.cpp ac43576 (Sep 27)
https://github.com/mistralai/mistral-src
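For what it's worth, the reference implementation linked above pairs SWA with a rolling-buffer KV cache, so only the most recent `sliding_window` positions are kept in memory. A rough Python sketch of that idea (simplified, not the actual mistral-src code; the class and method names here are made up for illustration):

```python
import numpy as np

class RollingKVCache:
    """Keeps only the most recent `window` key/value vectors.
    A new entry overwrites the slot at position i % window."""

    def __init__(self, window: int, head_dim: int):
        self.window = window
        self.keys = np.zeros((window, head_dim), dtype=np.float32)
        self.values = np.zeros((window, head_dim), dtype=np.float32)
        self.next_pos = 0  # absolute position of the next token

    def append(self, k: np.ndarray, v: np.ndarray) -> None:
        slot = self.next_pos % self.window  # rolling overwrite
        self.keys[slot] = k
        self.values[slot] = v
        self.next_pos += 1

    def visible(self) -> tuple[np.ndarray, np.ndarray]:
        """Return the cached entries the current token may attend to
        (the last min(next_pos, window) tokens). A real implementation
        also tracks which absolute position each slot holds for RoPE."""
        n = min(self.next_pos, self.window)
        return self.keys[:n], self.values[:n]
```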
I am surprised it works at all, given it was trained with a partly different attention architecture.
Did you run these models?
https://huggingface.co/TheBloke/Mistral-7B-v0.1-GGUF
https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF
Superseded by #3377
@choltha yes, and they work