This repository has been archived by the owner on Jun 24, 2024. It is now read-only.

Custom RoPE Scaling #389

Merged: 3 commits into rustformers:main on Jul 28, 2023

Conversation

LLukas22 (Contributor)

Closes #378.

- Adds custom context scaling to llama, falcon, gpt-j, and gpt-neox.
- Adds an Option<ggml::CustomRoPEArguments> parameter to ModelParameters (see the sketch below).
- Adds the optional --rope-base and --rope-scaling CLI parameters.
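For illustration, wiring the new parameter up might look like the following sketch. The `rope_overrides` field and the argument field names are assumptions inferred from the CLI flags above, not copied from the diff:

```rust
use llm::{ggml, ModelParameters};

fn example_params() -> ModelParameters {
    // Hypothetical: request an 8k context from LLaMA 2 by halving RoPE
    // positions. `rope_overrides` and the argument field names are guesses
    // based on the --rope-base / --rope-scaling flags, not the merged code.
    ModelParameters {
        context_size: 8192,
        rope_overrides: Some(ggml::CustomRoPEArguments {
            base: 10_000, // RoPE's conventional default base frequency
            scaling: 0.5, // positions scaled by 0.5 -> double the usable context
        }),
        ..Default::default()
    }
}
```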

@philpax (Collaborator) left a comment

Code looks good. What's the easiest way to test it?

@LLukas22 (Contributor, Author)

1. Sample command for an 8k context with LLaMA 2:

       cargo run --release --features cublas -- infer -a llama -m "C:\Users\lkreu\Downloads\llama-2-13b-chat.ggmlv3.q5_K_M.bin" -p "A llama riding a crab" --use-gpu --rope-scaling 0.5 --num-ctx-tokens 8192 --ignore-eos --stats

2. Sit back and get some coffee ☕ (8192 tokens is a lot of text to generate).

A 16k context is also possible by setting --rope-scaling to 0.25, but then I don't have enough VRAM to run inference on my GPU.
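For context on those numbers: SuperHOT-style linear interpolation multiplies each token position by the scale factor before the rotary angles are computed, so with scale 0.5 a model trained on 4096 positions never sees an effective position above 4096 even at 8192 tokens, and 0.25 stretches that to 16384. A minimal illustrative sketch of the idea (not the crate's actual implementation; names are made up):

```rust
/// Illustrative linear RoPE position interpolation (the SuperHOT-style
/// scaling this PR exposes). Returns the rotation angle for each
/// dimension pair of one attention head at a given token position.
fn rope_angles(position: usize, head_dim: usize, base: f32, scale: f32) -> Vec<f32> {
    (0..head_dim / 2)
        .map(|i| {
            // Standard RoPE frequency for the i-th dimension pair.
            let theta = base.powf(-2.0 * i as f32 / head_dim as f32);
            // The only change vs. vanilla RoPE: rotate by the *scaled*
            // position, so position 8191 with scale 0.5 behaves like ~4095.
            position as f32 * scale * theta
        })
        .collect()
}
```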

@LLukas22 (Contributor, Author)

The generated text gets repetitive after some time, but I guess that's a sampler/settings issue.
lama_story.txt

@philpax (Collaborator) commented on Jul 28, 2023

Great work! I just tested it with LLongMa-2; it's a bit finicky, but that shouldn't be a problem on our end. I've revised the names a little to match llama.cpp / refer to frequency, but the rest is the same. Will merge once CI passes 🚀
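For reference, llama.cpp's corresponding flags are --rope-freq-base and --rope-freq-scale. A hypothetical sketch of what a frequency-oriented override type might look like after that rename; the names are assumed, not read from the merged commit:

```rust
/// Hypothetical frequency-based override mirroring llama.cpp's
/// --rope-freq-base / --rope-freq-scale naming. Field names are assumed.
#[derive(Clone, Debug, PartialEq)]
pub struct RoPEOverrides {
    /// Base frequency; standard RoPE uses 10_000.
    pub frequency_base: usize,
    /// Linear position scale; 1.0 leaves RoPE unchanged.
    pub frequency_scale: f32,
}

impl Default for RoPEOverrides {
    fn default() -> Self {
        Self {
            frequency_base: 10_000,
            frequency_scale: 1.0,
        }
    }
}
```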

philpax merged commit 9fe9f19 into rustformers:main on Jul 28, 2023.
hhamud mentioned this pull request on Aug 7, 2023.

Successfully merging this pull request may close these issues:

Implement SuperHOT/interpolated RoPE support (#378)