examples : add configuration presets #10932
Labels: documentation (Improvements or additions to documentation), enhancement (New feature or request), examples, good first issue (Good for newcomers), help wanted (Extra attention is needed)
Comments
ggerganov added the documentation, enhancement, help wanted, good first issue, and examples labels on Dec 21, 2024.
Hi! I'm interested in contributing to this issue as a first-time contributor. I'd like to work on implementing the chat server preset for commonly used models.

IMO having a flag would work well:

```console
llama-server --preset qwen-fim-7b
llama-server --preset embd-bert
```

Or we could even introduce positional parameters (I stole the idea from ollama): `llama-server launch qwen-fim-7b`
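A minimal C++ sketch of how the two invocation styles proposed above could be recognized. This is purely illustrative: the `parse_preset` helper is hypothetical, and llama.cpp's real argument parsing lives in `common/arg.cpp`.

```cpp
// Sketch: recognize both a "--preset NAME" flag and an ollama-style
// positional "launch NAME" subcommand. Hypothetical helper, not llama.cpp API.
#include <cassert>
#include <string>
#include <vector>

// Returns the preset name selected by the arguments, or "" if none was given.
std::string parse_preset(const std::vector<std::string> & args) {
    // positional form: the first token is a "launch" subcommand
    if (!args.empty() && args[0] == "launch" && args.size() > 1) {
        return args[1];
    }
    // flag form: "--preset NAME" anywhere in the argument list
    for (size_t i = 0; i + 1 < args.size(); ++i) {
        if (args[i] == "--preset") {
            return args[i + 1];
        }
    }
    return "";
}
```

Either spelling would resolve to the same preset name, so the rest of the startup path would not need to care which style the user typed.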
danbev added a commit to danbev/llama.cpp that referenced this issue on Feb 5, 2025:
This commit adds default embeddings presets for the following models:

- bge-small-en-v1.5
- e5-small-v2
- gte-small

These can be used with llama-embedding and llama-server. For example, with llama-embedding:

```console
./build/bin/llama-embedding --embd-gte-small-default -p "Hello, how are you?"
```

And with llama-server:

```console
./build/bin/llama-server --embd-gte-small-default
```

And the embeddings endpoint can then be called with a POST request:

```console
curl --request POST \
    --url http://localhost:8080/embeddings \
    --header "Content-Type: application/json" \
    --data '{"input": "Hello, how are you?"}'
```

I'm not sure if these are the most common embedding models but hopefully this can be a good starting point for discussion and further improvements.

Refs: ggerganov#10932
danbev added a commit that referenced this issue on Feb 7, 2025:
* common : add default embeddings presets

This commit adds default embeddings presets for the following models:

- bge-small-en-v1.5
- e5-small-v2
- gte-small

These can be used with llama-embedding and llama-server. For example, with llama-embedding:

```console
./build/bin/llama-embedding --embd-gte-small-default -p "Hello, how are you?"
```

And with llama-server:

```console
./build/bin/llama-server --embd-gte-small-default
```

And the embeddings endpoint can then be called with a POST request:

```console
curl --request POST \
    --url http://localhost:8080/embeddings \
    --header "Content-Type: application/json" \
    --data '{"input": "Hello, how are you?"}'
```

I'm not sure if these are the most common embedding models but hopefully this can be a good starting point for discussion and further improvements.

Refs: #10932
Description
I was recently looking for ways to demonstrate some of the functionality of the llama.cpp examples, and some of the commands can become very cumbersome. For example, here is what I use for the llama.vim FIM server:

It would be much cleaner if I could just run, for example:

Or if I could turn this embedding server command into something simpler:
Implementation
There is already an initial example of how we can create such configuration presets: llama.cpp/common/arg.cpp, lines 2208 to 2220 at 5cd85b5.
This preset configures the model URLs so that the models are automatically downloaded from HF when the example runs, which simplifies the command significantly. It can additionally set various default values, such as context size, batch size, pooling type, etc.
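To make the idea concrete, here is a minimal C++ sketch of a preset table that bundles a model URL together with the default values mentioned above. All names, fields, and values are illustrative placeholders, not llama.cpp's actual structures or real model URLs.

```cpp
// Hypothetical sketch of a preset table: a preset name maps to a bundle of
// defaults (model URL, context size, batch size, embedding mode). Illustrative
// only; the real preset machinery would live in common/arg.cpp.
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

struct preset_params {
    std::string model_url;  // model to auto-download from HF (placeholder URL)
    int32_t     n_ctx;      // default context size
    int32_t     n_batch;    // default batch size
    bool        embeddings; // run as an embedding server
};

// Look up a preset by name; returns std::nullopt for unknown names.
std::optional<preset_params> lookup_preset(const std::string & name) {
    static const std::map<std::string, preset_params> presets = {
        // placeholder values for illustration only
        {"qwen-fim-7b", {"https://example.com/qwen-fim-7b.gguf", 8192, 1024, false}},
        {"embd-bert",   {"https://example.com/bert-embd.gguf",    512,  512,  true}},
    };
    auto it = presets.find(name);
    if (it == presets.end()) {
        return std::nullopt;
    }
    return it->second;
}
```

A preset would only supply defaults; explicit command-line flags could still override any individual field afterwards.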
Goal
The goal of this issue is to create such presets for various common tasks, such as the llama.vim FIM server.
The list of configuration presets would require curation and proper documentation.
I think this is a great task for new contributors to get involved in the project.