diff --git a/README.md b/README.md
index bd29fbd1b8..961d115f74 100644
--- a/README.md
+++ b/README.md
@@ -320,32 +320,11 @@ To run local models, it is possible to use OpenAI compatible APIs, for instance
-To run local inference, you need to download the models first, for instance you can find `ggml` compatible models in [huggingface.com](https://huggingface.co/models?search=ggml).
+To run local inference, you need to download the models first; for instance, you can find `ggml` compatible models on [huggingface.com](https://huggingface.co/models?search=ggml) (for example Vicuna, Alpaca, and Koala).
 
 ### Start the API server
 
-To start the API server, follow the instruction in [LocalAI](https://github.com/go-skynet/LocalAI#usage):
-
-```
-git clone https://github.com/go-skynet/LocalAI
-
-cd LocalAI
-
-# copy your models to models/
-cp your-model models/
-
-# (optional) Edit the .env file to set the number of concurrent threads used for inference
-# echo "THREADS=14" > .env
-
-# start with docker-compose
-docker compose up -d --build
-
-# Check that the API is accessible at localhost:8080
-curl http://localhost:8080/v1/models
-# {"object":"list","data":[{"id":"your-model","object":"model"}]}
-```
-
-In order to use a local model, you might probably need to set a prompt template. This depends on the model being used. Create a file next your model ending by `.tmpl`, see some of the [templates examples in LocalAI](https://github.com/go-skynet/LocalAI/tree/master/prompt-templates).
+To start the API server, follow the instructions in [LocalAI](https://github.com/go-skynet/LocalAI#example-use-gpt4all-j-model).
 
 ### Run k8sgpt
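As a quick reference, here is a minimal sketch of the flow the linked LocalAI instructions cover, reusing the commands from the block removed above. The final `k8sgpt auth` invocation is an assumption based on the `### Run k8sgpt` step; the backend name and flags may differ between k8sgpt versions, so check `k8sgpt auth --help`.

```
# Start LocalAI with a downloaded ggml model (commands taken from the removed block above)
git clone https://github.com/go-skynet/LocalAI
cd LocalAI

# copy your models to models/
cp your-model models/

# start with docker-compose
docker compose up -d --build

# Check that the OpenAI-compatible API is accessible at localhost:8080
curl http://localhost:8080/v1/models

# Point k8sgpt at the local endpoint (assumed flags; verify with `k8sgpt auth --help`)
k8sgpt auth --backend localai --model your-model --baseurl http://localhost:8080/v1
```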