docs: Update README

Commit 290f9bd by mudler, May 16, 2023 — 1 changed file (README.md), 156 additions and 5 deletions.
**LocalAI** is a drop-in replacement REST API compatible with OpenAI API specifications for local inferencing. It allows you to run models locally or on-prem with consumer-grade hardware, supporting multiple model families compatible with the `ggml` format. For a list of the supported model families, see [the model compatibility table below](https://github.com/go-skynet/LocalAI#model-compatibility-table).

- OpenAI drop-in alternative REST API
- Supports multiple models: audio transcription, text generation with GPTs, image generation with Stable Diffusion (experimental)
- Once loaded the first time, it keeps models loaded in memory for faster inference
- Support for prompt templates
- Doesn't shell out, but uses C++ bindings for faster inference and better performance.
See [examples on how to integrate LocalAI](https://github.com/go-skynet/LocalAI/tree/master/examples).

## News

- 16-05-2023: 🔥🔥🔥 Experimental support for CUDA (https://github.com/go-skynet/LocalAI/pull/258) in the `llama.cpp` backend and Stable Diffusion CPU image generation (https://github.com/go-skynet/LocalAI/pull/272) in `master`.
- 14-05-2023: __v1.11.1__ released! `rwkv` backend patch release
- 13-05-2023: __v1.11.0__ released! 🔥 Updated `llama.cpp` bindings: This update includes a breaking change in the model files ( https://github.com/ggerganov/llama.cpp/pull/1405 ) - old models should still work with the `gpt4all-llama` backend.
- 12-05-2023: __v1.10.0__ released! 🔥🔥 Updated `gpt4all` bindings. Added support for GPTNeox (experimental), RedPajama (experimental), Starcoder (experimental), Replit (experimental), MosaicML MPT. Also now `embeddings` endpoint supports tokens arrays. See the [langchain-chroma](https://github.com/go-skynet/LocalAI/tree/master/examples/langchain-chroma) example! Note - this update does NOT include https://github.com/ggerganov/llama.cpp/pull/1405 which makes models incompatible.
Depending on the model you are attempting to run, you might need more RAM or CPU resources.

<details>

| Backend | Compatible models | Completion/Chat endpoint | Audio transcription / Image generation | Embeddings support | Token stream support | GitHub | Bindings |
|-----------------|-----------------------|--------------------------|---------------------|-----------------------------------|----------------------|--------------------------------------------|-------------------------------------------|
| llama | Vicuna, Alpaca, LLaMa | yes | no | yes (doesn't seem to be accurate) | yes | https://github.com/ggerganov/llama.cpp | https://github.com/go-skynet/go-llama.cpp |
| gpt4all-llama | Vicuna, Alpaca, LLaMa | yes | no | no | yes | https://github.com/nomic-ai/gpt4all | https://github.com/go-skynet/gpt4all |
| bloomz | Bloom | yes | no | no | no | https://github.com/NouamaneTazi/bloomz.cpp | https://github.com/go-skynet/bloomz.cpp |
| rwkv | RWKV | yes | no | no | yes | https://github.com/saharNooby/rwkv.cpp | https://github.com/donomii/go-rwkv.cpp |
| bert-embeddings | bert | no | no | yes | no | https://github.com/skeskinen/bert.cpp | https://github.com/go-skynet/go-bert.cpp |
| whisper | whisper | no | Audio | no | no | https://github.com/ggerganov/whisper.cpp | https://github.com/ggerganov/whisper.cpp |
| stablediffusion | stablediffusion | no | Image | no | no | https://github.com/EdVince/Stable-Diffusion-NCNN | https://github.com/mudler/go-stable-diffusion |
</details>

## Usage
Specifying a `config-file` via CLI allows you to declare models in a single file as a list.
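
A minimal sketch of such a list, using the options from the config reference below (model names, files, and values here are illustrative):

```yaml
- name: gpt-3.5-turbo
  parameters:
    model: ggml-gpt4all-j
    temperature: 0.3
  context_size: 512
  threads: 4
- name: text-embedding-ada-002
  backend: bert-embeddings
  embeddings: true
  parameters:
    model: bert
```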

See also [chatbot-ui](https://github.com/go-skynet/LocalAI/tree/master/examples/chatbot-ui) as an example on how to use config files.

### Full config model file reference

<details>

```yaml
name: gpt-3.5-turbo
# Default model parameters
parameters:
  # Relative to the models path
  model: ggml-gpt4all-j
  # temperature
  temperature: 0.3
  # all the OpenAI request options here...
  top_k:
  top_p:
  max_tokens:
  batch:
  f16: true
  ignore_eos: true
  n_keep: 10
  seed:
  mode:
  step:
# Default context size
context_size: 512
# Default number of threads
threads: 10
# Define a backend (optional). By default it will try to guess the backend the first time the model is interacted with.
backend: gptj # available: llama, stablelm, gpt2, gptj, rwkv
# Stopwords (if supported by the backend)
stopwords:
- "HUMAN:"
- "### Response:"
# Strings to trim whitespace from
trimspace:
- string
# Strings to cut from the response
cutstrings:
- "string"
# Define chat roles
roles:
  user: "HUMAN:"
  system: "GPT:"
  assistant: "ASSISTANT:"
template:
  # Template file ".tmpl" with the prompt template to use by default on the endpoint call. Note there is no extension in the file names.
  completion: completion
  chat: ggml-gpt4all-j
  edit: edit_template
# Enable F16 if backend supports it
f16: true
# Enable debugging
debug: true
# Enable embeddings
embeddings: true
# Mirostat configuration (llama.cpp only)
mirostat_eta: 0.8
mirostat_tau: 0.9
mirostat: 1
# GPU layers (only used when built with cublas)
gpu_layers: 22
# Directory used to store additional assets (used for stablediffusion)
asset_dir: ""
```
</details>
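
Once declared, a model can be targeted by its `name` in API requests. A minimal sketch, assuming the server listens on the default port and the model file above exists:

```bash
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "gpt-3.5-turbo",
  "messages": [{"role": "user", "content": "How are you?"}],
  "temperature": 0.3
}'
```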

### Prompt templates
### CLI

<details>

```
local-ai --models-path <model_path> [--address <address>] [--threads <num_threads>]
```

| Parameter | Environment Variable | Default Value | Description |
|-----------|----------------------|---------------|-------------|
| context-size | CONTEXT_SIZE | 512 | Default token context size. |
| debug | DEBUG | false | Enable debug mode. |
| config-file | CONFIG_FILE | empty | Path to a LocalAI config file. |
| upload_limit | UPLOAD_LIMIT | 5MB | Upload limit for whisper. |
| image-dir | CONFIG_FILE | empty | Image directory to store and serve processed images. |
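
For example, a sketch of a launch combining several of these options (paths are illustrative, and only flags documented above are used):

```bash
# start the server with a larger context, debug logging, and a model config file
local-ai --models-path ./models --context-size 1024 --debug --config-file ./models.yaml
# equivalent environment-variable form (per the table above):
# CONTEXT_SIZE=1024 DEBUG=true CONFIG_FILE=./models.yaml local-ai --models-path ./models
```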

</details>

```
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" ...
```

</details>

### Image generation

<details>

Image generation is experimental.

Requirements: OpenCV, Gomp
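
On Debian/Ubuntu-based systems, the prerequisites can typically be installed as follows (package names are an assumption and vary by distribution):

```bash
sudo apt-get install -y libopencv-dev libgomp1
```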

```
make GO_TAGS=stablediffusion rebuild
```

</details>

### Acceleration

#### OpenBLAS

<details>

Requirements: OpenBLAS
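
On Debian/Ubuntu, OpenBLAS is typically available as (an assumption; package names vary by distribution):

```bash
sudo apt-get install -y libopenblas-dev
```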

```
make BUILD_TYPE=openblas build
```

</details>

#### CuBLAS

<details>

Requirement: Nvidia CUDA toolkit

Note: CuBLAS support is experimental and has not been tested on real hardware. Please report any issues you find!

```
make BUILD_TYPE=cublas build
```
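
When built with cuBLAS, layers can be offloaded to the GPU via the `gpu_layers` option from the model config reference above — a sketch, with a hypothetical model name and file:

```yaml
name: my-llama
backend: llama
# number of layers to offload to the GPU (cublas builds only)
gpu_layers: 22
parameters:
  model: ggml-model.bin
```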

</details>

### Windows compatibility

It should work; however, you need to make sure you give enough resources to the container. See https://github.com/go-skynet/LocalAI/issues/2
Expand Down Expand Up @@ -615,6 +727,40 @@ curl http://localhost:8080/v1/audio/transcriptions -H "Content-Type: multipart/f

</details>

### Image generation

LocalAI supports generating images with Stable Diffusion, running on CPU.

<details>

To generate an image, you can send a POST request to the `/v1/images/generations` endpoint with the instruction as the request body:

```bash
# 512x512 is supported too
curl http://localhost:8080/v1/images/generations -H "Content-Type: application/json" -d '{
  "prompt": "A cute baby sea otter",
  "size": "256x256"
}'
```

Available additional parameters: `mode`, `step`.
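
For example, a sketch passing them in the request body (the values here are illustrative; see the model config reference above for their semantics):

```bash
curl http://localhost:8080/v1/images/generations -H "Content-Type: application/json" -d '{
  "prompt": "A cute baby sea otter",
  "size": "256x256",
  "mode": 0,
  "step": 15
}'
```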

#### Setup

Note: In order to use the `images/generations` endpoint, you need to build LocalAI with `GO_TAGS=stablediffusion`.

1. Create a model file `stablediffusion.yaml` in the models folder:

```yaml
name: stablediffusion
backend: stablediffusion
asset_dir: stablediffusion_assets
```
2. Create a `stablediffusion_assets` directory inside your `models` directory
3. Download the ncnn assets from https://github.com/EdVince/Stable-Diffusion-NCNN#out-of-box and place them in `stablediffusion_assets`.
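
After restarting LocalAI, you can check that the new model is registered — a sketch, assuming the standard OpenAI-compatible model listing endpoint:

```bash
curl http://localhost:8080/v1/models
```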

</details>

## Frequently asked questions

Here are answers to some of the most common questions.
## License

MIT

## Acknowledgements

LocalAI couldn't have been built without the help of great software already available from the community. Thank you!

- [llama.cpp](https://github.com/ggerganov/llama.cpp)
- https://github.com/tatsu-lab/stanford_alpaca
- https://github.com/cornelk/llama-go for the initial ideas
- https://github.com/antimatter15/alpaca.cpp
- https://github.com/EdVince/Stable-Diffusion-NCNN
- https://github.com/ggerganov/whisper.cpp
- https://github.com/saharNooby/rwkv.cpp

## Contributors
