
Repo: Adds TOC to README #57

Merged · 1 commit · Nov 27, 2024
43 changes: 26 additions & 17 deletions README.md
@@ -1,31 +1,40 @@
# Ollama Grid Search and A/B Testing Desktop App.
# Ollama Grid Search: Instantly Evaluate Multiple LLMs and Prompts.

A Rust-based tool to evaluate LLMs, prompts, and model parameters.

(Issues with Llama3? Please read [this](https://github.com/dezoito/ollama-grid-search/issues/8)).

## Purpose

This project automates the process of selecting the best models, prompts, or inference parameters for a given use-case, allowing you to iterate over their combinations and to visually inspect the results.

It assumes [Ollama](https://www.ollama.ai) is installed and serving endpoints, either on `localhost` or on a remote server.

## Quick Example

Here's a test for a simple prompt, tested on 2 models, using `0.7` and `1.0` as values for `temperature`:
Here's what an experiment for a simple prompt, tested on 3 different models, looks like:

[<img src="./screenshots/main.png?raw=true" alt="Main Screenshot" width="720">](./screenshots/main.png?raw=true)

(For a more in-depth look at an evaluation process assisted by this tool, please check https://dezoito.github.io/2023/12/27/rust-ollama-grid-search.html).


## Table of Contents

- [Installation](#installation)
- [Features](#features)
- [Grid Search Concept](#grid-search-or-something-similar)
- [A/B Testing](#ab-testing)
- [Prompt Archive](#prompt-archive)
- [Experiment Logs](#experiment-logs)
- [Future Features](#future-features)
- [Contributing](#contributing)
- [Development](#development)
- [Citations](#citations)
- [Acknowledgements](#thank-you)


## Installation

Check the project's [releases page](https://github.com/dezoito/ollama-grid-search/releases), or the links on the sidebar.

## Features

- Automatically fetches models from local or remote Ollama servers;
- Iterates over different models, prompts and parameters to generate inferences;
- Iterates over multiple models, prompts and parameters to generate inferences;
- A/B test different prompts on several models simultaneously;
- Allows multiple iterations for each combination of parameters;
- Allows [limited concurrency](https://dezoito.github.io/2024/03/21/react-limited-concurrency.html) **or** synchronous inference calls (to prevent spamming servers);
@@ -36,9 +45,11 @@ Check the [releases page](https://github.com/dezoito/ollama-grid-search/releases
- Experiments can be inspected in readable views;
- Re-run past experiments, cloning or modifying the parameters used in the past;
- Configurable inference timeout;
- Custom default parameters and system prompts can be defined in settings:
- Custom default parameters and system prompts can be defined in settings
- Fully functional prompt database with examples;
- Prompts can be selected and "autocompleted" by typing "/" in the inputs


[<img src="./screenshots/settings.png?raw=true" alt="Settings" width="720">](./screenshots/settings.png?raw=true)

## Grid Search (or something similar...)

@@ -52,7 +63,6 @@ Lets define a selection of models, a prompt and some parameter combinations:

The prompt will be submitted once for each parameter **value**, for each one of the selected models, generating a set of responses.
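The expansion described above is essentially a cartesian product of models and parameter values. A minimal sketch (illustrative only; the `Job` struct and `expand_jobs` name are hypothetical, not the app's actual code):

```rust
// Sketch: expand models x temperature values into individual inference jobs.
// The names here (Job, expand_jobs) are hypothetical, for illustration only.
#[derive(Debug, Clone)]
struct Job {
    model: String,
    prompt: String,
    temperature: f32,
}

fn expand_jobs(models: &[&str], prompt: &str, temperatures: &[f32]) -> Vec<Job> {
    let mut jobs = Vec::new();
    for model in models {
        for &temperature in temperatures {
            jobs.push(Job {
                model: model.to_string(),
                prompt: prompt.to_string(),
                temperature,
            });
        }
    }
    jobs
}

fn main() {
    let jobs = expand_jobs(&["llama3", "mistral"], "Why is the sky blue?", &[0.7, 1.0]);
    // 2 models x 2 temperature values = 4 inference calls
    assert_eq!(jobs.len(), 4);
    println!("{} jobs generated", jobs.len());
}
```

Each resulting job is one inference call, so the response grid grows multiplicatively with every model and parameter value you add.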


## A/B Testing

Similarly, you can perform A/B tests by selecting different models and comparing results for the same prompt/parameter combination, or by testing different prompts under similar configurations:
@@ -62,6 +72,7 @@ Similarly, you can perform A/B tests by selecting different models and compare r
<small>Comparing the results of different prompts for the same model</small>

## Prompt Archive

You can save and manage your prompts (we want to make prompts compatible with [Open WebUI](https://github.com/open-webui/open-webui))

[<img src="./screenshots/prompt-archive.png?raw=true" alt="Settings" width="720">](./screenshots/prompt-archive.png?raw=true)
@@ -70,8 +81,6 @@ You can **autocomplete** prompts by typing "/" (inspired by Open WebUI, as well)

[<img src="./screenshots/autocomplete.gif?raw=true" alt="A/B testing" width="720">](./screenshots/autocomplete.gif?raw=true)



## Experiment Logs

You can list, inspect, or download your experiments:
@@ -81,7 +90,7 @@ You can list, inspect, or download your experiments:
## Future Features

- Grading results and filtering by grade
- Importing, exporting and sharing prompt lists and experiment parameters.
- Importing, exporting and sharing prompt lists and experiment files.

## Contributing

@@ -113,7 +122,7 @@ cd ollama-grid-search

If you are running VS Code, add this to your `settings.json` file

```
```json
{
...
"rust-analyzer.check.command": "clippy",
135 changes: 135 additions & 0 deletions README.md.old
@@ -0,0 +1,135 @@
# Ollama Grid Search: Instantly Evaluate Multiple LLMs and Prompts.


This project automates the process of selecting the best models, prompts, or inference parameters for a given use-case, allowing you to iterate over their combinations and to visually inspect the results.

It assumes [Ollama](https://www.ollama.ai) is installed and serving endpoints, either on `localhost` or on a remote server.

Here's what an experiment for a simple prompt, tested on 3 different models, looks like:

[<img src="./screenshots/main.png?raw=true" alt="Main Screenshot" width="720">](./screenshots/main.png?raw=true)

(For a more in-depth look at an evaluation process assisted by this tool, please check https://dezoito.github.io/2023/12/27/rust-ollama-grid-search.html).

## Installation

Check the project's [releases page](https://github.com/dezoito/ollama-grid-search/releases), or the links on the sidebar.

## Features

- Automatically fetches models from local or remote Ollama servers;
- Iterates over different models, prompts and parameters to generate inferences;
- A/B test different prompts on several models simultaneously;
- Allows multiple iterations for each combination of parameters;
- Allows [limited concurrency](https://dezoito.github.io/2024/03/21/react-limited-concurrency.html) **or** synchronous inference calls (to prevent spamming servers);
- Optionally outputs inference parameters and response metadata (inference time, tokens and tokens/s);
- Refetching of individual inference calls;
- Model selection can be filtered by name;
- List experiments which can be downloaded in JSON format;
- Experiments can be inspected in readable views;
- Re-run past experiments, cloning or modifying the parameters used in the past;
- Configurable inference timeout;
- Custom default parameters and system prompts can be defined in settings:

[<img src="./screenshots/settings.png?raw=true" alt="Settings" width="720">](./screenshots/settings.png?raw=true)

## Grid Search (or something similar...)

Technically, the term "grid search" refers to iterating over a series of different model hyperparams to optimize model performance, but that usually means parameters like `batch_size`, `learning_rate`, or `number_of_epochs`, more commonly used in training.

But the concept here is similar:

Let's define a selection of models, a prompt, and some parameter combinations:

[<img src="./screenshots/gridparams-animation.gif?raw=true" alt="gridparams" width="400">](./screenshots/gridparams-animation.gif?raw=true)

The prompt will be submitted once for each parameter **value**, for each one of the selected models, generating a set of responses.


## A/B Testing

Similarly, you can perform A/B tests by selecting different models and comparing results for the same prompt/parameter combination, or by testing different prompts under similar configurations:

[<img src="./screenshots/ab-animation.gif?raw=true" alt="A/B testing" width="720">](./screenshots/ab-animation.gif?raw=true)

<small>Comparing the results of different prompts for the same model</small>

## Prompt Archive
You can save and manage your prompts (we want to make prompts compatible with [Open WebUI](https://github.com/open-webui/open-webui))

[<img src="./screenshots/prompt-archive.png?raw=true" alt="Settings" width="720">](./screenshots/prompt-archive.png?raw=true)

You can **autocomplete** prompts by typing "/" (inspired by Open WebUI, as well):

[<img src="./screenshots/autocomplete.gif?raw=true" alt="A/B testing" width="720">](./screenshots/autocomplete.gif?raw=true)



## Experiment Logs

You can list, inspect, or download your experiments:

[<img src="./screenshots/experiments.png?raw=true" alt="Settings" width="720">](./screenshots/experiments.png?raw=true)

## Future Features

- Grading results and filtering by grade
- Importing, exporting and sharing prompt lists and experiment parameters.

## Contributing

- For obvious bugs and spelling mistakes, please go ahead and submit a PR.

- If you want to propose a new feature, change existing functionality, or propose anything more complex, please open an issue for discussion, **before** getting work done on a PR.

## Development

1. Make sure you have Rust installed.

2. Clone the repository (or a fork)

```sh
git clone https://github.com/dezoito/ollama-grid-search.git
cd ollama-grid-search
```

3. Install the frontend dependencies.

```sh
cd <project root>
# I'm using bun to manage dependencies,
# but feel free to use yarn or npm
bun install
```

4. Make sure `rust-analyzer` is configured to run `Clippy` when checking code.

If you are running VS Code, add this to your `settings.json` file

```json
{
...
"rust-analyzer.check.command": "clippy",
}
```

(or, better yet, just use the settings file provided with the code)

5. Run the app in development mode
```sh
cd <project root>/
bun tauri dev
```
6. Go grab a cup of coffee because this may take a while.

## Citations

The following works and theses have cited this repository:

Inouye, D., Lindo, L., Lee, R., & Allen, E. Computer Science and Engineering Senior Theses: **Applied Auto-tuning on LoRA Hyperparameters**. Santa Clara University, 2024.
<https://scholarcommons.scu.edu/cgi/viewcontent.cgi?article=1271&context=cseng_senior>

## Thank you!

Huge thanks to [@FabianLars](https://github.com/FabianLars), [@pepperoni21](https://github.com/pepperoni21) and [@TomReidNZ](https://github.com/TomReidNZ).