From ef63d5deaab529ff4b68a13a4456a801aa3871c4 Mon Sep 17 00:00:00 2001
From: youran-qi <youran@cohere.com>
Date: Wed, 30 Oct 2024 20:59:53 -0400
Subject: [PATCH] update README.md

---
 README.md | 70 +++++++++++++++++++++++++++++++------------------------
 1 file changed, 40 insertions(+), 30 deletions(-)
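
The checkpoint-path rules that this patch adds to Step 4 can be sketched as below. This is a minimal illustration, not code from the repository: `to_container_path` and `build_request` are hypothetical helpers, and the only facts taken from the README are the `/opt/finetuning` mount point, the required `config.json`, and the request field names.

```python
from pathlib import Path, PurePosixPath

# Mount point inside the container, as stated in the README.
CONTAINER_ROOT = PurePosixPath("/opt/finetuning")


def to_container_path(finetune_root_dir: str, checkpoint_dir: str) -> str:
    """Map a checkpoint directory on the host to its path inside the container.

    Hypothetical helper: raises ValueError if the checkpoint is not under
    finetune_root_dir, and FileNotFoundError if config.json is missing.
    """
    root = Path(finetune_root_dir).resolve()
    ckpt = Path(checkpoint_dir).resolve()
    # The checkpoint must live under <finetune_root_dir> on the host.
    relative = ckpt.relative_to(root)
    # The README requires a config.json, which is used to infer the model type.
    if not (ckpt / "config.json").is_file():
        raise FileNotFoundError(f"no config.json in {ckpt}")
    return str(CONTAINER_ROOT / relative.as_posix())


def build_request(finetune_name: str, base_model_name_or_path: str) -> dict:
    """Build a minimal request body; omitted hyperparameters keep their defaults."""
    return {
        "finetune_name": finetune_name,
        "base_model_name_or_path": base_model_name_or_path,
        "finetune_strategy": "lora",
        "use_4bit_quantization": "false",
    }
```

The dictionary returned by `build_request` would be serialized to JSON and posted to the `/finetune` endpoint in the same way as the `curl` command in Step 4. Passing a model name such as "command-r-08-2024" instead of a path skips the mapping step entirely.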

diff --git a/README.md b/README.md
index e6a9d32..ed59e89 100644
--- a/README.md
+++ b/README.md
@@ -2,10 +2,14 @@
 Cohere-finetune is a tool that facilitates easy, efficient and high-quality fine-tuning of Cohere's family of Command R models on users' own data to serve their own use cases.
 
 Currently, we support the following base models for fine-tuning:
-- [Cohere's Command R 08-2024 in HuggingFace](https://huggingface.co/CohereForAI/c4ai-command-r-08-2024)
-- [Cohere's Command R Plus 08-2024 in HuggingFace](https://huggingface.co/CohereForAI/c4ai-command-r-plus-08-2024)
 - [Cohere's Command R in HuggingFace](https://huggingface.co/CohereForAI/c4ai-command-r-v01)
+- [Cohere's Command R 08-2024 in HuggingFace](https://huggingface.co/CohereForAI/c4ai-command-r-08-2024)
 - [Cohere's Command R Plus in HuggingFace](https://huggingface.co/CohereForAI/c4ai-command-r-plus)
+- [Cohere's Command R Plus 08-2024 in HuggingFace](https://huggingface.co/CohereForAI/c4ai-command-r-plus-08-2024)
+- [Cohere's Aya Expanse 8B in HuggingFace](https://huggingface.co/CohereForAI/aya-expanse-8b)
+- [Cohere's Aya Expanse 32B in HuggingFace](https://huggingface.co/CohereForAI/aya-expanse-32b)
+
+We also support any customized base model built on one of these supported models (see [Step 4](#step-4-submit-the-request-to-start-the-fine-tuning) for more details).
 
 Currently, we support the following fine-tuning strategies:
 - [Parameter efficient fine-tuning by LoRA](https://arxiv.org/pdf/2106.09685)
@@ -19,12 +23,12 @@ We will keep extending the base models and fine-tuning strategies we support, an
 
 To help you better decide the hardware resources you need, we list some feasible scenarios in the following table as a reference, where all the other hyperparameters that are not shown in the table are set as their default values (see [here](#step-4-submit-the-request-to-start-the-fine-tuning)).
 
-| Hardware resources | Base model                               | Finetune strategy | Batch size | Max sequence length |
-|:-------------------|:-----------------------------------------|:------------------|:-----------|:--------------------|
-| 8 * 80GB H100 GPUs | Command R 08-2024 or Command R           | LoRA or QLoRA     | 8          | 16384               |
-| 8 * 80GB H100 GPUs | Command R 08-2024 or Command R           | LoRA or QLoRA     | 16         | 8192                |
-| 8 * 80GB H100 GPUs | Command R Plus 08-2024 or Command R Plus | LoRA or QLoRA     | 8          | 8192                |
-| 8 * 80GB H100 GPUs | Command R Plus 08-2024 or Command R Plus | LoRA or QLoRA     | 16         | 4096                |
+| Hardware resources | Base model                                                    | Finetune strategy | Batch size | Max sequence length |
+|:-------------------|:--------------------------------------------------------------|:------------------|:-----------|:--------------------|
+| 8 * 80GB H100 GPUs | Command R, Command R 08-2024, Aya Expanse 8B, Aya Expanse 32B | LoRA or QLoRA     | 8          | 16384               |
+| 8 * 80GB H100 GPUs | Command R, Command R 08-2024, Aya Expanse 8B, Aya Expanse 32B | LoRA or QLoRA     | 16         | 8192                |
+| 8 * 80GB H100 GPUs | Command R Plus, Command R Plus 08-2024                        | LoRA or QLoRA     | 8          | 8192                |
+| 8 * 80GB H100 GPUs | Command R Plus, Command R Plus 08-2024                        | LoRA or QLoRA     | 16         | 4096                |
 
 ## 2. Setup
 Run the commands below on the GPU machine.
@@ -97,7 +101,7 @@ curl --request POST http://localhost:5000/finetune \
     --header "Content-Type: application/json" \
     --data '{
         "finetune_name": "<finetune_name>",
-        "base_model": "command-r-08-2024",
+        "base_model_name_or_path": "command-r-08-2024",
         "parallel_strategy": "fsdp",
         "finetune_strategy": "lora",
         "use_4bit_quantization": "false",
@@ -115,31 +119,37 @@ curl --request POST http://localhost:5000/finetune \
 
 The `<finetune_name>` must be exactly the same as that used in [Step 3](#step-3-prepare-the-training-and-evaluation-data). If you are not going to use Weights & Biases for logging during the fine-tuning, the hyperparameter `"wandb_config"` can be removed. See table below for details about all the other hyperparameters we support, where some valid values or ranges below are based on best practices (you do not have to strictly follow them, but if you do not follow them, some validation codes also need to be changed or removed).
 
-| Hyperparameter              | Definition                                                                                               | Default value        | Valid values or range                                                        |
-|:----------------------------|:---------------------------------------------------------------------------------------------------------|:---------------------|:-----------------------------------------------------------------------------|
-| base_model                  | The name of the base model you will train                                                                | "command-r-08-2024"  | "command-r", "command-r-08-2024", "command-r-plus", "command-r-plus-08-2024" |
-| parallel_strategy           | The strategy to use multiple GPUs for training                                                           | "fsdp"               | "vanilla", "fsdp", "deepspeed"                                               |
-| finetune_strategy           | The strategy to train the model                                                                          | "lora"               | "lora"                                                                       |
-| use_4bit_quantization       | Whether to apply 4-bit quantization to the model                                                         | "false"              | "false", "true"                                                              |
-| gradient_checkpointing      | Whether to use gradient (activation) checkpointing                                                       | "true"               | "false", "true"                                                              |
-| gradient_accumulation_steps | The gradient accumulation steps                                                                          | 1                    | integers, min: 1                                                             |
-| train_epochs                | The number of epochs to train                                                                            | 1                    | integers, min: 1, max: 10                                                    |
-| train_batch_size            | The batch size during training                                                                           | 16                   | integers, min: 8, max: 32                                                    |
-| validation_batch_size       | The batch size during validation (evaluation)                                                            | 16                   | integers, min: 8, max: 32                                                    |
-| learning_rate               | The learning rate                                                                                        | 1e-4                 | real numbers, min: 5e-5, max: 0.1                                            |
-| eval_percentage             | The percentage of data split from training data for evaluation (ignored if evaluation data are provided) | 0.2                  | real numbers, min: 0.05, max: 0.5                                            |
-| lora_config.rank            | The rank parameter in LoRA                                                                               | 8                    | integers, min: 8, max: 16                                                    |
-| lora_config.alpha           | The alpha parameter in LoRA                                                                              | 2 * rank             | integers, min: 16, max: 32                                                   |
-| lora_config.target_modules  | The modules to apply LoRA                                                                                | ["q", "k", "v", "o"] | Any non-empty subset of ["q", "k", "v", "o", "ffn_expansion"]                |
-| lora_config.rslora          | Whether to use rank-stabilized LoRA (rsLoRA)                                                             | "true"               | "false", "true"                                                              |
-
-Note that `finetune_strategy = "lora", use_4bit_quantization = "false"` corresponds to the fine-tuning strategy of LoRA, while `finetune_strategy = "lora", use_4bit_quantization = "true"` corresponds to the fine-tuning strategy of QLoRA.
-
-After the fine-tuning is finished, our fine-tuning service will automatically create the following folders for you:
+| Hyperparameter               | Definition                                                                                               | Default value        | Valid values or range                                                                                                                                     |
+|:-----------------------------|:---------------------------------------------------------------------------------------------------------|:---------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------|
+| base_model_name_or_path      | The name of the base model or the path to the checkpoint of a customized base model that you will train  | "command-r-08-2024"  | "command-r", "command-r-08-2024", "command-r-plus", "command-r-plus-08-2024", "aya-expanse-8b", "aya-expanse-32b", "/opt/finetuning/<path_to_checkpoint>" |
+| parallel_strategy            | The strategy to use multiple GPUs for training                                                           | "fsdp"               | "vanilla", "fsdp", "deepspeed"                                                                                                                            |
+| finetune_strategy            | The strategy to train the model                                                                          | "lora"               | "lora"                                                                                                                                                    |
+| use_4bit_quantization        | Whether to apply 4-bit quantization to the model                                                         | "false"              | "false", "true"                                                                                                                                           |
+| gradient_checkpointing       | Whether to use gradient (activation) checkpointing                                                       | "true"               | "false", "true"                                                                                                                                           |
+| gradient_accumulation_steps  | The gradient accumulation steps                                                                          | 1                    | integers, min: 1                                                                                                                                          |
+| train_epochs                 | The number of epochs to train                                                                            | 1                    | integers, min: 1, max: 10                                                                                                                                 |
+| train_batch_size             | The batch size during training                                                                           | 16                   | integers, min: 8, max: 32                                                                                                                                 |
+| validation_batch_size        | The batch size during validation (evaluation)                                                            | 16                   | integers, min: 8, max: 32                                                                                                                                 |
+| learning_rate                | The learning rate                                                                                        | 1e-4                 | real numbers, min: 5e-5, max: 0.1                                                                                                                         |
+| eval_percentage              | The percentage of data split from training data for evaluation (ignored if evaluation data are provided) | 0.2                  | real numbers, min: 0.05, max: 0.5                                                                                                                         |
+| lora_config.rank             | The rank parameter in LoRA                                                                               | 8                    | integers, min: 8, max: 16                                                                                                                                 |
+| lora_config.alpha            | The alpha parameter in LoRA                                                                              | 2 * rank             | integers, min: 16, max: 32                                                                                                                                |
+| lora_config.target_modules   | The modules to apply LoRA                                                                                | ["q", "k", "v", "o"] | Any non-empty subset of ["q", "k", "v", "o", "ffn_expansion"]                                                                                             |
+| lora_config.rslora           | Whether to use rank-stabilized LoRA (rsLoRA)                                                             | "true"               | "false", "true"                                                                                                                                           |
+
+Note that you can set `base_model_name_or_path` to either the name of a supported model or the path to the checkpoint of a customized base model. If it is a path, the following requirements must be satisfied:
+- The customized base model must have the same architecture as one of the supported models (the weights can be different). For example, it can be a model obtained by fine-tuning a supported model like "command-r-08-2024".
+- The checkpoint of the customized base model must be a HuggingFace checkpoint like [this](https://huggingface.co/CohereForAI/c4ai-command-r-08-2024/tree/main). It must contain a `config.json` file, as we will use it to infer the type of your model.
+- The checkpoint must be placed in `<finetune_root_dir>/<path_to_checkpoint>` on your host, and `base_model_name_or_path` must have the form `/opt/finetuning/<path_to_checkpoint>` (recall that we mount `<finetune_root_dir>` on the host to `/opt/finetuning` in the container).
+
+Also note that `finetune_strategy = "lora", use_4bit_quantization = "false"` corresponds to the fine-tuning strategy of LoRA, while `finetune_strategy = "lora", use_4bit_quantization = "true"` corresponds to the fine-tuning strategy of QLoRA.
+
+After the fine-tuning is finished, you can find all the files related to this fine-tuning run in `<finetune_root_dir>/<finetune_sub_dir>/<finetune_name>`. More specifically, our fine-tuning service will automatically create the following folders for you:
 - `<finetune_root_dir>/<finetune_sub_dir>/<finetune_name>/finetune` that stores all the intermediate results generated during fine-tuning, which contains the following sub-folders:
   - `<finetune_root_dir>/<finetune_sub_dir>/<finetune_name>/finetune/data` that stores the preprocessed data (the data split into train & eval and rendered by the Cohere template)
   - `<finetune_root_dir>/<finetune_sub_dir>/<finetune_name>/finetune/checkpoints` that stores the checkpoints of adapter weights during fine-tuning
   - `<finetune_root_dir>/<finetune_sub_dir>/<finetune_name>/finetune/logs` that stores the Weights & Biases logs
+  - `<finetune_root_dir>/<finetune_sub_dir>/<finetune_name>/finetune/configs` that stores the configuration files used for the fine-tuning
 - `<finetune_root_dir>/<finetune_sub_dir>/<finetune_name>/output` that stores the final fine-tuning outputs, which contains the following sub-folder:
   - `<finetune_root_dir>/<finetune_sub_dir>/<finetune_name>/output/merged_weights` that stores the final model weights after we merge the fine-tuned adapter weights into the base model weights