The FMBench Orchestrator automates LLM benchmarking. It is built with a modular design so that users can plug and play any combination of datasets, models, serving stacks, and benchmark metrics:
- Overview
- How does it work?
- Prerequisites
- Install FMBench-Orchestrator
- Try it out
- How do I ...
- Benchmark for EC2
- Benchmark for SageMaker
- Benchmark for Bedrock
- Run cost / performance comparison between different types of EC2 instances
- Run cost / performance comparison between SageMaker and EC2
- Run cost / performance comparison between Bedrock and SageMaker
- Use custom datasets
- Use an existing FMBench config file but modify it slightly for my requirements
- Compare the accuracy when predicting on a custom dataset
- Provide a custom prompt/custom tokenizer for my benchmarking test
- Benchmark multiple config files on the same EC2 instance
- Architecture of FMBench Orchestrator
+---------------------------+
| Initialization |
| (Configure & Setup) |
+---------------------------+
↓
+---------------------------+
| Instance Creation |
| (Launch EC2 Instances) |
+---------------------------+
↓
+---------------------------+
| FMBENCH Execution |
| (Run Benchmark Script) |
+---------------------------+
↓
+---------------------------+
| Results Collection |
| (Download from instances) |
+---------------------------+
↓
+---------------------------+
| Instance Termination |
| (Terminate Instances) |
+---------------------------+
- IAM Role: You need an active AWS account with an IAM role that has the necessary permissions to create, manage, and terminate EC2 instances. See this link for the permissions and trust policies that this IAM role needs to have. We refer to this IAM role as fmbench-orchestrator.
- Service quota: Your AWS account needs enough vCPU quota to launch the Amazon EC2 instances if your LLM serving stack is EC2. If you need to request a quota increase, please refer to this link. This usually means increasing the CPU limits for your account, getting quota for certain instance types, etc.
- An orchestrator EC2 instance: It is recommended to run the orchestrator on an EC2 instance, preferably located in the same AWS region where you plan to host your LLM (although launching instances across regions is supported as well).
  - Use Ubuntu as the instance OS, specifically the ubuntu/images/hvm-ssd-gp3/ubuntu-noble-24.04-amd64-server-20240927 AMI.
  - Use t3.xlarge as the instance type with preferably at least 100GB of disk space.
  - Associate the fmbench-orchestrator IAM role with this instance.
- Clone the repository:
  git clone https://github.com/awslabs/fmbench-orchestrator.git
  cd fmbench-orchestrator
- Install uv and set up the Python environment:
  curl -LsSf https://astral.sh/uv/install.sh | sh
  export PATH="$HOME/.local/bin:$PATH"
  uv venv && source .venv/bin/activate && uv pip sync pyproject.toml
  UV_PROJECT_ENVIRONMENT=.venv uv add zmq
  python -m ipykernel install --user --name=.venv --display-name="Python (uv env)"
- Hugging Face token: Please follow the instructions here to get a Hugging Face token, and make sure you have access to the models on Hugging Face. Most models and tokenizers are downloaded from Hugging Face; to enable this, place your Hugging Face token in /tmp/hf_token.txt.
  # replace with your Hugging Face token
  hf_token=your-hugging-face-token
  echo $hf_token > /tmp/hf_token.txt
Follow the steps below to get started with benchmarking Meta Llama 3 8B on a g6e.2xlarge and a g6e.4xlarge in under 30 minutes.
In this example, we compare the cost and performance of hosting Llama3.1-8b on EC2 g6e.2xlarge and g6e.4xlarge.
python main.py --config-file configs/ec2.yml
Here is a description of all the command line parameters that are supported by the orchestrator:
- --config-file - required, path to the orchestrator configuration file.
- --ami-mapping-file - optional, default=ami_mapping.yml, path to a config file containing the region->instance type->AMI mapping
- --fmbench-config-file - optional, config file to use with FMBench; this is used if the orchestrator config file uses the "{{config_file}}" format for specifying the FMBench config file. If you are benchmarking on SageMaker or Bedrock then this parameter does need to be specified.
- --infra-config-file - optional, default=infra.yml, config file to use for the AWS infrastructure
- --write-bucket - optional, default=placeholder, this parameter is only needed when benchmarking on SageMaker; the Amazon S3 bucket used to store model files for benchmarking on SageMaker.
- --fmbench-latest - optional, default=no, downloads and installs the latest version of FMBench from the GitHub repo rather than the latest released version from PyPI.
- --fmbench-repo - optional, default=None, GitHub repo for FMBench (such as https://github.com/aws-samples/foundation-model-benchmarking-tool.git); if set, this repo is used for installing FMBench rather than installing FMBench from PyPI. Default is None, i.e. use the FMBench package from PyPI.
After the run is complete, you can generate analysis reports from the above experiments:
python analytics/analytics.py --results-dir results/llama3-8b-g6e --model-id llama3-8b --payload-file payload_en_3000-3840.jsonl --latency-threshold 2
The results are saved in fmbench-orchestrator/analytics/results/llama3-8b-g6e/ on your orchestrator EC2 instance, and include a summary of the results, a heatmap that helps you understand which instance type gives the best price performance at the desired scale (transactions/minute), and more.
Below is one of the output tables, showing the cost comparison.
The experiment configurations are specified in the instances section of the orchestrator config YML file. The FMBench Orchestrator runs each experiment in parallel and then collects the results from each experiment onto the orchestrator EC2 instance. See the configuration guide for details on the orchestrator config file.
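To illustrate the structure, here is a minimal sketch of an instances section: the ec2_settings anchor holds settings shared by every experiment and <<: *ec2_settings merges them into each entry. The region key shown under the anchor is a hypothetical placeholder; see configs/ec2.yml and the configuration guide for the actual schema.

ec2_settings: &ec2_settings    # settings shared by every experiment
  region: us-east-1            # hypothetical key, for illustration only

instances:                     # one entry per experiment; experiments run in parallel
  - instance_type: g6e.2xlarge
    <<: *ec2_settings          # merge the shared settings into this experiment
    fmbench_config:
      - fmbench:llama3/8b/config-ec2-llama3-8b-g6e-2xlarge.yml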
Take an existing config file from the configs folder, create a copy, and edit it as needed. You would typically only need to modify the instances section of the config file, either to change the instance type and config file or to add additional instance types. For example, the following command line benchmarks the Llama3.1-8b model on g6e EC2 instance types.
python main.py --config-file configs/ec2.yml
You can benchmark any model(s) on Amazon SageMaker by simply pointing the orchestrator to the desired FMBench SageMaker config file. The orchestrator will create an EC2 instance and use it to run FMBench benchmarking against SageMaker. For example, the following command line benchmarks the Llama3.1-8b model on ml.g5 instance types on SageMaker.
# provide the name of an S3 bucket in which you want
# SageMaker to store the model files (for models downloaded
# from Hugging Face)
write_bucket=your-bucket-name
python main.py --config-file configs/sagemaker.yml --fmbench-config-file fmbench:llama3.1/8b/config-llama3.1-8b-g5.2xl-g5.4xl-sm.yml --write-bucket $write_bucket
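As a sketch of how this fits together: per the --fmbench-config-file description above, the orchestrator config (here configs/sagemaker.yml) can use a "{{config_file}}" placeholder that is replaced with the value passed on the command line. The instance type and shared settings below are illustrative and may differ from the actual file:

instances:
  - instance_type: m7a.xlarge        # CPU instance used only to drive the SageMaker benchmark
    <<: *ec2_settings
    fmbench_config:
      - "{{config_file}}"            # replaced with the value of --fmbench-config-file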
You can benchmark any model(s) on Amazon Bedrock by simply pointing the orchestrator to the desired FMBench Bedrock config file. The orchestrator will create an EC2 instance and use it to run FMBench benchmarking against Bedrock. For example, the following command line benchmarks Llama3.1 models on Bedrock.
python main.py --config-file configs/bedrock.yml --fmbench-config-file fmbench:bedrock/config-bedrock-llama3-1.yml
See configs/ec2.yml as an example for EC2 experiments. The instances section has two experiments, one using g6e.2xlarge and the other using g6e.4xlarge.
instances:
- instance_type: g6e.2xlarge
<<: *ec2_settings
fmbench_config:
- fmbench:llama3/8b/config-ec2-llama3-8b-g6e-2xlarge.yml
- instance_type: g6e.4xlarge
<<: *ec2_settings
fmbench_config:
- fmbench:llama3/8b/config-llama3-8b-g6e.4xl-tp-1-mc-max-djl-ec2.yml
Note that the fmbench:llama3/8b/config-ec2-llama3-8b-g6e-2xlarge.yml and fmbench:llama3/8b/config-llama3-8b-g6e.4xl-tp-1-mc-max-djl-ec2.yml files are default config files provided in the FMBench repo. The FMBench Orchestrator uses these configs to launch the EC2 instances and deploy the experiments on the launched instances.
An example of using a customized FMBench config file is given in the Compare SageMaker against EC2 section below.
An LLM can be hosted on a SageMaker endpoint. This experiment requires that the SageMaker endpoint is already deployed.
You first need to write an FMBench config file for SageMaker. One option is to make a copy of config-llama3-8b-inf2-48xl-tp=8-bs=4-byoe.yml and modify the values in the experiments section, such as the endpoint_name, instance_type, and model_id. Then upload the edited config to your orchestrator EC2 instance.
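For example, the experiments section of the copied config might be edited along these lines. This is a sketch: only the keys called out above are shown, the values are placeholders for your own deployment, and the actual file contains additional required keys.

experiments:
  - model_id: meta-llama/Meta-Llama-3-8B-Instruct   # placeholder: the model hosted on your endpoint
    endpoint_name: my-existing-sm-endpoint          # placeholder: name of your already-deployed SageMaker endpoint
    instance_type: ml.g5.2xlarge                    # placeholder: instance type backing the endpoint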
The orchestrator config YML file should have the following:
instances:
- instance_type: m7a.xlarge # SageMaker experiment
<<: *ec2_settings
fmbench_config:
- PATH/TO/YOUR/edited_config.yml
- instance_type: g6e.2xlarge # EC2 experiment
<<: *ec2_settings
fmbench_config:
- fmbench:llama3/8b/config-ec2-llama3-8b-g6e-2xlarge.yml
See configs/bedrock.yml as an example for Bedrock experiments.
instances:
- instance_type: m7a.xlarge # Bedrock experiment
<<: *ec2_settings
fmbench_config:
- fmbench:bedrock/config-bedrock-llama3-1.yml
- instance_type: m7a.xlarge # SageMaker experiment
<<: *ec2_settings
fmbench_config:
- ~/fmbench-orchestrator/configs/byoe/config-llama3-8b-inf2-48xl-tp=8-bs=4-byoe.yml
The FMBench config file for Bedrock is fmbench:bedrock/config-bedrock-llama3-1.yml. You can also customize this config and upload your .yml file to the orchestrator EC2 instance.
Please see ec2_custom_dataset.yml for an example config file. The custom data is placed in the ~/fmbench-orchestrator/byo_dataset folder on the orchestrator EC2 instance and is uploaded to the benchmarking instance as specified in the upload_files section.
instances:
- instance_type: g6e.2xlarge
<<: *ec2_settings
fmbench_config:
- /home/ubuntu/fmbench-orchestrator/byo_fmbench_configs/config-ec2-llama3-8b-g6e-2xlarge_eval.yml
upload_files:
- local: byo_dataset/custom.jsonl ## your custom dataset
remote: /tmp/fmbench-read/source_data/
- local: analytics/pricing.yml
remote: /tmp/fmbench-read/configs/
Please see the custom.jsonl file for a data format example. Note that the language field needs to be set to 'en' to be compatible with the default config files.
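For reference, a record in the dataset might look like the following line (one JSON object per line). The field names here mirror the prompt_template_keys, question_col_key, and ground_truth_col_key used in the accuracy example later in this document, so adjust them to match your own data:

{"input": "What is the capital of France?", "context": "France is a country in Western Europe whose capital is Paris.", "answers": "Paris", "language": "en"}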
- Download an FMBench config file from the FMBench repo and place it in the configs/fmbench folder.
- Modify the downloaded config as needed.
- Update the instance -> fmbench_config section for the instance that needs to use this file so that it points to the updated config file in configs/fmbench. For example, if the updated config file was config-ec2-llama3-8b-g6e-2xlarge-custom.yml, then the following parameter:
  fmbench_config:
    - fmbench:llama3/8b/config-ec2-llama3-8b-g6e-2xlarge.yml
  would be changed to:
  fmbench_config:
    - configs/fmbench/config-ec2-llama3-8b-g6e-2xlarge-custom.yml
The orchestrator would now upload the custom config to the EC2 instance being used for benchmarking.
The FMBench Orchestrator supports evaluating candidate models using Majority Voting with a Panel of LLM Evaluators (PoLL). Before running the experiment, please enable model access in Bedrock for the judge models: Llama3-70b, Cohere command-r-v1, and Claude 3 Sonnet.
First, create a config file specifying the accuracy-measurement-related information, such as ground_truth and question_col_key. You can copy config-llama3.1-8b-g5.2xl-g5.4xl-sm.yml as an example and modify it based on your experiment.
Here are the parameters to update in this config file:
run_steps:
0_setup.ipynb: yes
1_generate_data.ipynb: yes
2_deploy_model.ipynb: yes
3_run_inference.ipynb: yes
4_get_evaluations.ipynb: yes # Make sure to set this step to "yes".
5_model_metric_analysis.ipynb: yes
6_cleanup.ipynb: yes
datasets:
prompt_template_keys:
- input
- context
ground_truth_col_key: answers # The name of the answer field in your custom data
question_col_key: input # The name of the question field in your custom data
The instances section has an upload_files section for each instance where we can provide a list of local files and remote directory paths to place any custom file on an EC2 instance. This could be a tokenizer.json file, a custom prompt file, or a custom dataset. The example below shows how to upload a custom pricing.yml and a custom dataset to an EC2 instance.
instances:
- instance_type: g6e.2xlarge
<<: *ec2_settings
fmbench_config:
- fmbench:llama3/8b/config-ec2-llama3-8b-g6e-2xlarge.yml
upload_files:
- local: byo_dataset/custom.jsonl
remote: /tmp/fmbench-read/source_data/
- local: analytics/pricing.yml
remote: /tmp/fmbench-read/configs/
See ec2_llama3.2-1b-cpu-byodataset.yml for an example config file. This file refers to the synthetic_data_large_prompts dataset and a custom prompt file prompt_template_llama3_summarization.txt for a summarization task. You can edit the dataset file and the prompt template as per your requirements.
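If you bring your own prompt template (and, if needed, a tokenizer), they can be pushed to the benchmarking instance through the same upload_files mechanism. The sketch below assumes the dataset file name and the remote directories; check your FMBench config for where it expects prompt templates and tokenizer files to be placed:

upload_files:
  - local: byo_dataset/synthetic_data_large_prompts.jsonl     # assumed file name for the custom dataset
    remote: /tmp/fmbench-read/source_data/
  - local: byo_dataset/prompt_template_llama3_summarization.txt
    remote: /tmp/fmbench-read/prompt_template/                # assumed remote directory for prompt templates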
Oftentimes we want to benchmark different combinations of parameters on the same EC2 instance; for example, we may want to test tensor parallelism degrees of 2, 4, and 8 for, say, the Llama3.1-8b model on the same EC2 machine, say a g6e.48xlarge. You can do that easily with the orchestrator by specifying a list of config files rather than a single config file, as shown in the following example:
fmbench_config:
- fmbench:llama3.1/8b/config-llama3.1-8b-g6e.48xl-tp-2-mc-max-djl.yml
- fmbench:llama3.1/8b/config-llama3.1-8b-g6e.48xl-tp-4-mc-max-djl.yml
- fmbench:llama3.1/8b/config-llama3.1-8b-g6e.48xl-tp-8-mc-max-djl.yml
In this case the orchestrator first runs benchmarking for the first file in the list, then on the same EC2 instance runs benchmarking for the second file, and so on. The results folders and fmbench.log files for each of the runs are downloaded at the end, once all config files for that instance have been processed.
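Embedded in an instances entry, the list above would look something like the following sketch (the g6e.48xlarge instance type and the shared *ec2_settings anchor follow the earlier examples):

instances:
  - instance_type: g6e.48xlarge
    <<: *ec2_settings
    fmbench_config:                 # processed one after another on this single instance
      - fmbench:llama3.1/8b/config-llama3.1-8b-g6e.48xl-tp-2-mc-max-djl.yml
      - fmbench:llama3.1/8b/config-llama3.1-8b-g6e.48xl-tp-4-mc-max-djl.yml
      - fmbench:llama3.1/8b/config-llama3.1-8b-g6e.48xl-tp-8-mc-max-djl.yml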
Below is the conceptual architecture of the FMBench Orchestrator.
See CONTRIBUTING for more information.
This project is licensed under the MIT-0 License - see the LICENSE file for details.
Contributions are welcome! Please fork the repository and submit a pull request with your changes. For major changes, please open an issue first to discuss what you would like to change.