forked from skeskinen/bert.cpp

Commit: 39 changed files with 2,386 additions and 97 deletions.

README.md (97 additions)

# bert.cpp

[ggml](https://github.com/ggerganov/ggml) inference of the BERT neural net architecture, with pooling and normalization from [SentenceTransformers (sbert.net)](https://sbert.net/).
High-quality sentence embeddings in pure C++ (or C).

## Description
The main goal of `bert.cpp` is to run the BERT model using 4-bit integer quantization on CPU.

* Plain C/C++ implementation without dependencies
* Inherits support for various architectures from ggml (x86 with AVX2, ARM, etc.)
* Choose your model size from 32/16/4 bits per model weight
* all-MiniLM-L6-v2 with 4-bit quantization is only 14MB. Inference RAM usage depends on the length of the input
* Sample C++ server over a TCP socket and a Python test client
* Benchmarks to validate correctness and speed of inference

## Limitations & TODO
* The tokenizer doesn't correctly handle Asian writing systems (CJK, possibly others)
* Inputs longer than the context size are not truncated. If you are making embeddings for longer texts, be sure to truncate them first (see the sketch after this list)
* bert.cpp doesn't respect the tokenizer, pooling, or normalization settings from the model card:
  * All inputs are lowercased and trimmed
  * All outputs are mean pooled and normalized
* The API is in C++ (uses things from std::)
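
Until truncation is handled by bert.cpp itself, you can clip long inputs on the client side. Below is a minimal sketch, not part of this repo: it assumes the Hugging Face `transformers` tokenizer as a stand-in for the model's WordPiece vocabulary, and the helper name and model id are illustrative only.

```python
# Hypothetical client-side helper -- bert.cpp does not ship this.
from transformers import AutoTokenizer

N_CTX = 512  # must match n_ctx reported by bert_model_load
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

def truncate_for_embedding(text: str) -> str:
    # Keep room for the [CLS] and [SEP] tokens added during inference.
    ids = tokenizer.encode(text, add_special_tokens=False)[: N_CTX - 2]
    # Note: a decode/re-encode round trip is not always exact for WordPiece.
    return tokenizer.decode(ids)
```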

## Usage

### Build
```sh
mkdir build
cd build
cmake ..
make
cd ..
```

### Download models
```sh
pip3 install -r requirements.txt
# list the available models and quantizations with:
# python3 models/download-ggml.py list_models
python3 models/download-ggml.py download all-MiniLM-L6-v2 q4_0
```

### Start sample server
```sh
./build/bin/server -m models/all-MiniLM-L6-v2/ggml-model-q4_0.bin

# bert_model_load: loading model from 'models/all-MiniLM-L6-v2/ggml-model-q4_0.bin' - please wait ...
# bert_model_load: n_vocab = 30522
# bert_model_load: n_ctx = 512
# bert_model_load: n_embd = 384
# bert_model_load: n_intermediate = 1536
# bert_model_load: n_head = 12
# bert_model_load: n_layer = 6
# bert_model_load: f16 = 2
# bert_model_load: ggml ctx size = 13.57 MB
# bert_model_load: ............ done
# bert_model_load: model size = 13.55 MB / num tensors = 101
# Server running on port 8080 with 4 threads
```

### Run sample client
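
The repo ships a Python test client you can point at the server. For orientation, here is a hypothetical minimal client. It assumes the server reads raw UTF-8 text from the socket and replies with `n_embd` little-endian float32 values; that framing is an assumption, so treat the bundled client as the reference for the actual wire format.

```python
# Hypothetical minimal client -- the framing below is an assumption,
# not the documented bert.cpp protocol.
import socket
import struct

HOST, PORT = "localhost", 8080
N_EMBD = 384  # n_embd reported by bert_model_load for all-MiniLM-L6-v2

with socket.create_connection((HOST, PORT)) as sock:
    sock.sendall("A sentence to embed.".encode("utf-8"))
    buf = b""
    while len(buf) < 4 * N_EMBD:  # read exactly N_EMBD float32 values
        chunk = sock.recv(4 * N_EMBD - len(buf))
        if not chunk:
            raise ConnectionError("server closed the connection early")
        buf += chunk
    embedding = struct.unpack(f"<{N_EMBD}f", buf)
    print(embedding[:8])
```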

## Benchmarks
Running MTEB (Massive Text Embedding Benchmark) with bert.cpp vs. [sbert](https://sbert.net/) (CPU mode) gives comparable results between the two: quantization has minimal effect on accuracy, and eval time (reported below in seconds) is similar to or better than sbert with batch_size=1 (bert.cpp doesn't support batching).

See [benchmarks](benchmarks) for more info.

### all-MiniLM-L6-v2
| Data Type | STSBenchmark | eval time | EmotionClassification | eval time |
|-----------|-----------|------------|-----------|------------|
| f16 | 0.8201 | 7.52 | 0.4085 | 12.25 |
| f32 | 0.8201 | 8.22 | 0.4082 | 13.65 |
| q4_0 | 0.8175 | 6.87 | 0.3911 | 11.22 |
| q4_1 | 0.8214 | 13.26 | 0.4015 | 21.37 |
| sbert | 0.8203 | 2.85 | 0.4085 | 7.28 |
| sbert-batchless | 0.8203 | 12.48 | 0.4085 | 15.27 |

### all-MiniLM-L12-v2
| Data Type | STSBenchmark | eval time | EmotionClassification | eval time |
|-----------|-----------|------------|-----------|------------|
| f16 | 0.8306 | 14.66 | 0.4119 | 23.20 |
| f32 | 0.8306 | 16.18 | 0.4117 | 25.79 |
| q4_0 | 0.8310 | 13.31 | 0.4183 | 21.54 |
| q4_1 | 0.8202 | 25.48 | 0.4010 | 41.75 |
| sbert | 0.8309 | 4.98 | 0.4117 | 10.45 |
| sbert-batchless | 0.8309 | 22.22 | 0.4117 | 26.53 |

### bert-base-uncased
bert-base-uncased is not a very good sentence-embedding model, but it's included to show that bert.cpp correctly runs models that are not from SentenceTransformers. Technically, any Hugging Face model with architecture `BertModel` or `BertForMaskedLM` should work.

| Data Type | STSBenchmark | eval time | EmotionClassification | eval time |
|-----------|-----------|------------|-----------|------------|
| f16 | 0.4739 | 37.68 | 0.3361 | 61.54 |
| f32 | 0.4738 | 57.90 | 0.3361 | 91.37 |
| q4_0 | 0.4940 | 39.21 | 0.3375 | 65.11 |
| q4_1 | 0.4681 | 85.11 | 0.3268 | 144.11 |
| sbert | 0.4729 | 16.71 | 0.3527 | 30.03 |
| sbert-batchless | 0.4729 | 67.12 | 0.3526 | 77.83 |

benchmarks/README.md (41 additions)

Use `run_mteb.py` to run the MTEB embeddings benchmark for each model. The script starts the C++ server for each model size, so make sure you have all 4 sizes in your models directory. It also runs the benchmarks with the SentenceTransformers library to get baseline results.

The ggml version doesn't support batching, so it is at a disadvantage compared to sbert, where all computations are done in batches of 64 input sentences. But if batching is not possible in your application (e.g. the inputs are supplied by a user), the batchless performance is more relevant. sbert-batchless runs the benchmark with the SentenceTransformers library with `batch_size=1`.

Note that the sbert results here are with CPU. sbert also supports GPU inference, which would be much faster.

Use `print_tables.py` to format the results like the following tables (eval times in seconds).

### all-MiniLM-L6-v2
| Data Type | STSBenchmark | eval time | EmotionClassification | eval time |
|-----------|-----------|------------|-----------|------------|
| f16 | 0.8201 | 7.52 | 0.4085 | 12.25 |
| f32 | 0.8201 | 8.22 | 0.4082 | 13.65 |
| q4_0 | 0.8175 | 6.87 | 0.3911 | 11.22 |
| q4_1 | 0.8214 | 13.26 | 0.4015 | 21.37 |
| sbert | 0.8203 | 2.85 | 0.4085 | 7.28 |
| sbert-batchless | 0.8203 | 12.48 | 0.4085 | 15.27 |

### all-MiniLM-L12-v2
| Data Type | STSBenchmark | eval time | EmotionClassification | eval time |
|-----------|-----------|------------|-----------|------------|
| f16 | 0.8306 | 14.66 | 0.4119 | 23.20 |
| f32 | 0.8306 | 16.18 | 0.4117 | 25.79 |
| q4_0 | 0.8310 | 13.31 | 0.4183 | 21.54 |
| q4_1 | 0.8202 | 25.48 | 0.4010 | 41.75 |
| sbert | 0.8309 | 4.98 | 0.4117 | 10.45 |
| sbert-batchless | 0.8309 | 22.22 | 0.4117 | 26.53 |

### bert-base-uncased
For bert-base-uncased, the pooling and normalization differ from the ones used in the actual model. I think that's why ggml scores better than sbert on STSBenchmark and worse on EmotionClassification.

| Data Type | STSBenchmark | eval time | EmotionClassification | eval time |
|-----------|-----------|------------|-----------|------------|
| f16 | 0.4739 | 37.68 | 0.3361 | 61.54 |
| f32 | 0.4738 | 57.90 | 0.3361 | 91.37 |
| q4_0 | 0.4940 | 39.21 | 0.3375 | 65.11 |
| q4_1 | 0.4681 | 85.11 | 0.3268 | 144.11 |
| sbert | 0.4729 | 16.71 | 0.3527 | 30.03 |
| sbert-batchless | 0.4729 | 67.12 | 0.3526 | 77.83 |

benchmarks/print_tables.py (62 additions)

import os
import json

RESULTS_DIR = "results"
BENCHMARKS = ["STSBenchmark", "EmotionClassification"]
DATA_TYPES = ["f16", "f32", "q4_0", "q4_1", "sbert", "sbert-batchless"]

# Define a dictionary to store the results
results_dict = {}

# Loop over all the result directories and extract the model names
models = set()
for dir_name in os.listdir(RESULTS_DIR):
    m = dir_name.split("_")[0]
    models.add(m)

def extract_results(test_data):
    # Each MTEB result records its wall-clock time; the score field depends
    # on the benchmark type (cosine-similarity Spearman vs. main_score).
    res = {"time": test_data["evaluation_time"]}
    if "cos_sim" in test_data and "spearman" in test_data["cos_sim"]:
        res['score'] = test_data["cos_sim"]["spearman"]
    elif "main_score" in test_data:
        res['score'] = test_data["main_score"]
    else:
        print(f"can't extract results {test_data}")
    return res

for model in models:
    model_results = {}
    for data_type in DATA_TYPES:
        dir_name = f"{RESULTS_DIR}/{model}_{data_type}"
        if not os.path.isdir(dir_name):
            print(f"{dir_name} doesn't exist!")
            continue
        data_type_results = {}
        for benchmark in BENCHMARKS:
            results_path = os.path.join(dir_name, f"{benchmark}.json")
            with open(results_path, "r") as f:
                results = json.load(f)

            data_type_results[benchmark] = extract_results(results['test'])

        model_results[data_type] = data_type_results
    results_dict[model] = model_results

# Print the results as an .md table for each model
for model, model_results in results_dict.items():
    print(f"### {model}")
    print("| Data Type | ", end="")
    for benchmark in BENCHMARKS:
        print(f"{benchmark} | eval time | ", end="")
    print()
    print("|-----------|", end="")
    for _ in BENCHMARKS:
        print("-----------|------------|", end="")
    print()
    for data_type in DATA_TYPES:
        print(f"| {data_type} | ", end="")
        for benchmark in BENCHMARKS:
            results = model_results[data_type][benchmark]
            print(f"{results['score']:.4f} | {results['time']:.2f} | ", end="")
        print()
    print("\n")

requirements.txt (2 additions)

mteb
sentence_transformers

benchmarks/results/all-MiniLM-L12-v2_f16/EmotionClassification.json (13 additions)

{
  "dataset_revision": "4f58c6b202a23cf9a4da393831edf4f9183cad37",
  "mteb_dataset_name": "EmotionClassification",
  "mteb_version": "1.0.2",
  "test": {
    "accuracy": 0.4119499999999999,
    "accuracy_stderr": 0.025105228539091216,
    "evaluation_time": 23.2,
    "f1": 0.36981414412336655,
    "f1_stderr": 0.02094871267575925,
    "main_score": 0.4119499999999999
  }
}

benchmarks/results/all-MiniLM-L12-v2_f16/STSBenchmark.json (20 additions)

{
  "dataset_revision": "b0fddb56ed78048fa8b90373c8a3cfc37b684831",
  "mteb_dataset_name": "STSBenchmark",
  "mteb_version": "1.0.2",
  "test": {
    "cos_sim": {
      "pearson": 0.8374641693018909,
      "spearman": 0.8305896485864188
    },
    "euclidean": {
      "pearson": 0.8350326075472255,
      "spearman": 0.8305896485864188
    },
    "evaluation_time": 14.66,
    "manhattan": {
      "pearson": 0.8351482035115159,
      "spearman": 0.8308811375478211
    }
  }
}

benchmarks/results/all-MiniLM-L12-v2_f32/EmotionClassification.json (13 additions)

{
  "dataset_revision": "4f58c6b202a23cf9a4da393831edf4f9183cad37",
  "mteb_dataset_name": "EmotionClassification",
  "mteb_version": "1.0.2",
  "test": {
    "accuracy": 0.41174999999999995,
    "accuracy_stderr": 0.02517364693484041,
    "evaluation_time": 25.79,
    "f1": 0.36964632574873646,
    "f1_stderr": 0.02101215083642815,
    "main_score": 0.41174999999999995
  }
}

benchmarks/results/all-MiniLM-L12-v2_f32/STSBenchmark.json (20 additions)

{
  "dataset_revision": "b0fddb56ed78048fa8b90373c8a3cfc37b684831",
  "mteb_dataset_name": "STSBenchmark",
  "mteb_version": "1.0.2",
  "test": {
    "cos_sim": {
      "pearson": 0.837465240168285,
      "spearman": 0.8305951440128178
    },
    "euclidean": {
      "pearson": 0.835033461743598,
      "spearman": 0.8305951440128178
    },
    "evaluation_time": 16.18,
    "manhattan": {
      "pearson": 0.8351470693555814,
      "spearman": 0.8308846560867743
    }
  }
}

benchmarks/results/all-MiniLM-L12-v2_q4_0/EmotionClassification.json (13 additions)

{
  "dataset_revision": "4f58c6b202a23cf9a4da393831edf4f9183cad37",
  "mteb_dataset_name": "EmotionClassification",
  "mteb_version": "1.0.2",
  "test": {
    "accuracy": 0.4183,
    "accuracy_stderr": 0.021613884426451443,
    "evaluation_time": 21.54,
    "f1": 0.37624466895950653,
    "f1_stderr": 0.01743903163262402,
    "main_score": 0.4183
  }
}

benchmarks/results/all-MiniLM-L12-v2_q4_0/STSBenchmark.json (20 additions)

{
  "dataset_revision": "b0fddb56ed78048fa8b90373c8a3cfc37b684831",
  "mteb_dataset_name": "STSBenchmark",
  "mteb_version": "1.0.2",
  "test": {
    "cos_sim": {
      "pearson": 0.8365276911292119,
      "spearman": 0.8309588798492489
    },
    "euclidean": {
      "pearson": 0.8372279220677411,
      "spearman": 0.8309588798492489
    },
    "evaluation_time": 13.31,
    "manhattan": {
      "pearson": 0.8368693263995872,
      "spearman": 0.8306785947771824
    }
  }
}

benchmarks/results/all-MiniLM-L12-v2_q4_1/EmotionClassification.json (13 additions)

{
  "dataset_revision": "4f58c6b202a23cf9a4da393831edf4f9183cad37",
  "mteb_dataset_name": "EmotionClassification",
  "mteb_version": "1.0.2",
  "test": {
    "accuracy": 0.40095000000000003,
    "accuracy_stderr": 0.02566266743734953,
    "evaluation_time": 41.75,
    "f1": 0.3626628620864726,
    "f1_stderr": 0.018959571169492463,
    "main_score": 0.40095000000000003
  }
}

benchmarks/results/all-MiniLM-L12-v2_q4_1/STSBenchmark.json (20 additions)

{
  "dataset_revision": "b0fddb56ed78048fa8b90373c8a3cfc37b684831",
  "mteb_dataset_name": "STSBenchmark",
  "mteb_version": "1.0.2",
  "test": {
    "cos_sim": {
      "pearson": 0.8300376055771063,
      "spearman": 0.8202182350295162
    },
    "euclidean": {
      "pearson": 0.8281548958602518,
      "spearman": 0.8202182350295162
    },
    "evaluation_time": 25.48,
    "manhattan": {
      "pearson": 0.8272951345188557,
      "spearman": 0.819294554414274
    }
  }
}

benchmarks/results/all-MiniLM-L12-v2_sbert-batchless/EmotionClassification.json (13 additions)

{
  "dataset_revision": "4f58c6b202a23cf9a4da393831edf4f9183cad37",
  "mteb_dataset_name": "EmotionClassification",
  "mteb_version": "1.0.2",
  "test": {
    "accuracy": 0.4117,
    "accuracy_stderr": 0.025096015620014265,
    "evaluation_time": 26.53,
    "f1": 0.3696192637393597,
    "f1_stderr": 0.020941989472486138,
    "main_score": 0.4117
  }
}

benchmarks/results/all-MiniLM-L12-v2_sbert-batchless/STSBenchmark.json (20 additions)

{
  "dataset_revision": "b0fddb56ed78048fa8b90373c8a3cfc37b684831",
  "mteb_dataset_name": "STSBenchmark",
  "mteb_version": "1.0.2",
  "test": {
    "cos_sim": {
      "pearson": 0.837594560292421,
      "spearman": 0.8308938533093635
    },
    "euclidean": {
      "pearson": 0.8355879778009024,
      "spearman": 0.8308938533093635
    },
    "evaluation_time": 22.22,
    "manhattan": {
      "pearson": 0.8356896375814314,
      "spearman": 0.8311516183577004
    }
  }
}