* Add model 2024-01-01-bge_small_en
* Add model 2024-01-01-bge_base_en
* Add model 2024-01-01-bge_large_en

Co-authored-by: maziyarpanahi <[email protected]>
1 parent 9fa1404, commit f38ea17
Showing 3 changed files with 257 additions and 0 deletions.
@@ -0,0 +1,85 @@
---
layout: model
title: BAAI general embedding English (bge_base)
author: John Snow Labs
name: bge_base
date: 2024-01-01
tags: [bert, bge, onnx, en, open_source]
task: Embeddings
language: en
edition: Spark NLP 5.2.1
spark_version: 3.0
supported: true
engine: onnx
annotator: BGEEmbeddings
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

FlagEmbedding maps any text to a low-dimensional dense vector that can be used for tasks such as retrieval, classification, clustering, or semantic search.
It can also be used in vector databases for LLMs.

`bge` is short for `BAAI general embedding`.

| Model | Language | Description | Query instruction for retrieval\* |
|:-------------------------------|:--------:| :--------:| :--------:|
| [BAAI/bge-large-en](https://huggingface.co/BAAI/bge-large-en) | English | ranks **1st** on the [MTEB](https://huggingface.co/spaces/mteb/leaderboard) leaderboard | `Represent this sentence for searching relevant passages: ` |
| [BAAI/bge-base-en](https://huggingface.co/BAAI/bge-base-en) | English | ranks **2nd** on the [MTEB](https://huggingface.co/spaces/mteb/leaderboard) leaderboard | `Represent this sentence for searching relevant passages: ` |
| [BAAI/bge-small-en](https://huggingface.co/BAAI/bge-small-en) | English | a small-scale model with competitive performance | `Represent this sentence for searching relevant passages: ` |
| [BAAI/bge-large-zh](https://huggingface.co/BAAI/bge-large-zh) | Chinese | ranks **1st** in the [C-MTEB](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB) benchmark | `为这个句子生成表示以用于检索相关文章:` |
| [BAAI/bge-large-zh-noinstruct](https://huggingface.co/BAAI/bge-large-zh-noinstruct) | Chinese | trained without instruction; ranks **2nd** in the [C-MTEB](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB) benchmark | |
| [BAAI/bge-base-zh](https://huggingface.co/BAAI/bge-base-zh) | Chinese | a base-scale model with ability similar to `bge-large-zh` | `为这个句子生成表示以用于检索相关文章:` |
| [BAAI/bge-small-zh](https://huggingface.co/BAAI/bge-small-zh) | Chinese | a small-scale model with competitive performance | `为这个句子生成表示以用于检索相关文章:` |
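
The last column of the table lists an instruction string per model. A minimal sketch of how it might be applied before embedding is shown here. It assumes, as the column title suggests, that the instruction is prepended to retrieval queries only and that passages are embedded unchanged. The helper names `build_query` and `build_inputs` are hypothetical and are not part of Spark NLP or FlagEmbedding.

```python
# Instruction copied from the English rows of the table above.
QUERY_INSTRUCTION = "Represent this sentence for searching relevant passages: "


def build_query(text: str, instruction: str = QUERY_INSTRUCTION) -> str:
    """Prefix a retrieval query with the instruction (hypothetical helper)."""
    return instruction + text.strip()


def build_inputs(queries, passages):
    """Return (query_texts, passage_texts) ready to be embedded.

    Assumption: only queries get the instruction; passages are left unchanged.
    """
    return [build_query(q) for q in queries], list(passages)


query_texts, passage_texts = build_inputs(
    ["  how do I embed a sentence?  "],
    ["BGE models map text to dense vectors."],
)
print(query_texts[0])
```

The resulting strings would then go into the `text` column that the pipeline reads. For the `noinstruct` model the instruction cell is empty, so passing `instruction=""` leaves the query as-is.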

## Predicted Entities

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_en_5.2.1_3.0_1704107443716.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_en_5.2.1_3.0_1704107443716.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use

<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import BGEEmbeddings

# Wrap the raw "text" column as Spark NLP documents
document = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Load the pretrained model and embed each document
embeddings = BGEEmbeddings.pretrained("bge_base", "en")\
    .setInputCols("document")\
    .setOutputCol("embeddings")
```
```scala
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.embeddings.BGEEmbeddings

// Wrap the raw "text" column as Spark NLP documents
val document = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

// Load the pretrained model and embed each document
val embeddings = BGEEmbeddings.pretrained("bge_base", "en")
  .setInputCols("document")
  .setOutputCol("embeddings")
```
</div>

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|bge_base|
|Compatibility:|Spark NLP 5.2.1+|
|License:|Open Source|
|Edition:|Official|
|Input Labels:|[document]|
|Output Labels:|[bge]|
|Language:|en|
|Size:|258.7 MB|
@@ -0,0 +1,87 @@
---
layout: model
title: BAAI general embedding English (bge_large)
author: John Snow Labs
name: bge_large
date: 2024-01-01
tags: [en, onnx, bert, bge, open_source]
task: Embeddings
language: en
edition: Spark NLP 5.2.1
spark_version: 3.0
supported: true
engine: onnx
annotator: BGEEmbeddings
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

FlagEmbedding maps any text to a low-dimensional dense vector that can be used for tasks such as retrieval, classification, clustering, or semantic search.
It can also be used in vector databases for LLMs.

`bge` is short for `BAAI general embedding`.

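To make the retrieval and semantic-search use of these dense vectors concrete, here is a small sketch that ranks passages by cosine similarity to a query. The toy 3-dimensional vectors stand in for real embeddings, and the function names are hypothetical. Cosine similarity is one common choice of score and is assumed here rather than taken from the model card.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def rank_passages(query_vec, passage_vecs):
    """Return passage indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, p) for p in passage_vecs]
    return sorted(range(len(passage_vecs)), key=lambda i: scores[i], reverse=True)


# Toy vectors standing in for real embeddings (assumption for illustration).
query = [1.0, 0.0, 0.0]
passages = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]
print(rank_passages(query, passages))
```

In practice the vectors would come from the `embeddings` output column of the pipeline, one per document.
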
| Model | Language | Description | Query instruction for retrieval\* |
|:-------------------------------|:--------:| :--------:| :--------:|
| [BAAI/bge-large-en](https://huggingface.co/BAAI/bge-large-en) | English | ranks **1st** on the [MTEB](https://huggingface.co/spaces/mteb/leaderboard) leaderboard | `Represent this sentence for searching relevant passages: ` |
| [BAAI/bge-base-en](https://huggingface.co/BAAI/bge-base-en) | English | ranks **2nd** on the [MTEB](https://huggingface.co/spaces/mteb/leaderboard) leaderboard | `Represent this sentence for searching relevant passages: ` |
| [BAAI/bge-small-en](https://huggingface.co/BAAI/bge-small-en) | English | a small-scale model with competitive performance | `Represent this sentence for searching relevant passages: ` |
| [BAAI/bge-large-zh](https://huggingface.co/BAAI/bge-large-zh) | Chinese | ranks **1st** in the [C-MTEB](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB) benchmark | `为这个句子生成表示以用于检索相关文章:` |
| [BAAI/bge-large-zh-noinstruct](https://huggingface.co/BAAI/bge-large-zh-noinstruct) | Chinese | trained without instruction; ranks **2nd** in the [C-MTEB](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB) benchmark | |
| [BAAI/bge-base-zh](https://huggingface.co/BAAI/bge-base-zh) | Chinese | a base-scale model with ability similar to `bge-large-zh` | `为这个句子生成表示以用于检索相关文章:` |
| [BAAI/bge-small-zh](https://huggingface.co/BAAI/bge-small-zh) | Chinese | a small-scale model with competitive performance | `为这个句子生成表示以用于检索相关文章:` |

## Predicted Entities

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_en_5.2.1_3.0_1704108288598.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_en_5.2.1_3.0_1704108288598.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use

<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import BGEEmbeddings

# Wrap the raw "text" column as Spark NLP documents
document = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Load the pretrained model and embed each document
embeddings = BGEEmbeddings.pretrained("bge_large", "en")\
    .setInputCols("document")\
    .setOutputCol("embeddings")
```
```scala
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.embeddings.BGEEmbeddings

// Wrap the raw "text" column as Spark NLP documents
val document = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

// Load the pretrained model and embed each document
val embeddings = BGEEmbeddings.pretrained("bge_large", "en")
  .setInputCols("document")
  .setOutputCol("embeddings")
```
</div>

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|bge_large|
|Compatibility:|Spark NLP 5.2.1+|
|License:|Open Source|
|Edition:|Official|
|Input Labels:|[document]|
|Output Labels:|[bge]|
|Language:|en|
|Size:|794.1 MB|
@@ -0,0 +1,85 @@
---
layout: model
title: BAAI general embedding English (bge_small)
author: John Snow Labs
name: bge_small
date: 2024-01-01
tags: [onnx, bert, bge, en, open_source]
task: Embeddings
language: en
edition: Spark NLP 5.2.1
spark_version: 3.0
supported: true
engine: onnx
annotator: BGEEmbeddings
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

FlagEmbedding maps any text to a low-dimensional dense vector that can be used for tasks such as retrieval, classification, clustering, or semantic search.
It can also be used in vector databases for LLMs.

`bge` is short for `BAAI general embedding`.

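As a rough illustration of the vector-database use mentioned above, the sketch below keeps embeddings in memory and answers top-k queries by brute force. `TinyVectorStore` is a hypothetical class written for this example, not a Spark NLP or FlagEmbedding API. A real vector database would add persistence and approximate indexing.

```python
import heapq
import math


class TinyVectorStore:
    """Hypothetical in-memory store: keeps (id, unit vector) pairs, brute-force search."""

    def __init__(self):
        self._items = []  # list of (doc_id, normalized_vector)

    @staticmethod
    def _normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec))
        if norm == 0.0:
            raise ValueError("cannot index or query with a zero vector")
        return [x / norm for x in vec]

    def add(self, doc_id, vec):
        """Store a vector under doc_id after scaling it to unit length."""
        self._items.append((doc_id, self._normalize(vec)))

    def search(self, query_vec, k=3):
        """Return up to k (doc_id, score) pairs, best first; score is cosine similarity."""
        q = self._normalize(query_vec)
        scored = ((sum(a * b for a, b in zip(q, v)), doc_id) for doc_id, v in self._items)
        return [(doc_id, score) for score, doc_id in heapq.nlargest(k, scored)]


# Toy 2-dimensional vectors standing in for real embeddings (assumption).
store = TinyVectorStore()
store.add("doc-a", [1.0, 0.0])
store.add("doc-b", [0.0, 1.0])
store.add("doc-c", [1.0, 1.0])
print(store.search([1.0, 0.1], k=2))
```

Normalizing at insert time means each search only needs dot products, which keeps the scoring loop simple.
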
| Model | Language | Description | Query instruction for retrieval\* |
|:-------------------------------|:--------:| :--------:| :--------:|
| [BAAI/bge-large-en](https://huggingface.co/BAAI/bge-large-en) | English | ranks **1st** on the [MTEB](https://huggingface.co/spaces/mteb/leaderboard) leaderboard | `Represent this sentence for searching relevant passages: ` |
| [BAAI/bge-base-en](https://huggingface.co/BAAI/bge-base-en) | English | ranks **2nd** on the [MTEB](https://huggingface.co/spaces/mteb/leaderboard) leaderboard | `Represent this sentence for searching relevant passages: ` |
| [BAAI/bge-small-en](https://huggingface.co/BAAI/bge-small-en) | English | a small-scale model with competitive performance | `Represent this sentence for searching relevant passages: ` |
| [BAAI/bge-large-zh](https://huggingface.co/BAAI/bge-large-zh) | Chinese | ranks **1st** in the [C-MTEB](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB) benchmark | `为这个句子生成表示以用于检索相关文章:` |
| [BAAI/bge-large-zh-noinstruct](https://huggingface.co/BAAI/bge-large-zh-noinstruct) | Chinese | trained without instruction; ranks **2nd** in the [C-MTEB](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB) benchmark | |
| [BAAI/bge-base-zh](https://huggingface.co/BAAI/bge-base-zh) | Chinese | a base-scale model with ability similar to `bge-large-zh` | `为这个句子生成表示以用于检索相关文章:` |
| [BAAI/bge-small-zh](https://huggingface.co/BAAI/bge-small-zh) | Chinese | a small-scale model with competitive performance | `为这个句子生成表示以用于检索相关文章:` |

## Predicted Entities

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_small_en_5.2.1_3.0_1704105455110.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_small_en_5.2.1_3.0_1704105455110.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use

<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import BGEEmbeddings

# Wrap the raw "text" column as Spark NLP documents
document = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Load the pretrained model and embed each document
embeddings = BGEEmbeddings.pretrained("bge_small", "en")\
    .setInputCols("document")\
    .setOutputCol("embeddings")
```
```scala
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.embeddings.BGEEmbeddings

// Wrap the raw "text" column as Spark NLP documents
val document = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

// Load the pretrained model and embed each document
val embeddings = BGEEmbeddings.pretrained("bge_small", "en")
  .setInputCols("document")
  .setOutputCol("embeddings")
```
</div>

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|bge_small|
|Compatibility:|Spark NLP 5.2.1+|
|License:|Open Source|
|Edition:|Official|
|Input Labels:|[document]|
|Output Labels:|[bge]|
|Language:|en|
|Size:|79.8 MB|