Building new gigachain version
Konstantin Krestnikov authored and Konstantin Krestnikov committed Jan 9, 2024
1 parent 225ec33 commit b425e53
Showing 3 changed files with 29 additions and 7 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -224,15 +224,15 @@ chat.get_num_tokens("Сколько токенов в этой строке")
Embedding generation with GigaChat is currently unavailable.
As a temporary workaround, you can use any available embeddings, for example, OpenAIEmbeddings.

-You can also [use local embeddings](https://github.com/ai-forever/gigachain/blob/master/docs/docs/modules/chains/how_to/retrieve.ipynb).
+You can also [use local embeddings](docs/docs/modules/chains/how_to/retrieve.ipynb).

## Example collection

Below is a list of GigaChain usage examples.

### Basic examples of working with GigaChat

- [Answering questions about Wikipedia articles](docs/docs/integrations/retrievers/wikipedia.ipynb)
- [Answering questions about a document, with "talking to a book" as the example (RAG)](docs/docs/use_cases/question_answering/gigachat_qa.ipynb)
- [Summarization with the MapReduce algorithm](docs/extras/use_cases/summarization.ipynb) (see the map/reduce section)
- [Working with the prompt hub, chains, and the JSON parser](docs/docs/modules/model_io/output_parsers/json.ipynb)
- [Parsing lists contained in the response](docs/docs/modules/model_io/output_parsers/list.ipynb)
30 changes: 26 additions & 4 deletions libs/community/langchain_community/embeddings/gigachat.py
@@ -8,6 +8,9 @@
from langchain_core.embeddings import Embeddings
from langchain_core.pydantic_v1 import BaseModel, root_validator

from tqdm import tqdm
import time

logger = logging.getLogger(__name__)


@@ -22,6 +25,9 @@ class GigaChatEmbeddings(BaseModel, Embeddings):
embeddings = GigaChatEmbeddings(credentials=..., verify_ssl_certs=False)
"""

one_by_one_mode: bool = True
""" Send texts one-by-one to server (to increse token limit)"""

base_url: Optional[str] = None
""" Base API URL """
auth_url: Optional[str] = None
@@ -46,6 +52,9 @@ class GigaChatEmbeddings(BaseModel, Embeddings):
verify_ssl_certs: Optional[bool] = None
""" Check certificates for all requests """

_debug_delay: float = 0
""" Debug timeout for limit rps to server"""

ca_bundle_file: Optional[str] = None
cert_file: Optional[str] = None
key_file: Optional[str] = None
@@ -108,10 +117,23 @@ def embed_documents(
         Returns:
             List of embeddings, one for each text.
         """
-        return [
-            embedding.embedding
-            for embedding in self._client.embeddings(texts=texts, model=model).data
-        ]
+        if self.one_by_one_mode:
+            result: List[List[float]] = []
+            if self._debug_delay == 0:
+                for text in texts:
+                    for embedding in self._client.embeddings(texts=[text], model=model).data:
+                        result.append(embedding.embedding)
+            else:
+                for text in texts:
+                    time.sleep(self._debug_delay)
+                    for embedding in tqdm(self._client.embeddings(texts=[text], model=model).data):
+                        result.append(embedding.embedding)
+            return result
+        else:
+            return [
+                embedding.embedding
+                for embedding in self._client.embeddings(texts=texts, model=model).data
+            ]

         # return _embed_with_retry(self, texts=texts)

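The `one_by_one_mode` and `_debug_delay` logic in `embed_documents` can be sketched in isolation as below. This is a minimal, hypothetical refactor, not the committed code: `FakeClient`, `EmbeddingItem`, `EmbeddingsResponse`, `embed_texts` and the placeholder model name are illustrative stand-ins, and the response shape (a `.data` list of items with an `.embedding` attribute) is assumed only from how the diff reads `self._client.embeddings(...)`. It merges the two one-by-one branches into a single loop and accepts an optional `progress` wrapper over `texts`. In the commit, `tqdm` wraps the one-element response of each request, so its bar shows 1/1 per text rather than progress across the whole list.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class EmbeddingItem:
    # Hypothetical stand-in for one element of the client's `.data` list.
    embedding: List[float]


@dataclass
class EmbeddingsResponse:
    # Hypothetical stand-in for the object returned by `client.embeddings(...)`.
    data: List[EmbeddingItem]


class FakeClient:
    """Hypothetical client that records how many texts each request carried."""

    def __init__(self) -> None:
        self.request_sizes: List[int] = []

    def embeddings(self, texts: List[str], model: str) -> EmbeddingsResponse:
        self.request_sizes.append(len(texts))
        # Toy "embedding": the length of each text, so results are easy to check.
        return EmbeddingsResponse(data=[EmbeddingItem([float(len(t))]) for t in texts])


def _no_progress(items: Iterable[str]) -> Iterable[str]:
    # Default progress wrapper: iterate without displaying anything.
    return items


def embed_texts(
    client: FakeClient,
    texts: List[str],
    model: str = "placeholder-model",  # hypothetical model name
    one_by_one_mode: bool = True,
    delay: float = 0.0,
    progress: Callable[[Iterable[str]], Iterable[str]] = _no_progress,
) -> List[List[float]]:
    """Hypothetical helper mirroring the branches of the committed embed_documents."""
    if not one_by_one_mode:
        # One request carrying every text (the behaviour before this commit).
        response = client.embeddings(texts=texts, model=model)
        return [item.embedding for item in response.data]

    result: List[List[float]] = []
    # Wrap `texts` rather than each one-element response, so a progress bar
    # (e.g. progress=tqdm) counts requests across the whole list.
    for text in progress(texts):
        if delay > 0:
            time.sleep(delay)  # crude RPS limit: pause before every request
        for item in client.embeddings(texts=[text], model=model).data:
            result.append(item.embedding)
    return result


if __name__ == "__main__":
    client = FakeClient()
    print(embed_texts(client, ["a", "bb", "ccc"]))  # [[1.0], [2.0], [3.0]]
    print(client.request_sizes)  # [1, 1, 1]
```

Keeping the delay check inside one loop avoids duplicating the request code for the debug path; passing `progress=tqdm` (if `tqdm` is installed) would give a single bar over all texts.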
2 changes: 1 addition & 1 deletion libs/community/pyproject.toml
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "gigachain-community"
-version = "0.0.6"
+version = "0.0.6.1"
 description = "Community contributed LangChain integrations."
 authors = []
 license = "MIT"