The llmtranslate
library is a Python tool that leverages LangChain's ChatModels 🦜🔗 to translate text between languages and recognize the language of a given text. This library provides both synchronous and asynchronous translators to accommodate various use cases.
📖 Documentation: llm-translate.com
To ensure proper dependency management, install the required libraries in the following order:
-
Install LangChain chat libraries for your preferred LLM. For example:
pip install langchain-openai
For other LLM integrations, refer to LangChain ChatModels documentation.
-
Install the
llmtranslate
library:pip install llmtranslate
If you’re using Poetry, you must set a Python requirement capped below 4.0 to match langchain-openai (which requires <4.0). So, if your pyproject.toml has:
python = ">=3.12"
replace it with:
python = "^3.12"
or:
python = ">=3.12,<4.0"
Otherwise, Poetry sees Python 4.x as valid, which conflicts with langchain-openai’s <4.0 requirement.
from llmtranslate import Translator
from langchain_openai import ChatOpenAI
# Initialize the LLM and Translator
llm = ChatOpenAI(model_name="gpt-4o", openai_api_key="your_openai_api_key")
translator = Translator(llm=llm)
# Detect the language of the text
text_language = translator.get_text_language("Hi how are you?")
if text_language:
print(text_language.ISO_639_1_code) # Output: en
print(text_language.ISO_639_2_code) # Output: eng
print(text_language.ISO_639_3_code) # Output: eng
print(text_language.language_name) # Output: English
# Translate text
translated_text = translator.translate("Bonjour tout le monde", "English")
print(translated_text) # Output: Hello everyone
This example demonstrates how to use the AsyncTranslator
class to asynchronously detect the language of a text and translate it into another language. The AsyncTranslator
is designed to work with asynchronous operations by splitting the input text into manageable chunks and by handling multiple concurrent requests to your language model.
max_translation_chunk_length
: Sets the maximum length for each text chunk that is translated.max_translation_chunk_length_multilang
: Sets the maximum length for text chunks when dealing with multiple languages.max_concurrent_llm_calls
: Limits the number of concurrent calls made to the underlying language model.max_concurrent_time_period
: (New) Limits the maximum number of concurrent LLM calls within a specific time period (in seconds). For instance, if you want to allow up to 50 requests per minute, set:max_concurrent_llm_calls
to 50max_concurrent_time_period
to 60
If you only need to limit the number of concurrent calls without a time constraint, you can leave max_concurrent_time_period
at its default value (typically 0
).
import asyncio
from llmtranslate import AsyncTranslator
from langchain_openai import ChatOpenAI
# Initialize the language model (LLM) with your API key
llm = ChatOpenAI(model_name="gpt-4o", openai_api_key="your_openai_api_key")
async def translate_text():
# Initialize AsyncTranslator with appropriate parameters
translator = AsyncTranslator(
llm=llm,
max_translation_chunk_length=100, # Maximum chunk length for translation
max_translation_chunk_length_multilang=50, # Maximum chunk length for multi-language texts
max_concurrent_llm_calls=10, # Maximum number of concurrent LLM calls
# To enforce a rate limit (e.g., 50 calls per minute), uncomment and adjust the following:
# max_concurrent_llm_calls=50,
# max_concurrent_time_period=60,
)
# Prepare tasks: one to detect language and one to translate the text
tasks = [
translator.get_text_language("Hi how are you?"),
translator.translate("Hi how are you?", "Spanish")
]
# Run the tasks concurrently
results = await asyncio.gather(*tasks)
# Retrieve and display detected language information
text_language = results[0]
if text_language:
print(text_language.ISO_639_1_code) # Expected Output: en
print(text_language.ISO_639_2_code) # Expected Output: eng
print(text_language.ISO_639_3_code) # Expected Output: eng
print(text_language.language_name) # Expected Output: English
# Display the translated text
print(results[1]) # Expected Output: Hola, ¿cómo estás?
# Run the asynchronous translation process
asyncio.run(translate_text())
- Asynchronous Execution: Uses
asyncio
to concurrently run language detection and translation tasks. - Language Detection: The
get_text_language
function returns language details (e.g., ISO 639-1, ISO 639-2, ISO 639-3 codes and language name). - Translation: The
translate
function converts the input text to the desired target language. - Concurrency Control:
- Without Time Limit: Simply set
max_concurrent_llm_calls
to limit concurrent calls. - With Time Limit: Specify both
max_concurrent_llm_calls
andmax_concurrent_time_period
to control the number of calls within a specific time window (e.g., 50 calls per 60 seconds).
- Without Time Limit: Simply set
This example should help you understand how to integrate and configure all available parameters in the AsyncTranslator
, including the new time-based rate limiting feature.
- Description: Defines the maximum length (in characters) of a text chunk to be translated in a single call when the text is in one language.
- Recommendation: If translations are not accurate, try reducing this value as weaker LLMs struggle with large chunks of text.
- Description: Defines the maximum length (in characters) of a text chunk when the text contains multiple languages.
- Recommendation: Reduce this value for better accuracy with multi-language inputs.
- Description: Limits the number of concurrent calls made to the LLM.
- Default: 10
- Usage: Adjust this parameter to align with your LLM provider's concurrency limits.
You can find more examples of LangChain Chat models in the documentation at:
To create an instance of an Anthropic-based Chat LLM:
pip install langchain-anthropic
pip install llmtranslate
from langchain_anthropic import Anthropic
from llmtranslate import Translator
llm = Anthropic(model="claude-2", anthropic_api_key="your_anthropic_api_key")
translator = Translator(llm=llm)
To create an instance of a Chat LLM using the Mistral API:
pip install -qU langchain_mistralai
pip install llmtranslate
from langchain_mistral import MistralChat
from llmtranslate import Translator
llm = MistralChat(api_key="your_mistral_api_key")
translator = Translator(llm=llm)
llmtranslate supports all languages supported by GPT-4o. For a complete list of language codes, you can visit the ISO 639-1 website.
Here is a table showing which languages are supported by gpt-4o and gpt4o-mini:
Language Name | Language Code | Supported by gpt-4o | Supported by gpt4o-mini |
---|---|---|---|
English | en | Yes | Yes |
Mandarin Chinese | zh | Yes | Yes |
Hindi | hi | Yes | Yes |
Spanish | es | Yes | Yes |
French | fr | Yes | Yes |
German | de | Yes | Yes |
Russian | ru | Yes | Yes |
Arabic | ar | Yes | Yes |
Italian | it | Yes | Yes |
Korean | ko | Yes | Yes |
Punjabi | pa | Yes | Yes |
Bengali | bn | Yes | Yes |
Portuguese | pt | Yes | Yes |
Indonesian | id | Yes | Yes |
Urdu | ur | Yes | Yes |
Persian (Farsi) | fa | Yes | Yes |
Vietnamese | vi | Yes | Yes |
Polish | pl | Yes | Yes |
Samoan | sm | Yes | Yes |
Thai | th | Yes | Yes |
Ukrainian | uk | Yes | Yes |
Turkish | tr | Yes | Yes |
Maori | mi | No | No |
Norwegian | no | Yes | Yes |
Dutch | nl | Yes | Yes |
Greek | el | Yes | Yes |
Romanian | ro | Yes | Yes |
Swahili | sw | Yes | Yes |
Hungarian | hu | Yes | Yes |
Hebrew | he | Yes | Yes |
Swedish | sv | Yes | Yes |
Czech | cs | Yes | Yes |
Finnish | fi | Yes | Yes |
Amharic | am | No | No |
Tagalog | tl | Yes | Yes |
Burmese | my | Yes | Yes |
Tamil | ta | Yes | Yes |
Kannada | kn | Yes | Yes |
Pashto | ps | Yes | Yes |
Yoruba | yo | Yes | Yes |
Malay | ms | Yes | Yes |
Haitian Creole | ht | Yes | Yes |
Nepali | ne | Yes | Yes |
Sinhala | si | Yes | Yes |
Catalan | ca | Yes | Yes |
Malagasy | mg | Yes | Yes |
Latvian | lv | Yes | Yes |
Lithuanian | lt | Yes | Yes |
Estonian | et | Yes | Yes |
Somali | so | Yes | Yes |
Tigrinya | ti | No | No |
Breton | br | No | No |
Fijian | fj | Yes | No |
Maltese | mt | Yes | Yes |
Corsican | co | Yes | Yes |
Luxembourgish | lb | Yes | Yes |
Occitan | oc | Yes | Yes |
Welsh | cy | Yes | Yes |
Albanian | sq | Yes | Yes |
Macedonian | mk | Yes | Yes |
Icelandic | is | Yes | Yes |
Slovenian | sl | Yes | Yes |
Galician | gl | Yes | Yes |
Basque | eu | Yes | Yes |
Azerbaijani | az | Yes | Yes |
Uzbek | uz | Yes | Yes |
Kazakh | kk | Yes | Yes |
Mongolian | mn | Yes | Yes |
Tibetan | bo | No | No |
Khmer | km | Yes | No |
Lao | lo | Yes | Yes |
Telugu | te | Yes | Yes |
Marathi | mr | Yes | Yes |
Chichewa | ny | Yes | Yes |
Esperanto | eo | Yes | Yes |
Kurdish | ku | No | No |
Tajik | tg | Yes | Yes |
Xhosa | xh | Yes | No |
Yiddish | yi | Yes | Yes |
Zulu | zu | Yes | Yes |
Sundanese | su | Yes | Yes |
Tatar | tt | Yes | Yes |
Quechua | qu | No | No |
Uighur | ug | No | No |
Wolof | wo | No | No |
Tswana | tn | Yes | Yes |
llmtranslate is licensed under the MIT License. See the LICENSE file for more details.