Skip to content

A Python library for language detection and translation using langchain ChatModels

License

Notifications You must be signed in to change notification settings

adam-pawelek/llmtranslate

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

llmtranslate

Test Python package - Publish Docs PyPI version License: MIT codecov Downloads

The llmtranslate library is a Python tool that leverages LangChain's ChatModels 🦜🔗 to translate text between languages and recognize the language of a given text. This library provides both synchronous and asynchronous translators to accommodate various use cases.

📖 Documentation: llm-translate.com

Installation

To ensure proper dependency management, install the required libraries in the following order:

  1. Install LangChain chat libraries for your preferred LLM. For example:

    pip install langchain-openai

    For other LLM integrations, refer to LangChain ChatModels documentation.

  2. Install the llmtranslate library:

    pip install llmtranslate

Poetry problem

If you’re using Poetry, you must set a Python requirement capped below 4.0 to match langchain-openai (which requires <4.0). So, if your pyproject.toml has:

python = ">=3.12"

replace it with:

python = "^3.12"

or:

python = ">=3.12,<4.0"

Otherwise, Poetry sees Python 4.x as valid, which conflicts with langchain-openai’s <4.0 requirement.


Example Using OpenAI's GPT

Using Translator (Synchronous)

from llmtranslate import Translator
from langchain_openai import ChatOpenAI

# Initialize the LLM and Translator
llm = ChatOpenAI(model_name="gpt-4o", openai_api_key="your_openai_api_key")
translator = Translator(llm=llm)

# Detect the language of the text
text_language = translator.get_text_language("Hi how are you?")
if text_language:
    print(text_language.ISO_639_1_code) # Output: en
    print(text_language.ISO_639_2_code) # Output: eng
    print(text_language.ISO_639_3_code) # Output: eng
    print(text_language.language_name) # Output: English

# Translate text
translated_text = translator.translate("Bonjour tout le monde", "English")
print(translated_text)  # Output: Hello everyone

Using AsyncTranslator (Asynchronous)

This example demonstrates how to use the AsyncTranslator class to asynchronously detect the language of a text and translate it into another language. The AsyncTranslator is designed to work with asynchronous operations by splitting the input text into manageable chunks and by handling multiple concurrent requests to your language model.

Key Configuration Options

  • max_translation_chunk_length: Sets the maximum length for each text chunk that is translated.
  • max_translation_chunk_length_multilang: Sets the maximum length for text chunks when dealing with multiple languages.
  • max_concurrent_llm_calls: Limits the number of concurrent calls made to the underlying language model.
  • max_concurrent_time_period: (New) Limits the maximum number of concurrent LLM calls within a specific time period (in seconds). For instance, if you want to allow up to 50 requests per minute, set:
    • max_concurrent_llm_calls to 50
    • max_concurrent_time_period to 60

If you only need to limit the number of concurrent calls without a time constraint, you can leave max_concurrent_time_period at its default value (typically 0).

Code Example

import asyncio
from llmtranslate import AsyncTranslator
from langchain_openai import ChatOpenAI

# Initialize the language model (LLM) with your API key
llm = ChatOpenAI(model_name="gpt-4o", openai_api_key="your_openai_api_key")

async def translate_text():
    # Initialize AsyncTranslator with appropriate parameters
    translator = AsyncTranslator(
        llm=llm,
        max_translation_chunk_length=100,           # Maximum chunk length for translation
        max_translation_chunk_length_multilang=50,    # Maximum chunk length for multi-language texts
        max_concurrent_llm_calls=10,                  # Maximum number of concurrent LLM calls
        # To enforce a rate limit (e.g., 50 calls per minute), uncomment and adjust the following:
        # max_concurrent_llm_calls=50,
        # max_concurrent_time_period=60,
    )
    
    # Prepare tasks: one to detect language and one to translate the text
    tasks = [
        translator.get_text_language("Hi how are you?"),
        translator.translate("Hi how are you?", "Spanish")
    ]
    
    # Run the tasks concurrently
    results = await asyncio.gather(*tasks)
    
    # Retrieve and display detected language information
    text_language = results[0]
    if text_language:
        print(text_language.ISO_639_1_code)  # Expected Output: en
        print(text_language.ISO_639_2_code)  # Expected Output: eng
        print(text_language.ISO_639_3_code)  # Expected Output: eng
        print(text_language.language_name)   # Expected Output: English

    # Display the translated text
    print(results[1])  # Expected Output: Hola, ¿cómo estás?

# Run the asynchronous translation process
asyncio.run(translate_text())

Summary

  • Asynchronous Execution: Uses asyncio to concurrently run language detection and translation tasks.
  • Language Detection: The get_text_language function returns language details (e.g., ISO 639-1, ISO 639-2, ISO 639-3 codes and language name).
  • Translation: The translate function converts the input text to the desired target language.
  • Concurrency Control:
    • Without Time Limit: Simply set max_concurrent_llm_calls to limit concurrent calls.
    • With Time Limit: Specify both max_concurrent_llm_calls and max_concurrent_time_period to control the number of calls within a specific time window (e.g., 50 calls per 60 seconds).

This example should help you understand how to integrate and configure all available parameters in the AsyncTranslator, including the new time-based rate limiting feature.


Key Parameters

max_translation_chunk_length

  • Description: Defines the maximum length (in characters) of a text chunk to be translated in a single call when the text is in one language.
  • Recommendation: If translations are not accurate, try reducing this value as weaker LLMs struggle with large chunks of text.

max_translation_chunk_length_multilang

  • Description: Defines the maximum length (in characters) of a text chunk when the text contains multiple languages.
  • Recommendation: Reduce this value for better accuracy with multi-language inputs.

max_concurrent_llm_calls

  • Description: Limits the number of concurrent calls made to the LLM.
  • Default: 10
  • Usage: Adjust this parameter to align with your LLM provider's concurrency limits.

Other Supported LLMs Examples

You can find more examples of LangChain Chat models in the documentation at:

Anthropic's Claude

To create an instance of an Anthropic-based Chat LLM:

  pip install langchain-anthropic
  pip install llmtranslate
from langchain_anthropic import Anthropic
from llmtranslate import Translator
llm = Anthropic(model="claude-2", anthropic_api_key="your_anthropic_api_key")
translator = Translator(llm=llm)

Mistral via API

To create an instance of a Chat LLM using the Mistral API:

pip install -qU langchain_mistralai
pip install llmtranslate
from langchain_mistral import MistralChat
from llmtranslate import Translator

llm = MistralChat(api_key="your_mistral_api_key")
translator = Translator(llm=llm)

Supported Languages

llmtranslate supports all languages supported by GPT-4o. For a complete list of language codes, you can visit the ISO 639-1 website.

Here is a table showing which languages are supported by gpt-4o and gpt4o-mini:

Language Name Language Code Supported by gpt-4o Supported by gpt4o-mini
English en Yes Yes
Mandarin Chinese zh Yes Yes
Hindi hi Yes Yes
Spanish es Yes Yes
French fr Yes Yes
German de Yes Yes
Russian ru Yes Yes
Arabic ar Yes Yes
Italian it Yes Yes
Korean ko Yes Yes
Punjabi pa Yes Yes
Bengali bn Yes Yes
Portuguese pt Yes Yes
Indonesian id Yes Yes
Urdu ur Yes Yes
Persian (Farsi) fa Yes Yes
Vietnamese vi Yes Yes
Polish pl Yes Yes
Samoan sm Yes Yes
Thai th Yes Yes
Ukrainian uk Yes Yes
Turkish tr Yes Yes
Maori mi No No
Norwegian no Yes Yes
Dutch nl Yes Yes
Greek el Yes Yes
Romanian ro Yes Yes
Swahili sw Yes Yes
Hungarian hu Yes Yes
Hebrew he Yes Yes
Swedish sv Yes Yes
Czech cs Yes Yes
Finnish fi Yes Yes
Amharic am No No
Tagalog tl Yes Yes
Burmese my Yes Yes
Tamil ta Yes Yes
Kannada kn Yes Yes
Pashto ps Yes Yes
Yoruba yo Yes Yes
Malay ms Yes Yes
Haitian Creole ht Yes Yes
Nepali ne Yes Yes
Sinhala si Yes Yes
Catalan ca Yes Yes
Malagasy mg Yes Yes
Latvian lv Yes Yes
Lithuanian lt Yes Yes
Estonian et Yes Yes
Somali so Yes Yes
Tigrinya ti No No
Breton br No No
Fijian fj Yes No
Maltese mt Yes Yes
Corsican co Yes Yes
Luxembourgish lb Yes Yes
Occitan oc Yes Yes
Welsh cy Yes Yes
Albanian sq Yes Yes
Macedonian mk Yes Yes
Icelandic is Yes Yes
Slovenian sl Yes Yes
Galician gl Yes Yes
Basque eu Yes Yes
Azerbaijani az Yes Yes
Uzbek uz Yes Yes
Kazakh kk Yes Yes
Mongolian mn Yes Yes
Tibetan bo No No
Khmer km Yes No
Lao lo Yes Yes
Telugu te Yes Yes
Marathi mr Yes Yes
Chichewa ny Yes Yes
Esperanto eo Yes Yes
Kurdish ku No No
Tajik tg Yes Yes
Xhosa xh Yes No
Yiddish yi Yes Yes
Zulu zu Yes Yes
Sundanese su Yes Yes
Tatar tt Yes Yes
Quechua qu No No
Uighur ug No No
Wolof wo No No
Tswana tn Yes Yes

Additional Resources

Authors

License

llmtranslate is licensed under the MIT License. See the LICENSE file for more details.