Streamline translation use case #1

@jmartin-tech jmartin-tech commented Feb 13, 2025

This revision enables translation support for a streamlined use case. Some items, such
as support for multiple target languages in a single run, have been omitted in favor of a
more straightforward user experience and reduced ambiguity in the scope of results.

Translator classes have been implemented using the Configurable pattern and the plugin
loader. This introduces a new paradigm: configuration is provided for a list of instances,
each with its own configuration required at runtime, whereas previous Configurable class
configuration applied to all instances of a specific class or module. The processing and
attribute names used to create this instance list may evolve further.
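
As a rough illustration of the per-instance configuration described above, the sketch below builds one object per entry of a list instead of applying a single configuration to every instance of a class. `TranslatorSketch` and `load_translators` are hypothetical names chosen for this example; they are not garak's actual classes or plugin loader.

```python
from dataclasses import dataclass, field


@dataclass
class TranslatorSketch:
    """Hypothetical stand-in for one configured translator instance."""

    language: str  # language pair for this instance, e.g. "es-en"
    model_type: str = "local"
    model_name: str = ""
    hf_args: dict = field(default_factory=dict)


def load_translators(entries: list[dict]) -> list[TranslatorSketch]:
    """Create one instance per config entry; each entry is instance-specific."""
    return [TranslatorSketch(**entry) for entry in entries]


translators = load_translators(
    [
        {"language": "es-en", "model_name": "facebook/m2m100_418M"},
        {"language": "en-es", "model_name": "facebook/m2m100_418M",
         "hf_args": {"device": "cuda"}},
    ]
)
```

Each list entry carries its own settings, so two instances of the same class can differ in language pair or device.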

Usage

Translation is configured in the run section of a configuration; see the doc
page in the PR for details.

New default configuration values for run.lang_spec and run.translators are described in
the updated documentation and keep existing run configurations backwards compatible.
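
A minimal sketch of how such defaults can keep older configurations working, assuming the defaults listed under "Revisions detail" (`lang_spec` set to `en`, `translators` set to an empty list). The helper `apply_run_defaults` and the `generations` key in the sample config are illustrative only and do not reflect garak's actual config loading code.

```python
import copy

# Defaults named in this PR: English target language, no translators.
RUN_DEFAULTS = {"lang_spec": "en", "translators": []}


def apply_run_defaults(config: dict) -> dict:
    """Return a copy of `config` whose `run` section has the new keys filled in."""
    merged = copy.deepcopy(config)
    run = merged.setdefault("run", {})
    for key, default in RUN_DEFAULTS.items():
        # Only fill in keys the user did not set explicitly.
        run.setdefault(key, copy.deepcopy(default))
    return merged


legacy = {"run": {"generations": 5}}  # a config written before this change
updated = apply_run_defaults(legacy)

explicit = apply_run_defaults({"run": {"lang_spec": "es"}})
```

A configuration that never mentions translation ends up with an English target and no translators, while an explicit `lang_spec` is left untouched.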

There are still some existing TODO: comments and notes about location that may need
further testing before landing this upstream. Most noteworthy are comments still in the
code of the atkgen probe that require further scrutiny to validate the attack technique
is applied correctly.

It may be appropriate to gate this functionality as experimental for the initial release;
this would require some additional guard code to ensure limited impact on report formats
and internal state.

Example

python -m garak -m huggingface.Model --config hf_RigoChat_gpu.yml -p lmrc --report_prefix RigoChat-21fde039

hf_RigoChat_gpu.yml:

run:
  lang_spec: "es"
  translators:
    - language: es-en
      model_type: local
      model_name: facebook/m2m100_418M
      hf_args:
        device: cuda
    - language: en-es
      model_type: local
      model_name: facebook/m2m100_418M
      hf_args:
        device: cuda
plugins:
  generators:
    huggingface:
      Model:
        name: IIC/RigoChat-7b-v2
        hf_args:
          device: cuda
          trust_remote_code: true
          torch_dtype: float16
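
The `language` entries in the example appear to be directional pairs in `source-target` form, one for each direction between English and the target language. The sketch below shows one way a lookup over such entries could work, including a passthrough when source and target match (see "always return a translator even if just target to target" in the revision list). `pick_translator` and the `passthrough` model type are hypothetical, and raising an error for a missing pair is an assumption rather than documented behavior.

```python
def pick_translator(translators: list[dict], source: str, target: str) -> dict:
    """Return the config entry for source->target, or a passthrough entry."""
    wanted = f"{source}-{target}"
    if source == target:
        # Hypothetical passthrough entry: no model is needed for same-language text.
        return {"language": wanted, "model_type": "passthrough"}
    for entry in translators:
        if entry["language"] == wanted:
            return entry
    # Assumption: a missing pair is treated as a configuration error.
    raise LookupError(f"no translator configured for {wanted}")


# Plain-dict mirror of the `run.translators` list from the YAML example.
run_translators = [
    {"language": "es-en", "model_type": "local", "model_name": "facebook/m2m100_418M"},
    {"language": "en-es", "model_type": "local", "model_name": "facebook/m2m100_418M"},
]

to_target = pick_translator(run_translators, "en", "es")
to_english = pick_translator(run_translators, "es", "en")
same_lang = pick_translator(run_translators, "es", "es")
```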

Revisions detail

  • refactor for encapsulation
  • set default config entries for early load support
    • adds docs for new entries
    • initializes lang_spec to en
    • initializes translators to an empty list
  • revised test config formats
  • only maintain prompts in one language
  • limit to only translated prompts
  • treat "$" as a special line bypassing translation attempt
  • NLI detector should support None response
  • check translation required carefully
    • strings that do not contain "words" should bypass translation
    • tests of translation configuration require lang_spec
    • add remote translator specific class to tests
  • clarify base model prefix as opus-mt-*
  • detectors need translators in config
  • always return a translator even if just target to target
    • always have a translator
    • only attempt to translate output that is not None
  • force garbage collection after translator tests
  • validate probe trigger type during translation
  • translation needs lists of strings
    • support for nested lists is added for existing probes content
  • remove direct _config access in plugins
    • remove access to _config.run from probe classes
    • adjust goodside translations to not retain original prompts
  • refactor probe translation tests for unit testing
    • In the interest of reasonable execution time, test that probes call translation instead of executing translation.
    • probe translation tests as unit testing only
  • Translation actions are tested with their own tests.
  • remove side-effects for internal translation methods
  • latentinjection init adjustment
    • Remove extra call for translator
    • Ensure _build_prompts_triggers is called only once during init for all implemented classes.
  • bugfix - goodside instance instead of class attributes
  • remote test case corrections
  • extract translator base config restrictions
    • ENV var needs are handled by remote module
    • adjust docs for each class
    • match extending class method signature
  • consolidate nltk overrides in resources.api
  • remove no longer used "only_translate_word"
  • remove lang_list references, support a single target language
  • use pythonic code-style, adjust inline comments
  • refactor report file to rely on global fixture
  • rename base class to Translator
    • rename SimpleTranslator to Translator
    • source and target language determined via translator held values
  • update translation configuration docs
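
Several of the revisions above describe when translation is skipped (`None` output, the `$` line, strings without words) and that nested lists of strings are supported. The sketch below combines those rules under stated assumptions: the `$` case is taken to mean a line whose stripped content is exactly `$`, and a "word" is taken to require at least one alphabetic character. `needs_translation` and `translate_nested` are hypothetical helpers, and `str.upper` stands in for a real translation call.

```python
from typing import Callable, Optional, Union

Nested = Union[str, None, list]


def needs_translation(text: Optional[str]) -> bool:
    """Return False for inputs that should skip the translation attempt."""
    if text is None:  # missing output: nothing to translate
        return False
    if text.strip() == "$":  # assumption: "$" alone on a line is the special case
        return False
    # Assumption: a "word" requires at least one alphabetic character.
    return any(ch.isalpha() for ch in text)


def translate_nested(items: Nested, translate: Callable[[str], str]) -> Nested:
    """Apply `translate` to each string in a possibly nested list, keeping its shape."""
    if isinstance(items, list):
        return [translate_nested(item, translate) for item in items]
    return translate(items) if needs_translation(items) else items


# `str.upper` is a placeholder for an actual translator.
result = translate_nested(["hola", ["$", "1234", None]], str.upper)
```

The helper returns new lists rather than modifying its input, in line with the "remove side-effects for internal translation methods" item.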

Signed-off-by: Jeffrey Martin <[email protected]>
@jmartin-tech jmartin-tech force-pushed the feature/multilingual-translation branch from 60d4daf to e5a08c7 Compare February 13, 2025 15:37
@SnowMasaya SnowMasaya merged commit 6780578 into SnowMasaya:feature/multilingual Feb 14, 2025
1 check passed