Streamline translation use case #1

jmartin-tech · 2025-02-13T14:50:34Z

This revision enables translation support for a streamlined use case. Some items, such
as support for multiple target languages in a single run, have been omitted in favor of a
more straightforward user experience and reduced ambiguity in the scope of results.

Translator classes have been implemented using the Configurable pattern and the plugin
loader. This introduces a new paradigm: configuration is provided for a list of instances,
each with its own specific configuration required at runtime, whereas previous Configurable
class configuration applied to all instances of a specific class or module. The processing
and attribute names used to create this instance list may evolve further.
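
To illustrate the difference between per-class and per-instance configuration, here is a minimal sketch. `SketchTranslator` and `build_translators` are hypothetical names used only for illustration and are not garak's API; only the `language`, `model_type`, `model_name`, and `hf_args` keys are taken from the example config in this PR.

```python
from dataclasses import dataclass, field


@dataclass
class SketchTranslator:
    """Hypothetical stand-in for a translator plugin; not garak's class."""

    language: str  # language pair such as "es-en"
    model_type: str = "local"
    model_name: str = ""
    hf_args: dict = field(default_factory=dict)


def build_translators(entries):
    """Create one instance per config entry, each holding its own settings."""
    return [SketchTranslator(**entry) for entry in entries]


# Each list entry configures exactly one instance, rather than every
# instance of a class sharing a single block of settings.
entries = [
    {"language": "es-en", "model_type": "local", "model_name": "facebook/m2m100_418M"},
    {"language": "en-es", "model_type": "local", "model_name": "facebook/m2m100_418M"},
]
translators = build_translators(entries)
```

The point of the sketch is only that two instances of the same class can carry different runtime configuration; how the plugin loader actually wires this up is described in the PR's documentation.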

Usage

Translation is configured in the run section of a configuration; see the doc
page in the PR for details.

New default configuration values for run.lang_spec and run.translators are in the
updated documentation and allow for backwards compatible configuration with existing runs.
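
As a rough sketch of why these defaults keep existing runs working, the snippet below merges a user's run section over the defaults. The default values (`lang_spec` of `en`, an empty `translators` list) come from the revision notes in this PR; the `DEFAULT_RUN` dict, the `effective_run_config` helper, and the `generations` key in the usage lines are illustrative assumptions, not garak's actual loading code.

```python
import copy

# Defaults as described in this PR: English target, no translators configured.
DEFAULT_RUN = {"lang_spec": "en", "translators": []}


def effective_run_config(user_run=None):
    """Overlay a user's run section on the defaults (hypothetical helper)."""
    merged = copy.deepcopy(DEFAULT_RUN)  # avoid sharing the default list
    merged.update(user_run or {})
    return merged


# An existing config that never mentions translation still gets valid values.
legacy = effective_run_config({"generations": 5})
# A new config only needs to override what it changes.
spanish = effective_run_config({"lang_spec": "es"})
```

A config written before this change ends up with an English target and no translators, so no translation is requested by it.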

Some TODO comments and notes about location remain and may need further testing
before this lands upstream. Most noteworthy are the comments still in the code of the
atkgen probe, which require further scrutiny to validate that the attack technique
is applied correctly.

It may be appropriate to gate this functionality as experimental for the initial release;
this would require some additional guard code to ensure limited impact on report formats
and internal state.

Example

python -m garak -m huggingface.Model --config hf_RigoChat_gpu.yml -p lmrc --report_prefix RigoChat-21fde039

hf_RigoChat_gpu.yml:

run:
  lang_spec: "es"
  translators:
    - language: es-en
      model_type: local
      model_name: facebook/m2m100_418M
      hf_args:
        device: cuda
    - language: en-es
      model_type: local
      model_name: facebook/m2m100_418M
      hf_args:
        device: cuda
plugins:
  generators:
    huggingface:
      Model:
        name: IIC/RigoChat-7b-v2
        hf_args:
          device: cuda
          trust_remote_code: true
          torch_dtype: float16
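
The `language` values in the example above appear to follow a `source-target` pattern (`es-en` for Spanish to English, `en-es` for the reverse). The sketch below shows one way such a pair could be split and matched; `split_pair` and `find_translator` are hypothetical helpers, and the pair format is an assumption inferred from the example rather than a documented rule.

```python
def split_pair(language):
    """Split a pair such as "es-en" into (source, target).

    Assumes a single hyphen separates two plain language codes; codes that
    themselves contain hyphens are not handled by this sketch.
    """
    source, sep, target = language.partition("-")
    if not sep or not source or not target:
        raise ValueError(f"expected 'source-target', got {language!r}")
    return source, target


def find_translator(entries, source, target):
    """Return the first config entry for the requested direction, or None."""
    for entry in entries:
        if split_pair(entry["language"]) == (source, target):
            return entry
    return None


entries = [
    {"language": "es-en", "model_name": "facebook/m2m100_418M"},
    {"language": "en-es", "model_name": "facebook/m2m100_418M"},
]
to_spanish = find_translator(entries, "en", "es")
```

With `lang_spec` set to `es`, the example config supplies one entry per direction: one toward the target language and one back to English.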

Revisions detail

refactor for encapsulation
set default config entries for early load support
- adds docs for new entries
- initializes lang_spec to en
- initializes translators to an empty list
revised test config formats
only maintain prompts in one language
limit to only translated prompts
treat "$" as a special line bypassing translation attempt
NLI detector should support None response
check translation required carefully
- strings that do not contain "words" should bypass translation
- tests of translation configuration require lang_spec
- add remote translator specific class to tests
clarify base model prefix as opus-mt-*
detectors need translators in config
always return a translator even if just target to target
- always have a translator
- only attempt to translate output that is not None
force garbage collection after translator tests
validate probe trigger type during translation
translation needs lists of strings
- support for nested lists is added for existing probes content
remove direct _config access in plugins
- remove access to _config.run from probe classes
- adjust goodside translations to not retain original prompts
refactor probe translation tests for unit testing
- In the interest of reasonable execution time test probe call translation instead of executing translation.
- probe translation tests as unit testing only
Translation actions are tested with their own tests.
remove side-effects for internal translation methods
latentinjection init adjustment
- Remove extra call for translator
- Ensure _build_prompts_triggers is called only once during init for all implemented classes.
bugfix - goodside instance instead of class attributes
remote test case corrections
extract translator base config restrictions
- ENV var needs are handled by remote module
- adjust docs for each class
- match extending class method signature
consolidate nltk overrides in resources.api
remove no longer used "only_translate_word"
remove lang_list references, support a single target language
use pythonic code-style, adjust inline comments
refactor report file to rely on global fixture
rename base class to Translator
- rename SimpleTranslator to Translator
- source and target language determined via translator held values
update translation configuration docs
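
Several of the revisions above concern when translation is skipped: `None` outputs, the special "$" line, strings without "words", and nested lists of strings. The following sketch shows how those rules could fit together. It is not the code from this PR: `needs_translation` and `translate_all` are hypothetical names, the regular expression is only one possible reading of "words", and treating "$" as a line that consists solely of that character is an assumption.

```python
import re

# Assumption: a "word" is two or more consecutive letters in any script.
_WORD = re.compile(r"[^\W\d_]{2,}")


def needs_translation(text):
    """Decide whether a string is worth sending to a translator."""
    if text is None:  # outputs may be None; leave them alone
        return False
    if text.strip() == "$":  # special line that bypasses translation
        return False
    return bool(_WORD.search(text))


def translate_all(items, translate):
    """Apply `translate` to a possibly nested list of strings."""
    result = []
    for item in items:
        if isinstance(item, list):
            result.append(translate_all(item, translate))
        elif needs_translation(item):
            result.append(translate(item))
        else:
            result.append(item)  # passed through untouched
    return result


# str.upper stands in for a real translation call.
out = translate_all(["hola mundo", "$", None, ["1234", "buenos dias"]], str.upper)
```

Here only the two strings containing words are changed; "$", `None`, and the digits-only string pass through unchanged, and the nested list keeps its shape.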

Signed-off-by: Jeffrey Martin <[email protected]>

jmartin-tech force-pushed the feature/multilingual-translation branch from 60d4daf to e5a08c7 Compare February 13, 2025 15:37

SnowMasaya merged commit 6780578 into SnowMasaya:feature/multilingual Feb 14, 2025
1 check passed
