Refactor huggingface config support #742
Conversation
Signed-off-by: Jeffrey Martin <[email protected]>
* consolidate `__init__` where possible
* shift generator or model object creation to `_load_client()`

Signed-off-by: Jeffrey Martin <[email protected]>
Signed-off-by: Jeffrey Martin <[email protected]>
* detect cuda vs mps vs cpu in a common way
* guard import of OptimumPipeline

Signed-off-by: Jeffrey Martin <[email protected]>
* support all generic `pipeline` args at all times
* adds `do_sample` when `model` is a parameter to the `Callable`
* adds `low_cpu_mem_usage` and all `pipeline` args for `Callables` without `model`
* consolidates optimal device selection & sets it when not provided by config

Signed-off-by: Jeffrey Martin <[email protected]>
Looks good overall -- left a few comments throughout. Great work!
```
@@ -50,53 +52,134 @@ def _set_hf_context_len(self, config):
        if isinstance(config.n_ctx, int):
            self.context_len = config.n_ctx

    def _gather_hf_params(self, hf_constructor: Callable):
        # this may be a bit too naive as it will pass any parameter valid for the pipeline signature
        # this falls over when passed `from_pretrained` methods as the callable model params are not explicit
```
Is it worth catching this and gathering those params differently?
This is mitigated by adding `Pipeline` params when `model` is not a parameter of the callable; future iterations may suggest adding a supported set of params, so I left the notes here to offer details on why the code adds more parameters in some cases.
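For illustration, a minimal sketch of what signature-based gathering can look like; the function below is hypothetical (its name, the standalone `hf_args` parameter, and the fallback default are assumptions for this sketch, not the merged implementation):

```python
import inspect
from typing import Any, Callable, Dict

def gather_params(hf_constructor: Callable, hf_args: Dict[str, Any]) -> Dict[str, Any]:
    # keep only the hf_args entries that the callable's signature explicitly declares
    accepted = inspect.signature(hf_constructor).parameters
    params = {k: v for k, v in hf_args.items() if k in accepted and v is not None}
    # `from_pretrained`-style callables take **kwargs, so their model params are not
    # explicit in the signature; when `model` is absent, fall back to a
    # pipeline-oriented default (assumed here for illustration)
    if "model" not in accepted:
        params.setdefault("low_cpu_mem_usage", True)
    return params
```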
Signed-off-by: Jeffrey Martin <[email protected]>
thank you. looking good. bunch of questions & clarification requests, mostly about making me smarter
```
@@ -27,6 +27,7 @@ class Generator(Configurable):

    active = True
    generator_family_name = None
    parallel_capable = True
```
tagging #361
```python
selected_device = None
if self.hf_args["device"] is not None:
    if isinstance(self.hf_args["device"], int):
        # this assumes that an index-only device selection means `cuda`
```
```python
g = garak.generators.huggingface.Pipeline("gpt2")
gen_config = {
    "huggingface": {
        "Pipeline": {
            "name": "gpt2",
            "hf_args": {
                "device": "cpu",
            },
        }
    }
}
config_root = GarakSubConfig()
setattr(config_root, "generators", gen_config)

g = garak.generators.huggingface.Pipeline("gpt2", config_root=config_root)
```
side note: this is a working but I guess relatively inelegant pattern for getting a generator up.

1. What happens if the config setting is skipped?
2. Is the `name` parameter in `Pipeline`'s constructor redundant? How does it relate to the `name` in `gen_config`, and which takes precedence?
3. Why device `cpu` and not `auto`?
Good questions:

1. If config is skipped, the new auto detection will load the most efficient device available. On macOS this turns out to be `mps`, which actually causes this class to fail construction due to incomplete support for MPS in torch at this time.
2. The `name` duplication is redundant; currently `name` from the constructor is held over `name` in `config_root`, however either could be removed for this test.
3. See the answer to 1; `cpu` is a guaranteed fully supported value.
OK, that's good to know.

1. I think we probably don't want to ship logic that breaks generators by default on macOS - did I read this right?
2. Sounds like there is some tidying up to do. I would love for the default config to run reasonably well out of the box, especially with `Pipeline` + `gpt2`, which is used in so many examples because it has a high chance of fitting in memory and going quickly on any machine. Requiring a `setattr()` gives an impression of "hotwiring". I'll think on this more - I haven't had time to play with the Feature: configurable plugins #711 config updates in depth much.
3. OK, that's cool :)
1. `mps` does not break on macOS by default, as `gpt2` does work; I have tested that it will load and complete a `fast` config. However, this test needs to be consistent for all runs, so forcing `cpu` was a reasonable solution. This test in particular results in the error below when asking for generation on `""` on `mps`; the error seemed like a clear message to the user, since overriding the device can be a straightforward config setting now.
ERROR root:huggingface.py:203 Can't infer missing attention mask on `mps` device. Please provide an `attention_mask` or use a different device.
2. The `setattr()` here could be replaced with a simple dictionary, as both are supported by config; I just followed more closely what `cli` will generate, as that is the more specific object type.
```python
config_root = {
    "generators": {
        "huggingface": {
            "Pipeline": {
                "name": "gpt2",
                "hf_args": {
                    "device": "cpu",
                },
            }
        }
    }
}
```
Generations on empty strings frequently throw errors. Do we have a test for this? (I could easily verify this myself)
We don't test all generators for empty string behavior by default, because they each have their own unique special-snowflake style (making it a schlep) and often require an API key (making it an expensive pain). I tried to write this one day and discovered it was slightly less feasible than imagined - the new config setup might make it easier now. There are a few tests in place, but it's incomplete.
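Under the new config setup, a per-generator empty-prompt check might look roughly like this; it is a sketch assuming the `GarakSubConfig` import path and the `generate("")` call shape seen elsewhere in this thread:

```python
import garak.generators.huggingface
from garak._config import GarakSubConfig  # import path is an assumption

def test_pipeline_empty_prompt():
    gen_config = {
        "huggingface": {
            "Pipeline": {"name": "gpt2", "hf_args": {"device": "cpu"}},
        }
    }
    config_root = GarakSubConfig()
    setattr(config_root, "generators", gen_config)
    g = garak.generators.huggingface.Pipeline("gpt2", config_root=config_root)
    # empty prompts frequently throw; assert the call completes and returns something
    output = g.generate("")
    assert output is not None
```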
```yaml
gpu: 1
hf_args:
  torch_dtype: float16
  device: cuda:1
```
how come `gpu: 1` changed to `cuda:1` here...
These are simply example formats being documented in the test file; they are not values actually consumed in the tests.
Can we increase consistency to make the example more straightforward?
I would suggest the goal of the example is to show what you might configure differently depending on context. The example here would define a model under test that is a `huggingface.Pipeline` and would use any `cuda` device, while a `huggingface` generator created for use by the `test.Blank` detector would specify `cuda:1` for its device, and a `huggingface` generator created for use by the `test.Blank` buff would use device `cuda:0`. This would mean the model under test is free to use any available device while the detector and buff would each use a specific, different GPU device; a sketch of that split follows.
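A rough sketch of that split in config yaml; the plugin-level nesting below is hypothetical (only the `hf_args`/`device` keys are confirmed by this PR), so read it as an illustration of intent rather than a working schema:

```yaml
generators:
  huggingface:
    Pipeline:
      hf_args:
        device: cuda      # model under test: any available CUDA device
detectors:
  test:
    Blank:
      hf_args:
        device: cuda:1    # detector's generator pinned to the second GPU
buffs:
  test:
    Blank:
      hf_args:
        device: cuda:0    # buff's generator pinned to the first GPU
```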
I think a more concrete or viable example might be on the horizon. This level of detail in config is likely rare for users to need, and @erickgalinkin may be the only actual consumer of this level for the near term. 😄
This feels right -- I suspect the overwhelming majority of users will not need to configure things like this, but it's good to have. I'll likely need to refactor parts of `AutoDAN` and/or `TAP` as a result of this PR being merged, and it's plausible that we'll need this level of granularity in configs, particularly for folks using resource-constrained systems.
```yaml
gpu: 1
hf_args:
  device: cuda:0
```
...and `gpu: 1` changed to `cuda:0` here?
`gpu` was a theoretical value; `device` aligns with actual huggingface expectations.
got it. demonstrating variation is good. otoh I think the example could be confusing with `gpu: 1` referring to both `cuda:0` and `cuda:1` within sight of each other.
The final result of the edits is removal of all `gpu:` entries; only `huggingface` examples using the `device` key are provided.
* raise error when passed negative device integer
* rename parameter tracking var
* remove unused import
* add tests for `_select_hf_device()`

Signed-off-by: Jeffrey Martin <[email protected]>
lgtm. thanks!
Fix #672
Fix #589

Provide huggingface `Pipeline` and `LLaVA` configuration values as `hf_args` that can be passed via the `yaml` or `json` generator configuration paths.

* consolidate `__init__` where possible
* shift generator or model object creation to `_load_client()`
* `GarakException` vs `Exception`
* `mps` support
* detect `cuda` vs `mps` vs `cpu` in a common way
* support all generic `pipeline` args at all times
* adds `do_sample` when `model` is a parameter to the `Callable`
* adds `low_cpu_mem_usage` and all `pipeline` arguments for `Callables` without `model`

Note: While `huggingface.Model` now extends `Pipeline`, due to how the model is loaded the only `hf_arg` currently passed is `hf_args["device"]`; more work is needed to support more.
more work is needed to support moreTesting of each class type is needed and should likely be baselined code prior to #711 as few recommended possible testing configurations used on various hardware:
Example configuration yaml:
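The yaml block itself was not captured here; the following reconstruction mirrors the `generators` layout used in the test snippets above and should be read as illustrative:

```yaml
generators:
  huggingface:
    Pipeline:
      name: gpt2
      hf_args:
        device: cuda
        torch_dtype: float16
```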
When no `device` is specified by configuration, `garak` will now select `cuda` / `mps` / `cpu` depending on environment availability.
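For reference, a minimal sketch of that fallback order using torch; `_select_hf_device` is named after the test added in the commits above, but the body here is an illustration rather than the merged code:

```python
import torch

def _select_hf_device() -> str:
    # prefer CUDA, then Apple's Metal (MPS) backend, then plain CPU
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"
```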