Refactor huggingface config support #742
Conversation
Signed-off-by: Jeffrey Martin <[email protected]>
* consolidate `__init__` where possible
* shift generator or model object creation to `_load_client()`

Signed-off-by: Jeffrey Martin <[email protected]>
Signed-off-by: Jeffrey Martin <[email protected]>
* detect cuda vs mps vs cpu in a common way
* guard import of OptimumPipeline

Signed-off-by: Jeffrey Martin <[email protected]>
* support all generic `pipeline` args at all times
* adds `do_sample` when `model` is a parameter to the `Callable`
* adds `low_cpu_mem_usage` and all `pipeline` args for `Callables` without `model`
* consolidates optimal device selection & sets it when not provided by config

Signed-off-by: Jeffrey Martin <[email protected]>
Looks good overall -- left a few comments throughout. Great work!
```
@@ -50,53 +52,134 @@ def _set_hf_context_len(self, config):
        if isinstance(config.n_ctx, int):
            self.context_len = config.n_ctx

    def _gather_hf_params(self, hf_constructor: Callable):
        # this may be a bit too naive as it will pass any parameter valid for the pipeline signature
        # this falls over when passed `from_pretrained` methods as the callable model params are not explicit
```
Is it worth catching this and gathering those params differently?
This is mitigated by adding `Pipeline` params when `model` is not a parameter of the callable; future iterations may suggest adding a supported set of params, so I left the notes here to offer details on why the code adds more parameters in some cases.
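For illustration, a minimal sketch of what signature-based gathering can look like; the function below is hypothetical (its name, the standalone `hf_args` parameter, and the fallback default are assumptions for this sketch, not the merged implementation):

```python
import inspect
from typing import Any, Callable, Dict

def gather_params(hf_constructor: Callable, hf_args: Dict[str, Any]) -> Dict[str, Any]:
    # keep only the hf_args entries that the callable's signature explicitly declares
    accepted = inspect.signature(hf_constructor).parameters
    params = {k: v for k, v in hf_args.items() if k in accepted and v is not None}
    # `from_pretrained`-style callables take **kwargs, so their model params are not
    # explicit in the signature; when `model` is absent, fall back to a
    # pipeline-oriented default (assumed here for illustration)
    if "model" not in accepted:
        params.setdefault("low_cpu_mem_usage", True)
    return params
```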
Signed-off-by: Jeffrey Martin <[email protected]>
thank you. looking good. bunch of questions & clarification requests, mostly about making me smarter
```
@@ -27,6 +27,7 @@ class Generator(Configurable):

    active = True
    generator_family_name = None
    parallel_capable = True
```
tagging #361
```python
selected_device = None
if self.hf_args["device"] is not None:
    if isinstance(self.hf_args["device"], int):
        # this assumes that an index-only device selection means `cuda`
```
```python
g = garak.generators.huggingface.Pipeline("gpt2")
gen_config = {
    "huggingface": {
        "Pipeline": {
            "name": "gpt2",
            "hf_args": {
                "device": "cpu",
            },
        }
    }
}
config_root = GarakSubConfig()
setattr(config_root, "generators", gen_config)

g = garak.generators.huggingface.Pipeline("gpt2", config_root=config_root)
```
side note: this is a working but I guess relatively inelegant pattern for getting a generator up.

1. What happens if the config setting is skipped?
2. Is the `name` parameter in `Pipeline`'s constructor redundant? How does it relate to the `name` in `gen_config`, and which takes precedence?
3. Why device `cpu` and not `auto`?
Good questions:

1. If config is skipped, the new auto detection will load the most efficient device available. On macOS this turns out to be `mps`, which actually causes this class to fail construction due to incomplete support for MPS in torch at this time.
2. The `name` duplication is redundant; currently `name` from the constructor is held over `name` in `config_root`, however either could be removed for this test.
3. See the answer to 1; `cpu` is a guaranteed fully supported value.
OK, that's good to know.

1. I think we probably don't want to ship logic that breaks generators by default on macOS - did I read this right?
2. Sounds like there is some tidying up to do. I would love for the default config to run reasonably well out of the box, especially with `Pipeline` + `gpt2`, which is used in so many examples because it has a high chance of fitting in memory and going quickly on any machine. Requiring a `setattr()` gives an impression of "hotwiring". I'll think on this more - I haven't had time to play with the Feature: configurable plugins #711 config updates in depth much.
3. OK, that's cool :)
1. `mps` does not break on macOS by default, as `gpt2` does work; I have tested that it will load and complete a `fast` config. However, this test needs to be consistent for all runs, so forcing `cpu` was a reasonable solution. This test in particular results in the error below when asking for generation on `""` on `mps`; the error seemed like a clear message to the user, since overriding the device can be a straightforward config setting now.
ERROR root:huggingface.py:203 Can't infer missing attention mask on `mps` device. Please provide an `attention_mask` or use a different device.
2. The `setattr()` here could be replaced with a simple dictionary, as both are supported by config; I just followed more closely what `cli` will generate, as that is the more specific object type.
```python
config_root = {
    "generators": {
        "huggingface": {
            "Pipeline": {
                "name": "gpt2",
                "hf_args": {
                    "device": "cpu",
                },
            }
        }
    }
}
```
Generations on empty strings frequently throw errors. Do we have a test for this? (I could easily verify this myself)
We don't test all generators for empty string behavior by default, because they each have their own unique special-snowflake style (making it a schlep) and often require an API key (making it an expensive pain). I tried to write this one day and discovered it was slightly less feasible than imagined - the new config setup might make it easier now. There are a few tests in place, but it's incomplete.
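Under the new config setup, a per-generator empty-prompt check might look roughly like this; it is a sketch assuming the `GarakSubConfig` import path and the `generate("")` call shape seen elsewhere in this thread:

```python
import garak.generators.huggingface
from garak._config import GarakSubConfig  # import path is an assumption

def test_pipeline_empty_prompt():
    gen_config = {
        "huggingface": {
            "Pipeline": {"name": "gpt2", "hf_args": {"device": "cpu"}},
        }
    }
    config_root = GarakSubConfig()
    setattr(config_root, "generators", gen_config)
    g = garak.generators.huggingface.Pipeline("gpt2", config_root=config_root)
    # empty prompts frequently throw; assert the call completes and returns something
    output = g.generate("")
    assert output is not None
```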
```yaml
gpu: 1
hf_args:
  torch_dtype: float16
  device: cuda:1
```
how come `gpu: 1` changed to `cuda:1` here...
These are simply example formats being documented in the test file; they are not values actually consumed in the tests.
Can we increase consistency to make the example more straightforward?
I would suggest the goal of the example is to show what you might configure differently depending on context. The example here would define a model under test that is a `huggingface.Pipeline` and would use any `cuda` device, while a `huggingface` generator created for use by the `test.Blank` detector would specify `cuda:1` for its device, and a `huggingface` generator created for use by the `test.Blank` buff would use device `cuda:0`. This would mean the model under test is free to use any available device while the detector and buff would each use a specific, different GPU device; a sketch of that split follows.
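A rough sketch of that split in config yaml; the plugin-level nesting below is hypothetical (only the `hf_args`/`device` keys are confirmed by this PR), so read it as an illustration of intent rather than a working schema:

```yaml
generators:
  huggingface:
    Pipeline:
      hf_args:
        device: cuda      # model under test: any available CUDA device
detectors:
  test:
    Blank:
      hf_args:
        device: cuda:1    # detector's generator pinned to the second GPU
buffs:
  test:
    Blank:
      hf_args:
        device: cuda:0    # buff's generator pinned to the first GPU
```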
I think a more concrete or viable example might be on the horizon. This level of detail in config is likely rare for users to need, and @erickgalinkin may be the only actual consumer of this level for the near term. 😄
This feels right -- I suspect the overwhelming majority of users will not need to configure things like this, but it's good to have. I'll likely need to refactor parts of `AutoDAN` and/or `TAP` as a result of this PR being merged, and it's plausible that we'll need this level of granularity in configs, particularly for folks using resource-constrained systems.
```yaml
gpu: 1
hf_args:
  device: cuda:0
```
...and `gpu: 1` changed to `cuda:0` here?
`gpu` was a theoretical value; `device` aligns with actual huggingface expectations.
got it. demonstrating variation is good. otoh I think the example could be confusing with `gpu: 1` referring to both `cuda:0` and `cuda:1` within sight of each other.
The final result of the edits is removal of all `gpu:` entries; only `huggingface` examples using the `device` key are provided.
* raise error when passed negative device integer
* rename parameter tracking var
* remove unused import
* add tests for `_select_hf_device()`

Signed-off-by: Jeffrey Martin <[email protected]>
lgtm. thanks!
Fix #672
Fix #589

Provide huggingface `Pipeline` and `LLaVA` configuration values as `hf_args` that can be passed via the `yaml` or `json` generator configuration paths.

* consolidate `__init__` where possible
* shift generator or model object creation to `_load_client()`
* `GarakException` vs `Exception`
* `mps` support
* detect `cuda` vs `mps` vs `cpu` in a common way
* support all generic `pipeline` args at all times
* adds `do_sample` when `model` is a parameter to the `Callable`
* adds `low_cpu_mem_usage` and all `pipeline` arguments for `Callables` without `model`

Note: While `huggingface.Model` now extends `Pipeline`, due to how the model is loaded the only `hf_arg` currently passed is `hf_args["device"]`; more work is needed to support more.
more work is needed to support moreTesting of each class type is needed and should likely be baselined code prior to #711 as few recommended possible testing configurations used on various hardware:
Example configuration yaml:
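The yaml block itself was not captured here; the following reconstruction mirrors the `generators` layout used in the test snippets above and should be read as illustrative:

```yaml
generators:
  huggingface:
    Pipeline:
      name: gpt2
      hf_args:
        device: cuda
        torch_dtype: float16
```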
When no `device` is specified by configuration, `garak` will now select `cuda` / `mps` / `cpu` depending on environment availability.
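For reference, a minimal sketch of that fallback order using torch; `_select_hf_device` is named after the test added in the commits above, but the body here is an illustration rather than the merged code:

```python
import torch

def _select_hf_device() -> str:
    # prefer CUDA, then Apple's Metal (MPS) backend, then plain CPU
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"
```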