Guard against unset resolved_archive_file #35628
Conversation
LGTM! This is quite an edge case, as we have `resolved_archive_file = None` when loading with gguf and we have `disk` in `device_map`. If you have time, please add a test in the tests/quantization/ggml/test_ggml.py file. cc @Isotr0py for visibility
LGTM too! Just need a test case to cover this edge case in test_ggml.py.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Great! I should be able to add one sometime this week
Putting together a test case was helpful. It appears this condition is only triggered when a GGUF is loaded that is configured to offload a portion of the model to disk. I've worked around this by forcing the entire model onto the CPU. Thoughts or suggestions? I can update the PR with what I have if seeing the necessary changes is easier.
After reflection, I think that we shouldn't allow offload with GGUF. This is because with the gguf state_dict, we still have to modify the state dict to be compatible with transformers, so we can't really offload to disk.
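The point above can be illustrated with a small sketch. This is not transformers' actual implementation (the real mapping lives in its gguf loading code and covers far more tensors): GGUF checkpoints use llama.cpp tensor names that must be rewritten in memory into transformers' naming scheme before the state dict is usable, which is why lazily streaming weights from disk does not fit this loading path.

```python
import re

# Illustrative sketch only: rewrite a llama.cpp-style GGUF tensor name into
# transformers' naming scheme. The two rules below are a tiny, hypothetical
# subset of the real mapping.
def rename_gguf_key(key: str) -> str:
    # e.g. "blk.0.attn_q.weight" -> "model.layers.0.self_attn.q_proj.weight"
    m = re.match(r"blk\.(\d+)\.attn_q\.weight$", key)
    if m:
        return f"model.layers.{m.group(1)}.self_attn.q_proj.weight"
    if key == "token_embd.weight":
        return "model.embed_tokens.weight"
    return key  # pass through anything we don't recognize
```

Because every key in the dict is rewritten up front like this, the whole state dict has to be materialized before loading, leaving nothing for a disk-offload hook to stream later.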
That makes sense. Do you think that should happen in transformers or accelerate? If it's in the model loading here, I don't mind taking a crack at it.
The check should be in transformers!
Force-pushed from e9326f3 to 3c2066d (compare)
@SunMarc I modified the handling of `device_map`. There's a test case for the explicit disk mapping, but there didn't seem to be a non-invasive way of testing the `device_map="auto"` path.
Thanks for iterating, just a few nits
src/transformers/modeling_utils.py (Outdated)
    if gguf_path:
        remapped_devices = set()
        for name, device in device_map.items():
            if device == "disk":
                device_map[name] = "cpu"
                remapped_devices.add(name)
        if len(remapped_devices) > 0:
            logger.warning(
                "Accelerate has auto-mapped modules to disk but disk offload is not supported for "
                "models loaded from GGUF files. Remapping modules to the cpu: "
                + ", ".join(remapped_devices)
            )
I prefer not to remap the device_map but to raise an error afterwards.
I did this remapping because I originally hit this issue trying to run a GGUF on my MacBook with `device_map="auto"` and got that confusing error with the unspecified archive file. If the disk modules had been mapped to cpu instead, everything should have worked.
Now that I’ve dug in quite a bit more, an explicit device mapping (like in the test case) would get past my issue but it does raise the sophistication requirements for developer-users quite a bit. That is, as a user I would have preferred if transformers would have remapped my model or at least warned me.
All that said, let me know if you still think the remapping should be removed and I’ll pull it.
> If the disk modules had been mapped to cpu instead, everything should have worked.
With `infer_auto_device_map`, we map to disk if we don't have any space left on the other devices, so remapping disk to cpu could cause an issue. I prefer just raising a warning saying that disk offload with gguf is not supported.
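To see why remapping disk entries back to cpu is unsafe, here is a toy sketch of how `infer_auto_device_map`-style allocation ends up assigning `"disk"` in the first place (this is a simplification, not accelerate's actual algorithm; the sizes and names are made up): modules are placed greedily on devices in order, and anything that does not fit anywhere lands on disk. Moving those modules back to cpu would exceed the memory budget that forced them out.

```python
# Toy greedy allocator: place each module on the first device with room,
# spilling to "disk" when no device has space left.
def toy_infer_device_map(module_sizes, max_memory):
    device_map = {}
    free = dict(max_memory)  # remaining bytes per device, e.g. {"cpu": 8}
    for name, size in module_sizes.items():
        for device, remaining in free.items():
            if size <= remaining:
                device_map[name] = device
                free[device] -= size
                break
        else:
            device_map[name] = "disk"  # no space left on any device
    return device_map
```

For example, with 8 units of cpu memory and modules of size 5 and 4, the second module spills to disk; remapping it to cpu would require 9 units the machine does not have.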
src/transformers/modeling_utils.py (Outdated)
    raise NotImplementedError(
        "One or more modules is configured to be mapped to disk. Disk offload is not supported for models "
        "loaded from GGUF files."
    )
Let's just do `RuntimeError` instead of `NotImplementedError`.
When loading a pre-trained model from a gguf file, `resolved_archive_file` may not be set. Guard against that case in the safetensors availability check.
GGUF files don't support disk offload, so attempt to remap them to the CPU when `device_map` is auto. If `device_map` is anything other than `None`, raise a `NotImplementedError`.
If `device_map="auto"` and modules are selected for disk offload, don't attempt to map them to any other device. Raise a `RuntimeError` when a GGUF model is configured to map any modules to disk.
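The final behavior described by the last commit message can be sketched as a small standalone check (the function name is illustrative; in the PR the logic lives inline in modeling_utils.py): no remapping, just a hard failure when any module is mapped to disk for a GGUF checkpoint.

```python
# Minimal sketch of the final guard, assuming a plain dict device_map of the
# shape accelerate produces (module name -> device string).
def validate_gguf_device_map(device_map):
    if device_map is not None and "disk" in device_map.values():
        raise RuntimeError(
            "One or more modules is configured to be mapped to disk. Disk offload "
            "is not supported for models loaded from GGUF files."
        )
```

A map of `{"lm_head": "disk"}` raises, while an all-cpu map (or no map at all) passes through untouched.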
Force-pushed from 3c2066d to b7f3496 (compare)
@SunMarc ok, I think all the comments are addressed. Let me know if any additional changes are needed.
Thanks for the discussion and iterating!
No problem! Thanks for the feedback
* archive_file may not be specified: When loading a pre-trained model from a gguf file, `resolved_archive_file` may not be set. Guard against that case in the safetensors availability check.
* Remap partial disk offload to cpu for GGUF files: GGUF files don't support disk offload, so attempt to remap them to the CPU when `device_map` is auto. If `device_map` is anything other than `None`, raise a `NotImplementedError`.
* Don't remap auto device_map and raise RuntimeError: If `device_map="auto"` and modules are selected for disk offload, don't attempt to map them to any other device. Raise a `RuntimeError` when a GGUF model is configured to map any modules to disk.

Co-authored-by: Marc Sun <[email protected]>
What does this PR do?
`resolved_archive_file` in `_load_pretrained_model()` appears to be optional. In my case, I was loading a model from a GGUF file and it was `None`. In that case, `archive_file` ends up being `None` and the check for safetensors raises an error. The change guards against that case and allows loading to continue.
I thought this change was minor enough that new tests were not warranted. If you feel otherwise, happy to add one if you can point me at the right place to do it.