
Guard against unset resolved_archive_file #35628

Merged · SunMarc merged 4 commits into huggingface:main from dmlap:no-archive-no-safetensors on Feb 14, 2025

Conversation

@dmlap (Contributor) commented Jan 11, 2025

What does this PR do?

resolved_archive_file in _load_pretrained_model() appears to be optional. In my case, I was loading a model from a GGUF file and it was None:

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(train_config.model_name,
                                             device_map='auto',
                                             gguf_file='llama3.2.gguf',
                                             offload_folder='offload')

In that case, resolved_archive_file ends up being None and the safetensors availability check raises an error. The change guards against that case and allows loading to continue.
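In sketch form, the guard is just the following (assumed names; the real check in modeling_utils is more involved):

# Hedged sketch, not the exact diff: resolved_archive_file can be None when the
# weights come from a GGUF file, so check it before inspecting the extension.
is_safetensors = resolved_archive_file is not None and resolved_archive_file.endswith(".safetensors")
if is_safetensors:
    ...  # safetensors-specific handling continues as before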

I thought this change was minor enough that new tests were not warranted. If you feel otherwise, I'm happy to add one if you can point me at the right place.

@SunMarc (Member) left a comment

LGTM! This is quite an edge case: we only get resolved_archive_file = None when loading with GGUF and there is disk in the device_map. If you have time, please add a test in tests/quantization/ggml/test_ggml.py. cc @Isotr0py for visibility

@Isotr0py (Collaborator) left a comment

LGTM too! Just need a test case to cover this edge case in test_ggml.py.

@dmlap (Contributor, Author) commented Jan 14, 2025

Great! I should be able to add one sometime this week.

@dmlap (Contributor, Author) commented Jan 22, 2025

Putting together a test case was helpful. It appears this condition is only triggered when a GGUF is loaded that is configured to offload a portion of model.state_dict() to disk. modeling_utils::_load_state_dict_into_meta_model() doesn't move the loaded state into modules mapped to disk. With the guard on resolved_archive_file from this patch, execution continues until from_pretrained() gets around to calling dispatch_model(), which promptly blows up trying to offload a meta tensor to disk.

I've worked around this by forcing the entire state_dict to get loaded in when gguf_path is specified – that gets the test working and produces the expected output. I'm not sure if this defeats the point of offloading some of the modules to disk (or if I'm missing something more fundamental).
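Roughly, the workaround looks like this (a hypothetical sketch with assumed internal names, not the actual patch):

from accelerate.utils import set_module_tensor_to_device

# Hypothetical sketch: when the checkpoint is a GGUF file, materialize every
# tensor on cpu instead of skipping the modules the device_map sent to "disk".
if gguf_path is not None:
    for name, tensor in state_dict.items():
        set_module_tensor_to_device(model, name, "cpu", value=tensor)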

Thoughts or suggestions? I can update the PR with what I have if seeing the necessary changes is easier.

@SunMarc (Member) commented Jan 22, 2025

After reflection, I think we shouldn't allow offload with GGUF. With a GGUF state_dict, we still have to modify the state dict to make it compatible with transformers, so we can't really offload to disk.

@dmlap (Contributor, Author) commented Jan 22, 2025

That makes sense. Do you think that should happen in transformers or accelerate? If it’s in the model loading here, I don’t mind taking a crack at it.

@SunMarc (Member) commented Jan 23, 2025

The check should be in transformers!

@dmlap force-pushed the no-archive-no-safetensors branch 3 times, most recently from e9326f3 to 3c2066d on February 1, 2025
@dmlap (Contributor, Author) commented Feb 1, 2025

@SunMarc I modified the handling of device_map so that when "auto" is specified for a GGUF file, it will attempt to remap disk offload back to the CPU. If disk is explicitly configured, a NotImplementedError will be raised.

There's a test case for the explicit disk mapping, but there didn't seem to be a non-invasive way of testing the auto remapping, and it didn't seem worth it to me. Let me know if you disagree or would like any further modifications.
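The test is along these lines (a simplified sketch for tests/quantization/ggml/test_ggml.py; the repo id and filename are placeholders, not the exact test):

def test_gguf_disk_offload_raises(self):
    # Any module explicitly mapped to disk with a GGUF file should fail fast.
    with self.assertRaises(NotImplementedError):
        AutoModelForCausalLM.from_pretrained(
            "some-org/llama-3.2-gguf",   # placeholder repo id
            gguf_file="llama3.2.gguf",   # placeholder filename
            device_map={"": "disk"},
        )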

@SunMarc (Member) left a comment

Thanks for iterating, just a few nits

Comment on lines 4221 to 4233
if gguf_path:
    remapped_devices = set()
    for name, device in device_map.items():
        if device == "disk":
            device_map[name] = "cpu"
            remapped_devices.add(name)
    if len(remapped_devices) > 0:
        logger.warning(
            "Accelerate has auto-mapped modules to disk but disk offload is not supported for "
            "models loaded from GGUF files. Remapping modules to the cpu: "
            + ", ".join(remapped_devices)
        )

@SunMarc (Member):

I'd prefer not to remap the device_map but to raise an error instead.

@dmlap (Contributor, Author):

I did this remapping because I originally hit this issue trying to run a GGUF on my MacBook with device_map="auto" and got that confusing error about the unspecified archive file. If the disk modules had been mapped to cpu instead, everything should have worked.

Now that I've dug in quite a bit more: an explicit device mapping (like in the test case) would get past my issue, but it raises the bar for developer-users considerably. As a user, I would have preferred that transformers remap my model, or at least warn me.

All that said, let me know if you still think the remapping should be removed and I’ll pull it.

@SunMarc (Member) commented Feb 6, 2025:

> If the disk modules had been mapped to cpu instead, everything should have worked.

With infer_auto_device_map, we map to disk only when there is no space left on the other devices, so remapping disk to cpu could itself cause an issue (the cpu may not have room either). I'd prefer to just raise an error saying that disk offload with GGUF is not supported.
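For concreteness, the "disk" entries come from accelerate like this (illustrative only; the memory limits are made up):

from accelerate import infer_auto_device_map

# With tight limits, whatever doesn't fit on the GPU or CPU is assigned to
# "disk" — exactly the placement that GGUF loading can't honor.
device_map = infer_auto_device_map(model, max_memory={0: "4GiB", "cpu": "8GiB"})
# e.g. {"model.embed_tokens": 0, ..., "model.layers.31": "disk", "lm_head": "disk"}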

Comment on lines 4241 to 4254
raise NotImplementedError(
    "One or more modules are configured to be mapped to disk. Disk offload is not supported for models "
    "loaded from GGUF files."
)
@SunMarc (Member):

Let's just use RuntimeError instead of NotImplementedError.

@dmlap force-pushed the no-archive-no-safetensors branch from 3c2066d to b7f3496 on February 10, 2025
@dmlap (Contributor, Author) commented Feb 10, 2025

@SunMarc OK, I think all the comments are addressed. Let me know if any additional changes are needed.

@SunMarc (Member) left a comment

Thanks for the discussion and for iterating!

@dmlap (Contributor, Author) commented Feb 13, 2025

No problem! Thanks for the feedback

@SunMarc merged commit b45cf0e into huggingface:main on Feb 14, 2025
25 checks passed
@dmlap deleted the no-archive-no-safetensors branch on February 15, 2025
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request Feb 21, 2025
* archive_file may not be specified
When loading a pre-trained model from a gguf file, resolved_archive_file may not be set. Guard against that case in the safetensors availability check.

* Remap partial disk offload to cpu for GGUF files
GGUF files don't support disk offload, so attempt to remap offloaded modules to the CPU when device_map is "auto". If device_map is anything other than None, raise a NotImplementedError.

* Don't remap auto device_map and raise RuntimeError
If device_map=auto and modules are selected for disk offload, don't attempt to map them to any other device. Raise a RuntimeError when a GGUF model is configured to map any modules to disk.

---------

Co-authored-by: Marc Sun <[email protected]>
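For users, the net effect looks like this (a hedged illustration; the repo id and the exact error text are assumptions):

from transformers import AutoModelForCausalLM

# Any module mapped to "disk" together with gguf_file now fails fast instead of
# crashing later in dispatch_model() with a confusing meta-tensor error.
model = AutoModelForCausalLM.from_pretrained(
    "some-org/llama-3.2-gguf",    # placeholder repo id
    gguf_file="llama3.2.gguf",    # placeholder filename
    device_map={"": "disk"},      # explicit disk placement
)
# -> RuntimeError: disk offload is not supported when loading from a GGUF file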