
Guard against unset resolved_archive_file #35628

Merged · SunMarc merged 4 commits into huggingface:main from dmlap:no-archive-no-safetensors on Feb 14, 2025

Conversation

@dmlap (Contributor) commented Jan 11, 2025

What does this PR do?

resolved_archive_file in _load_pretrained_model() appears to be optional. In my case, I was loading a model from a GGUF file and it was None:

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(train_config.model_name,
                                             device_map='auto',
                                             gguf_file='llama3.2.gguf',
                                             offload_folder='offload')

In that case, resolved_archive_file ends up being None and the safetensors availability check raises an error. The change guards against that case and allows loading to continue.
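In sketch form, the guard is just the following (assumed names; the real check in modeling_utils is more involved):

# Hedged sketch, not the exact diff: resolved_archive_file can be None when the
# weights come from a GGUF file, so check it before inspecting the extension.
is_safetensors = resolved_archive_file is not None and resolved_archive_file.endswith(".safetensors")
if is_safetensors:
    ...  # safetensors-specific handling continues as before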

I thought this change was minor enough that new tests were not warranted. If you feel otherwise, I'm happy to add one if you can point me at the right place.

@SunMarc (Member) left a comment

LGTM! This is quite an edge case: we only get resolved_archive_file = None when loading with GGUF and there is disk in the device_map. If you have time, please add a test in tests/quantization/ggml/test_ggml.py. cc @Isotr0py for visibility

@Isotr0py (Collaborator) left a comment

LGTM too! Just need a test case to cover this edge case in test_ggml.py.

@dmlap (Contributor, Author) commented Jan 14, 2025

Great! I should be able to add one sometime this week.

@dmlap (Contributor, Author) commented Jan 22, 2025

Putting together a test case was helpful. It appears this condition is only triggered when a GGUF is loaded that is configured to offload a portion of model.state_dict() to disk. modeling_utils::_load_state_dict_into_meta_model() doesn't move the loaded state into modules mapped to disk. With the guard on resolved_archive_file from this patch, execution continues until from_pretrained() gets around to calling dispatch_model(), which promptly blows up trying to offload a meta tensor to disk.

I've worked around this by forcing the entire state_dict to get loaded in when gguf_path is specified – that gets the test working and produces the expected output. I'm not sure if this defeats the point of offloading some of the modules to disk (or if I'm missing something more fundamental).
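Roughly, the workaround looks like this (a hypothetical sketch with assumed internal names, not the actual patch):

from accelerate.utils import set_module_tensor_to_device

# Hypothetical sketch: when the checkpoint is a GGUF file, materialize every
# tensor on cpu instead of skipping the modules the device_map sent to "disk".
if gguf_path is not None:
    for name, tensor in state_dict.items():
        set_module_tensor_to_device(model, name, "cpu", value=tensor)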

Thoughts or suggestions? I can update the PR with what I have if seeing the necessary changes is easier.

@SunMarc (Member) commented Jan 22, 2025

After reflection, I think we shouldn't allow offload with GGUF. With a GGUF state_dict, we still have to modify the state dict to make it compatible with transformers, so we can't really offload to disk.

@dmlap (Contributor, Author) commented Jan 22, 2025

That makes sense. Do you think that should happen in transformers or accelerate? If it’s in the model loading here, I don’t mind taking a crack at it.

@SunMarc (Member) commented Jan 23, 2025

The check should be in transformers!

@dmlap force-pushed the no-archive-no-safetensors branch 3 times, most recently from e9326f3 to 3c2066d on February 1, 2025
@dmlap (Contributor, Author) commented Feb 1, 2025

@SunMarc I modified the handling of device_map so that when "auto" is specified for a GGUF file, it will attempt to remap disk offload back to the CPU. If disk is explicitly configured, a NotImplementedError will be raised.

There's a test case for the explicit disk mapping, but there didn't seem to be a non-invasive way of testing the auto remapping, and it didn't seem worth it to me. Let me know if you disagree or would like any further modifications.
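The test is along these lines (a simplified sketch for tests/quantization/ggml/test_ggml.py; the repo id and filename are placeholders, not the exact test):

def test_gguf_disk_offload_raises(self):
    # Any module explicitly mapped to disk with a GGUF file should fail fast.
    with self.assertRaises(NotImplementedError):
        AutoModelForCausalLM.from_pretrained(
            "some-org/llama-3.2-gguf",   # placeholder repo id
            gguf_file="llama3.2.gguf",   # placeholder filename
            device_map={"": "disk"},
        )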

@SunMarc (Member) left a comment

Thanks for iterating, just a few nits

Comment on lines 4221 to 4233
if gguf_path:
    remapped_devices = set()
    for name, device in device_map.items():
        if device == "disk":
            device_map[name] = "cpu"
            remapped_devices.add(name)
    if len(remapped_devices) > 0:
        logger.warning(
            "Accelerate has auto-mapped modules to disk but disk offload is not supported for "
            "models loaded from GGUF files. Remapping modules to the cpu: "
            + ", ".join(remapped_devices)
        )

@SunMarc (Member):

I'd prefer not to remap the device_map but to raise an error instead.

@dmlap (Contributor, Author):

I did this remapping because I originally hit this issue trying to run a GGUF on my MacBook with device_map="auto" and got that confusing error about the unspecified archive file. If the disk modules had been mapped to cpu instead, everything should have worked.

Now that I've dug in quite a bit more: an explicit device mapping (like in the test case) would get past my issue, but it raises the bar for developer-users considerably. As a user, I would have preferred that transformers remap my model, or at least warn me.

All that said, let me know if you still think the remapping should be removed and I’ll pull it.

@SunMarc (Member) commented Feb 6, 2025:

> If the disk modules had been mapped to cpu instead, everything should have worked.

With infer_auto_device_map, we map to disk only when there is no space left on the other devices, so remapping disk to cpu could itself cause an issue (the cpu may not have room either). I'd prefer to just raise an error saying that disk offload with GGUF is not supported.
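For concreteness, the "disk" entries come from accelerate like this (illustrative only; the memory limits are made up):

from accelerate import infer_auto_device_map

# With tight limits, whatever doesn't fit on the GPU or CPU is assigned to
# "disk" — exactly the placement that GGUF loading can't honor.
device_map = infer_auto_device_map(model, max_memory={0: "4GiB", "cpu": "8GiB"})
# e.g. {"model.embed_tokens": 0, ..., "model.layers.31": "disk", "lm_head": "disk"}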

Comment on lines 4241 to 4254
raise NotImplementedError(
    "One or more modules are configured to be mapped to disk. Disk offload is not supported for models "
    "loaded from GGUF files."
)
@SunMarc (Member):

Let's just use RuntimeError instead of NotImplementedError.

@dmlap force-pushed the no-archive-no-safetensors branch from 3c2066d to b7f3496 on February 10, 2025
@dmlap (Contributor, Author) commented Feb 10, 2025

@SunMarc OK, I think all the comments are addressed. Let me know if any additional changes are needed.

@SunMarc (Member) left a comment

Thanks for the discussion and for iterating!

@dmlap (Contributor, Author) commented Feb 13, 2025

No problem! Thanks for the feedback

@SunMarc merged commit b45cf0e into huggingface:main on Feb 14, 2025
25 checks passed
@dmlap deleted the no-archive-no-safetensors branch on February 15, 2025
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request Feb 21, 2025
* archive_file may not be specified
When loading a pre-trained model from a gguf file, resolved_archive_file may not be set. Guard against that case in the safetensors availability check.

* Remap partial disk offload to cpu for GGUF files
GGUF files don't support disk offload, so attempt to remap offloaded modules to the CPU when device_map is "auto". If device_map is anything other than None, raise a NotImplementedError.

* Don't remap auto device_map and raise RuntimeError
If device_map=auto and modules are selected for disk offload, don't attempt to map them to any other device. Raise a RuntimeError when a GGUF model is configured to map any modules to disk.

---------

Co-authored-by: Marc Sun <[email protected]>
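For users, the net effect looks like this (a hedged illustration; the repo id and the exact error text are assumptions):

from transformers import AutoModelForCausalLM

# Any module mapped to "disk" together with gguf_file now fails fast instead of
# crashing later in dispatch_model() with a confusing meta-tensor error.
model = AutoModelForCausalLM.from_pretrained(
    "some-org/llama-3.2-gguf",    # placeholder repo id
    gguf_file="llama3.2.gguf",    # placeholder filename
    device_map={"": "disk"},      # explicit disk placement
)
# -> RuntimeError: disk offload is not supported when loading from a GGUF file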