Bugfix: Offload of GGML-quantized model in torch.inference_mode() cm #7525

Merged: 1 commit merged into main on Jan 7, 2025

Conversation

RyanJDick (Collaborator) commented on Jan 7, 2025

Summary

This PR contains a bugfix for an edge case with model unloading (from VRAM to RAM). Thanks to @JPPhoto for finding it.

The bug was triggered under the following conditions:

  • A GGML-quantized model is loaded in VRAM
  • We run a Spandrel image-to-image invocation (which is wrapped in a torch.inference_mode() context manager).
  • The model cache attempts to unload the GGML-quantized model from VRAM to RAM.
  • Doing this inside of the torch.inference_mode() context manager results in the following error:
 [2025-01-07 15:48:17,744]::[InvokeAI]::ERROR --> Error while invoking session 98a07259-0c03-4111-a8d8-107041cb86f9, invocation d8daa90b-7e4c-4fc4-807c-50ba9be1a4ed (spandrel_image_to_image): Cannot set version_counter for inference tensor
[2025-01-07 15:48:17,744]::[InvokeAI]::ERROR --> Traceback (most recent call last):
  File "/home/ryan/src/InvokeAI/invokeai/app/services/session_processor/session_processor_default.py", line 129, in run_node
    output = invocation.invoke_internal(context=context, services=self._services)
  File "/home/ryan/src/InvokeAI/invokeai/app/invocations/baseinvocation.py", line 300, in invoke_internal
    output = self.invoke(context)
  File "/home/ryan/.pyenv/versions/3.10.14/envs/InvokeAI_3.10.14/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/ryan/src/InvokeAI/invokeai/app/invocations/spandrel_image_to_image.py", line 167, in invoke
    with context.models.load(self.image_to_image_model) as spandrel_model:
  File "/home/ryan/src/InvokeAI/invokeai/backend/model_manager/load/load_base.py", line 60, in __enter__
    self._cache.lock(self._cache_record, None)
  File "/home/ryan/src/InvokeAI/invokeai/backend/model_manager/load/model_cache/model_cache.py", line 224, in lock
    self._load_locked_model(cache_entry, working_mem_bytes)
  File "/home/ryan/src/InvokeAI/invokeai/backend/model_manager/load/model_cache/model_cache.py", line 272, in _load_locked_model
    vram_bytes_freed = self._offload_unlocked_models(model_vram_needed, working_mem_bytes)
  File "/home/ryan/src/InvokeAI/invokeai/backend/model_manager/load/model_cache/model_cache.py", line 458, in _offload_unlocked_models
    cache_entry_bytes_freed = self._move_model_to_ram(cache_entry, vram_bytes_to_free)
  File "/home/ryan/src/InvokeAI/invokeai/backend/model_manager/load/model_cache/model_cache.py", line 330, in _move_model_to_ram
    return cache_entry.cached_model.partial_unload_from_vram(
  File "/home/ryan/.pyenv/versions/3.10.14/envs/InvokeAI_3.10.14/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/ryan/src/InvokeAI/invokeai/backend/model_manager/load/model_cache/cached_model/cached_model_with_partial_load.py", line 182, in partial_unload_from_vram
    cur_state_dict = self._model.state_dict()
  File "/home/ryan/.pyenv/versions/3.10.14/envs/InvokeAI_3.10.14/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1939, in state_dict
    module.state_dict(destination=destination, prefix=prefix + name + '.', keep_vars=keep_vars)
  File "/home/ryan/.pyenv/versions/3.10.14/envs/InvokeAI_3.10.14/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1936, in state_dict
    self._save_to_state_dict(destination, prefix, keep_vars)
  File "/home/ryan/.pyenv/versions/3.10.14/envs/InvokeAI_3.10.14/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1843, in _save_to_state_dict
    destination[prefix + name] = param if keep_vars else param.detach()
RuntimeError: Cannot set version_counter for inference tensor

Explanation

From the torch.inference_mode() docs:

Code run under this mode gets better performance by disabling view tracking and version counter bumps.

Disabling version counter bumps results in the aforementioned error when saving GGMLTensors to a state_dict.
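For illustration, here is a minimal standalone snippet showing the underlying restriction. It is not the exact failure path from this PR (which goes through GGMLTensor's detach() during state_dict()), but it demonstrates that tensors created under torch.inference_mode() carry no version counter and reject operations that need one:

import torch

# Tensors created inside torch.inference_mode() are "inference tensors".
with torch.inference_mode():
    t = torch.ones(3)

print(t.is_inference())  # True

# Mutating an inference tensor outside the context fails because there is no
# version counter to bump -- the same class of restriction that breaks saving
# GGML-quantized weights to a state_dict during offload.
try:
    t.mul_(2)
except RuntimeError as err:
    print(err)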

This incompatibility between GGMLTensors and torch.inference_mode() is likely caused by the custom tensor type implementation. There may well be a way to get these to cooperate, but for now it is much simpler to replace the offending torch.inference_mode() context with torch.no_grad().

Note that there are several other uses of torch.inference_mode() in the Invoke codebase, but they are all tight wrappers around the inference forward pass and do not contain the model load/unload process.
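As a rough sketch of the shape of the fix (the class and method body below are simplified stand-ins, not the actual Invoke code), the change amounts to swapping the decorator on the Spandrel invocation:

import torch

class SpandrelImageToImageInvocation:
    # Before (illustrative): torch.inference_mode() wrapped the entire invoke(),
    # including the model-cache load/offload triggered by context.models.load(),
    # so a GGML-quantized model could be offloaded from VRAM while still inside
    # the inference-mode context.
    #
    # @torch.inference_mode()
    # def invoke(self, context): ...

    # After (illustrative): torch.no_grad() still disables gradient tracking for
    # the forward pass, but does not create inference tensors, so the model cache
    # can call state_dict() on GGML-quantized weights when moving VRAM -> RAM.
    @torch.no_grad()
    def invoke(self, context):
        with context.models.load(self.image_to_image_model) as spandrel_model:
            ...  # run the Spandrel forward pass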

Related Issues / Discussions

Original discussion: https://discord.com/channels/1020123559063990373/1149506274971631688/1326180753159094303

QA Instructions

Find a sequence of operations that triggers the condition. For me, this was:

  • Reserve VRAM in a separate process so that only ~12GB remains
  • Fresh start of Invoke
  • Run FLUX inference with a GGML 8K model
  • Run Spandrel upscaling

Tests:

  • Confirmed that I can reproduce the error and that it is no longer hit after the change
  • Confirmed that there is no speed regression from switching from torch.inference_mode() to torch.no_grad()
    • Before: 50.354s, After: 51.536s

Checklist

  • The PR has a short but descriptive title, suitable for a changelog
  • Tests added / updated (if applicable)
  • Documentation added / updated (if applicable)
  • Updated What's New copy (if doing a release after this PR)

github-actions bot added the python (PRs that change python files) and invocations (PRs that change invocations) labels on Jan 7, 2025
RyanJDick merged commit 6b18f27 into main on Jan 7, 2025 (15 checks passed)
RyanJDick deleted the ryan/fix-gguf-offload-in-inference-mode-bug branch on January 7, 2025 at 16:31