Flex model almost working on 4GB vram #595
Replies: 2 comments 1 reply
-
Honestly, I don't quite understand why the T5 model is so heavy. The Flux model just has a very inefficient architecture in terms of storage.
-
This seems to mean the VAE is missing, which makes sense given how you're loading it. You should add these arguments to your command. Maybe then it could work with Q2 quantization at low resolution. Q4 works with 8GB, so hopefully it fits.
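A rough back-of-the-envelope sketch of why Q2 might squeeze into 4GB where Q4 needs 8GB. The parameter counts (~12B for the Flux transformer, ~4.7B for the T5-XXL text encoder) and the effective bits-per-weight figures for the GGUF quant types are assumptions, not numbers from this thread:

```python
def quantized_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight footprint: (params * bits / 8) bytes, in GB."""
    return params_billions * bits_per_weight / 8

# Assumed parameter counts (not stated in this thread):
FLUX_PARAMS_B = 12.0   # Flux.1 diffusion transformer, roughly 12B parameters
T5XXL_PARAMS_B = 4.7   # T5-XXL text encoder, roughly 4.7B parameters

# Approximate effective bits per weight for common GGUF quant types:
for name, bits in [("Q2_K", 2.6), ("Q4_K", 4.5), ("F16", 16.0)]:
    flux = quantized_size_gb(FLUX_PARAMS_B, bits)
    t5 = quantized_size_gb(T5XXL_PARAMS_B, bits)
    print(f"{name}: diffusion model ~{flux:.1f} GB, T5-XXL ~{t5:.1f} GB")
```

By this estimate the transformer alone is already close to 4GB at Q2 (and T5-XXL adds another GB or so), which is consistent with the thread's "almost works" on a 4GB card.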
-
Oops, I tried to load it with -m. The model is too big and slow for my setup, but this line almost works.