[Bug]: Training embeddings stop after 1 epoch #10778

RONNYKHALIL · 2023-05-28T10:13:45Z

Is there an existing issue for this?

I have searched the existing issues and checked the recent builds/commits

What happened?

Training embeddings suddenly stopped working after the 1st epoch.

After some digging, I narrowed it down to the "image_embeddings" code, and possibly a font issue. I was able to work around it (training ran to completion) by unchecking "Save images with embedding in PNG chunks".

Steps to reproduce the problem

Train embeddings as usual, but they stop after the first epoch (error logs below)

What should have happened?

Training should've completed as per normal

Commit where the problem happens

20ae71f

What Python version are you running on ?

Python 3.10.x

What platforms do you use to access the UI ?

Linux, Other/Cloud

What device are you running WebUI on?

AMD GPUs (RX 6000 above)

What browsers do you use to access the UI ?

Google Chrome

Command Line Arguments

No

List of extensions

Auto-Photoshop-StableDiffusion-Plugin
SadTalker
adetailer
clip-interrogator-ext
deforum-for-automatic1111-webui
gif2gif
openOutpaint-webUI-extension
sd-dynamic-prompts
sd-webui-controlnet
sd-webui-infinite-image-browsing
sd-webui-text2video
sd_civitai_extension
sd_web_ui_preset_utils
stable-diffusion-webui-state
ultimate-upscale-for-automatic1111
unprompted
LDSR
Lora
ScuNET
SwinIR
prompt-bracket-checker

Console logs

Traceback (most recent call last):
  File "/notebooks/sd/stable-diffusion-webui/modules/textual_inversion/textual_inversion.py", line 612, in train_embedding
    captioned_image = caption_image_overlay(image, title, footer_left, footer_mid, footer_right)
  File "/notebooks/sd/stable-diffusion-webui/modules/textual_inversion/image_embedding.py", line 150, in caption_image_overlay
    font = ImageFont.truetype(textfont, fontsize)
  File "/usr/local/lib/python3.9/dist-packages/PIL/ImageFont.py", line 976, in truetype
    return freetype(font)
  File "/usr/local/lib/python3.9/dist-packages/PIL/ImageFont.py", line 973, in freetype
    return FreeTypeFont(font, size, index, encoding, layout_engine)
  File "/usr/local/lib/python3.9/dist-packages/PIL/ImageFont.py", line 253, in __init__
    load_from_bytes(font)
  File "/usr/local/lib/python3.9/dist-packages/PIL/ImageFont.py", line 233, in load_from_bytes
    self.font_bytes = f.read()
AttributeError: 'FreeTypeFont' object has no attribute 'read'

Additional information

No response


akx · 2023-05-28T11:33:32Z

What Python version are you running on ?

Python 3.10.x

and yet your traceback quite plainly says /usr/local/lib/python3.9/...
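A quick way to confirm which interpreter a process is really using is to ask it directly. This is a minimal sketch using only the standard library; `interpreter_summary` is a hypothetical helper name, not part of the WebUI. It would need to be run with the same Python that launches the WebUI to be meaningful.

```python
import sys


def interpreter_summary() -> str:
    """Return the running interpreter's version and executable path."""
    major, minor, micro = sys.version_info[:3]
    return f"Python {major}.{minor}.{micro} at {sys.executable}"


if __name__ == "__main__":
    # Compare this against the pythonX.Y directory shown in the traceback paths.
    print(interpreter_summary())
```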

akx · 2023-05-28T11:35:37Z

Anyway, yeah, I found the bug, it's related to my changes in df7070e. Will fix.
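The traceback shows the failure mode: `ImageFont.truetype` was handed an object that was already a `FreeTypeFont`, so the loader treated it as a file-like object and called `.read()` on it. Below is a minimal sketch of that pattern using hypothetical stand-ins (`StandInFont`, `load_font`, `load_font_safely`). It is not Pillow's real code, and it is not necessarily how the actual fix works.

```python
class StandInFont:
    """Hypothetical stand-in for an already-loaded font object (has no .read())."""

    def __init__(self, source):
        self.source = source


def load_font(source):
    """Simplified loader: a str is treated as a path, anything else is
    assumed to be a file-like object and is read as bytes."""
    if isinstance(source, str):
        return StandInFont(source)
    # Raises AttributeError if `source` is a font object rather than a file.
    return StandInFont(source.read())


def load_font_safely(source):
    """Reuse an already-loaded font instead of trying to load it again."""
    if isinstance(source, StandInFont):
        return source
    return load_font(source)


if __name__ == "__main__":
    font = load_font("example.ttf")  # hypothetical path; nothing is opened here
    try:
        load_font(font)  # passing a font where a path or file is expected
    except AttributeError as exc:
        print(f"AttributeError: {exc}")
    print(load_font_safely(font) is font)
```

The guard in `load_font_safely` is only one possible approach. Another is to stop accepting a font argument at all and always load from a known path.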


onewolf24 · 2023-05-28T22:05:54Z

how will we know when it is fixed ?

Juldnarr · 2023-05-29T14:59:41Z

I am also still unable to train embeddings.

dwoodev · 2023-05-29T18:47:36Z

how will we know when it is fixed ?

When the dev marks it closed and pushes it into release. This ticket should automatically change to closed, so you can check here.

Mark caption_image_overlay's textfont as deprecated; fix #10778

onewolf24 · 2023-06-01T11:09:56Z

Now I am getting a loss NaN error. I have tried using --no-half-vae --disable-nan-check and unchecking "Save images with embedding in PNG chunks".

Juldnarr · 2023-06-02T01:57:25Z

I'm still not able to train embeddings.

dzoberg · 2023-06-02T03:03:54Z

Getting the same error as above in the latest 1.3.1 release.

Preparing dataset...
100%|███████████████████████████████████████████| 80/80 [00:01<00:00, 50.14it/s]
100%|███████████████████████████████████████████| 20/20 [00:01<00:00, 10.22it/s]
Traceback (most recent call last):
  File "/mnt/data/auto1111/stable-diffusion-webui/modules/textual_inversion/textual_inversion.py", line 612, in train_embedding
    captioned_image = caption_image_overlay(image, title, footer_left, footer_mid, footer_right)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/auto1111/stable-diffusion-webui/modules/textual_inversion/image_embedding.py", line 150, in caption_image_overlay
    font = ImageFont.truetype(textfont, fontsize)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/auto1111/stable-diffusion-webui/venv/lib/python3.11/site-packages/PIL/ImageFont.py", line 996, in truetype
    return freetype(font)
           ^^^^^^^^^^^^^^
  File "/mnt/data/auto1111/stable-diffusion-webui/venv/lib/python3.11/site-packages/PIL/ImageFont.py", line 993, in freetype
    return FreeTypeFont(font, size, index, encoding, layout_engine)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/auto1111/stable-diffusion-webui/venv/lib/python3.11/site-packages/PIL/ImageFont.py", line 252, in __init__
    load_from_bytes(font)
  File "/mnt/data/auto1111/stable-diffusion-webui/venv/lib/python3.11/site-packages/PIL/ImageFont.py", line 232, in load_from_bytes
    self.font_bytes = f.read()
                      ^^^^^^
AttributeError: 'FreeTypeFont' object has no attribute 'read'

Applying optimization: xformers... done.

akx · 2023-06-02T11:20:46Z

For some reason #10780 didn't end up in 1.3.1.

slopcop · 2023-06-02T22:25:03Z

Any dirty fixes for this? I've found that you can bypass it by setting it to do the sample images very rarely. For me, the sample image is what's crashing it. ¯_(ツ)_/¯ Nvidia 3090 / Ryzen 5950 / barebones extensions

GuillaumeFX · 2023-06-04T18:38:46Z

I have the same problem. Any fixes ?

akx · 2023-06-04T18:55:29Z

Will be fixed in the upcoming version (that's currently in the release candidate branch).

ibcallens · 2023-06-06T16:32:09Z

I upgraded to 1.3.2 - solved


RONNYKHALIL added the bug-report Report of a bug, yet to be confirmed label May 28, 2023

akx added a commit to akx/sd-webui that referenced this issue May 28, 2023

Mark caption_image_overlay's textfont as deprecated; fix AUTOMATIC111…

1013758

…1#10778

akx mentioned this issue May 28, 2023

Mark caption_image_overlay's textfont as deprecated; fix #10778 #10780

Merged

4 tasks

akx added the bug Report of a confirmed bug label May 28, 2023

AUTOMATIC1111 added a commit that referenced this issue May 31, 2023

Merge pull request #10780 from akx/image-emb-fonts

d67ef01

Mark caption_image_overlay's textfont as deprecated; fix #10778

AUTOMATIC1111 closed this as completed in 6f754ab Jun 5, 2023
