[Bug]: Training embeddings stop after 1 epoch #10778

Closed
1 task done
RONNYKHALIL opened this issue May 28, 2023 · 13 comments
Labels
bug (report of a confirmed bug), bug-report (report of a bug, yet to be confirmed)

Comments

@RONNYKHALIL

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits

What happened?

Training embeddings suddenly stopped working after the 1st epoch.

After some digging, I narrowed it down to image_embedding.py and what looks like a font-loading issue. As a workaround, training runs to completion if I uncheck "Save images with embedding in PNG chunks".

Steps to reproduce the problem

Train an embedding as usual; training stops after the first epoch (error logs below).

What should have happened?

Training should have run to completion as normal.

Commit where the problem happens

20ae71f

What Python version are you running on ?

Python 3.10.x

What platforms do you use to access the UI ?

Linux, Other/Cloud

What device are you running WebUI on?

AMD GPUs (RX 6000 above)

What browsers do you use to access the UI ?

Google Chrome

Command Line Arguments

No

List of extensions

Auto-Photoshop-StableDiffusion-Plugin
SadTalker
adetailer
clip-interrogator-ext
deforum-for-automatic1111-webui
gif2gif
openOutpaint-webUI-extension
sd-dynamic-prompts
sd-webui-controlnet
sd-webui-infinite-image-browsing
sd-webui-text2video
sd_civitai_extension
sd_web_ui_preset_utils
stable-diffusion-webui-state
ultimate-upscale-for-automatic1111
unprompted
LDSR
Lora
ScuNET
SwinIR
prompt-bracket-checker

Console logs

Traceback (most recent call last):
  File "/notebooks/sd/stable-diffusion-webui/modules/textual_inversion/textual_inversion.py", line 612, in train_embedding
    captioned_image = caption_image_overlay(image, title, footer_left, footer_mid, footer_right)
  File "/notebooks/sd/stable-diffusion-webui/modules/textual_inversion/image_embedding.py", line 150, in caption_image_overlay
    font = ImageFont.truetype(textfont, fontsize)
  File "/usr/local/lib/python3.9/dist-packages/PIL/ImageFont.py", line 976, in truetype
    return freetype(font)
  File "/usr/local/lib/python3.9/dist-packages/PIL/ImageFont.py", line 973, in freetype
    return FreeTypeFont(font, size, index, encoding, layout_engine)
  File "/usr/local/lib/python3.9/dist-packages/PIL/ImageFont.py", line 253, in __init__
    load_from_bytes(font)
  File "/usr/local/lib/python3.9/dist-packages/PIL/ImageFont.py", line 233, in load_from_bytes
    self.font_bytes = f.read()
AttributeError: 'FreeTypeFont' object has no attribute 'read'

Additional information

No response

RONNYKHALIL added the bug-report label on May 28, 2023
@akx
Collaborator

akx commented May 28, 2023

> What Python version are you running on ?
>
> Python 3.10.x

and yet your traceback quite plainly says /usr/local/lib/python3.9/...
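
For anyone wanting to confirm which interpreter is really in use, a quick check along these lines can help. It is only a sketch and is not specific to this project; it needs to be run with the same Python that launches the web UI, and the variable names are arbitrary.

```python
import sys

# Path of the interpreter that is actually executing this code.
interpreter_path = sys.executable

# Major.minor version of that interpreter, e.g. "3.9" or "3.10".
interpreter_version = f"{sys.version_info.major}.{sys.version_info.minor}"

print(f"{interpreter_path} -> Python {interpreter_version}")
```

If the printed version differs from the one you expected, the UI is probably being started from a different environment than the one you inspected.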

@akx
Collaborator

akx commented May 28, 2023

Anyway, yeah, I found the bug, it's related to my changes in df7070e. Will fix.
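
For readers following the traceback: `ImageFont.truetype` was handed something that is neither a path nor a file-like object, so the loader fell through to calling `.read()` on it, and the argument turned out to be an already-constructed `FreeTypeFont`. The sketch below reproduces that failure mode with stand-ins only. `FakeFont` and `load_font` are hypothetical names made up for illustration and are not Pillow's implementation.

```python
import io


class FakeFont:
    """Hypothetical stand-in for an already-loaded font object."""

    def __init__(self, data: bytes, size: int):
        self.data = data
        self.size = size


def load_font(source, size: int) -> FakeFont:
    """Stand-in loader: accepts a filesystem path or a file-like object."""
    if isinstance(source, str):
        # A string is treated as a path on disk.
        with open(source, "rb") as f:
            return FakeFont(f.read(), size)
    # Anything else is assumed to be file-like, so .read() must exist.
    return FakeFont(source.read(), size)


# Loading from a file-like object works.
font = load_font(io.BytesIO(b"font-bytes"), 12)

# Passing the loaded font back into the loader fails in the same way as the
# traceback: the object is not a path and has no .read() method.
try:
    load_font(font, 12)
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)
```

The point is only that a loader expecting "path or file" cannot take an object that has already been loaded; the caller has to pass one or the other.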

akx added the bug label on May 28, 2023
@onewolf24

How will we know when it is fixed?

@Juldnarr

I am also still unable to train embeddings.

@dwoodev

dwoodev commented May 29, 2023

> how will we know when it is fixed ?

When the dev marks it closed and pushes it into release. This ticket should automatically change to closed, so you can check here.

AUTOMATIC1111 added a commit that referenced this issue May 31, 2023
Mark caption_image_overlay's textfont as deprecated; fix #10778
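
The commit title says the `textfont` argument was marked as deprecated. One common way to deprecate a parameter in Python is sketched below. This is not necessarily what the referenced commit does: `caption_overlay` is a hypothetical stand-in, and the only assumption carried over from the thread is that a `textfont` argument exists and should no longer be loaded.

```python
import warnings


def caption_overlay(title: str, textfont=None) -> str:
    """Hypothetical stand-in for a captioning helper with a deprecated argument."""
    if textfont is not None:
        # Warn the caller, then ignore the value instead of trying to load it.
        warnings.warn(
            "textfont is deprecated and ignored",
            DeprecationWarning,
            stacklevel=2,
        )
    # A real helper would pick its own font here; this sketch just returns text.
    return f"[default font] {title}"


# Record warnings so the behaviour can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = caption_overlay("my-embedding", textfont=object())
```

Ignoring the argument rather than removing it keeps existing callers working while still telling them to stop passing it.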
@onewolf24

Now I am getting a loss NaN error. I have tried using --no-half-vae --disable-nan-check and unchecking "Save images with embedding in PNG chunks".

@Juldnarr

Juldnarr commented Jun 2, 2023

I'm still not able to train embeddings.

@dzoberg

dzoberg commented Jun 2, 2023

Getting the same error as above in the latest 1.3.1 release.

Preparing dataset...
100%|███████████████████████████████████████████| 80/80 [00:01<00:00, 50.14it/s]
100%|███████████████████████████████████████████| 20/20 [00:01<00:00, 10.22it/s]
Traceback (most recent call last):
  File "/mnt/data/auto1111/stable-diffusion-webui/modules/textual_inversion/textual_inversion.py", line 612, in train_embedding
    captioned_image = caption_image_overlay(image, title, footer_left, footer_mid, footer_right)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/auto1111/stable-diffusion-webui/modules/textual_inversion/image_embedding.py", line 150, in caption_image_overlay
    font = ImageFont.truetype(textfont, fontsize)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/auto1111/stable-diffusion-webui/venv/lib/python3.11/site-packages/PIL/ImageFont.py", line 996, in truetype
    return freetype(font)
           ^^^^^^^^^^^^^^
  File "/mnt/data/auto1111/stable-diffusion-webui/venv/lib/python3.11/site-packages/PIL/ImageFont.py", line 993, in freetype
    return FreeTypeFont(font, size, index, encoding, layout_engine)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/auto1111/stable-diffusion-webui/venv/lib/python3.11/site-packages/PIL/ImageFont.py", line 252, in __init__
    load_from_bytes(font)
  File "/mnt/data/auto1111/stable-diffusion-webui/venv/lib/python3.11/site-packages/PIL/ImageFont.py", line 232, in load_from_bytes
    self.font_bytes = f.read()
                      ^^^^^^
AttributeError: 'FreeTypeFont' object has no attribute 'read'

Applying optimization: xformers... done.                              

@akx
Collaborator

akx commented Jun 2, 2023

For some reason #10780 didn't end up in 1.3.1.

@slopcop

slopcop commented Jun 2, 2023

Any quick-and-dirty workarounds for this? I've found that you can mostly bypass it by setting the sample images to be generated very rarely. For me, generating the sample image is what crashes it. ¯_(ツ)_/¯ Nvidia 3090 / Ryzen 5950 / barebones extensions

@GuillaumeFX

I have the same problem. Any fixes?

@akx
Collaborator

akx commented Jun 4, 2023

Will be fixed in the upcoming version (that's currently in the release candidate branch).

@ibcallens

I upgraded to 1.3.2 - solved

theBlaufuss added a commit to theBlaufuss/stable-diffusion-webui that referenced this issue Jun 30, 2023
* repair file paste for Firefox from AUTOMATIC1111#10615
remove animation when pasting files into prompt
rework two dragdrop js files into one

* Upgrade Gradio, remove docs URL hack

* fix error in dragdrop logic

* Add custom karras scheduler

* remove debug print

* `modules/api/api.py`: disable `timeout_keep_alive`

* Add dropdown for scheduler type

* Change karras to kdiffusion

* Replace karras by k_diffusion, fix gen info

* only add metadata when k_sched has actually been used

* remove not related code

* Avoid loop import

* Minor naming fixes

* Add error information for recursion error

* use sigma_max/min in model if sigma_max/min is 0

* Revert AUTOMATIC1111#10586

* Fix for AUTOMATIC1111#10643 (pixel noise in webui inpainting canvas breaking inpainting, so that it behaves like plain img2img)

* Better hint for user

Co-authored-by: catboxanon <[email protected]>

* Add hint for custom k_diffusion scheduler

* Use settings instead of main interface

* Use better way to impl

* Fix xyz

* Subject:.
Improvements to handle VAE filenames in generated image filenames

Body:.
1) Added new line 24 to import sd_vae module.
2) Added new method get_vae_filename at lines 340-349 to obtain the VAE filename to be used for image generation and further process it to extract only the filename by splitting it with a dot symbol.
3) Added a new lambda function 'vae_filename' at line 373 to handle VAE filenames.

Reason:.
A function was needed to get the VAE filename and handle it in the program.

Test:.
We tested whether we could use this new functionality to get the expected file names.
The correct behaviour was confirmed for the following commonly distributed VAE files.
vae-ft-mse-840000-ema-pruned.safetensors -> vae-ft-mse-840000-ema-pruned
anything-v4.0.vae.pt -> anything-v4.0

ruff response:.
There were no problems with the code I added.

There was a minor configuration error in a line I did not modify, but I did not modify it as it was not relevant to this modification.
Logged.
images.py:426:56: F841 [*] Local variable `_` is assigned to but never used
images.py:432:43: F841 [*] Local variable `_` is assigned to but never used

Impact:.
This change makes it easier to retrieve the VAE filename used for image generation and use it in the programme.

* Use type to determine if it is enable

* fix bad styling for thumbs view in extra networks AUTOMATIC1111#10639

* possible fix for empty list of optimizations AUTOMATIC1111#10605

* Fix ruff error

* Use automatic instead of None/default

* improvements

See:
AUTOMATIC1111#10649 (comment)

* use Schedule instead of Sched

* Changed 'images.zip' to generation by pattern

* Optimize tooltip checks

* Instead of traversing tens of thousands of text nodes, only look at elements and their children
* Debounce the checks to happen only every one second

* Restore support for dropdown tooltips

* Add support for tooltips on dropdown options

* Cleaner image metadata read

* Just use console.error, it's in all browsers

* Merge executeCallbacks and runCallback, simplify + optimize

* Document on* handlers (for extension authors' sake)

* Add onAfterUiUpdate callback

* Use onAfterUiUpdate where possible

* Remove try/except in img metadata read

* Small fixes to prepare_tcmalloc for Debian/Ubuntu compatibility

- /usr/sbin (where ldconfig is usually located) is not typically on users' PATHs by default, so we set that variable before trying to run ldconfig.
- The libtcmalloc library is called libtcmalloc_minimal on Debian/Ubuntu systems. We now check whether libtcmalloc_minimal exists when running prepare_tcmalloc.

* change to AMD only if NVIDIA is not presented

* Update webui.sh

* Remove exit() from select_checkpoint()

Raising a FileNotFoundError instead.

* Show full traceback in get_sd_model()

to reveal if an error is caused by an extension

* custom unet support

* fix serving images that have already been saved without temp files function that broke after updating gradio

* updates for the noise schedule settings

* Ability to zoom and move the canvas

* Formatted Prettier added fullscreen mode canvas expansion function

* Improve reset zoom when toggle tabs

* add quoting for infotext values that have a colon in them

* Mark caption_image_overlay's textfont as deprecated; fix AUTOMATIC1111#10778

* Sort requirements files

* Upgrade xformers

* Synchronize requirements/requirements_versions

* Remove deps not listed in _versions from requirements

* Omit versions when they don't match _versions

* fix "hires. fix" prompt/neg sharing same labels as txt2img_prompt/negative_prompt

* typo

vidocard -> videocard

* Corrected the code according to Code style

* changed the document to gradioApp()

* Round down scale destination dimensions to nearest multiple of 8

* Refactor EmbeddingDatabase.register_embedding() to allow unregistering

* fix xyz clip

* Upgrade transformers

Refs AUTOMATIC1111#9035 (comment)

* fix disable png info

* clarify issue template

* Only poll gamepads while connected

* Update imageviewerGamepad.js

* Patch GitPython to not use leaky persistent processes

* Add & use modules.errors.print_error where currently printing exception info by hand

* Revert "fix xyz clip"

This reverts commit edd766e.

* fix get_conds_with_caching()

* improve filename matching for mask

we should not rely that mask filename will be of the same extension
as the image filename so better pattern matching is added

* add scale_by to batch processing

* ruffed

* Moved the script to the extension build-in

* Added VAE listing to web API.

* Fix s_min_uncond default type int

* Move gamepaddisconnected listener

* Vendor in the single module used from taming_transformers; remove taming_transformers dependency

(and fix the two ruff complaints)

* a small fix for very wide images, because of the scroll bar was the wrong zoom

* Frontend: only look at top-level tabs, not nested tabs

Refs adieyal/sd-dynamic-prompts#459 (comment)

* Fix typo in `--update-check` help message

Change `chck` to `check`

* rename print_error to report, use it with together with package name

* change UI reorder setting to multiselect

* add an option to show selected setting in main txt2img/img2img UI
split some code from ui.py into ui_settings.py ui_gradio_edxtensions.py
add before_process callback for scripts
add ability for alwayson scripts to specify section and let user reorder those sections

* fix [Bug]: LoRA don't apply on dropdown list sd_lora AUTOMATIC1111#10880

* Fixed the problem with sticking to the mouse, created a tooltip

* use ui_reorder_list rather than ui_reorder for UI reorder option to make the program not break when reverting to old version

* fix 10896 pnginfo parameters

* remove redundant

* assign devices.dtype early because it's needed before the model is loaded

* revert default cross attention optimization to Doggettx
make --disable-opt-split-attention command line option work again

* add hiding and a colspans to startup profile table

* add subdir support for images, masks and output; search mask only in subdir

* fallback to original file retrieving; skip img if mask not found

usage of `shared.walk_files` breaks controlnet extension
images are processed in different order 
which leads to unmatched img file used for img2img and img file used for controlnet 
(if no folder is specified for control net
or the same as img2img input dir used for it)

* revert the erroneous change for model setting added in df02498

* Added the ability to configure hotkeys via webui

Now you can configure the hotkeys directly through the settings

JS and Python scripts are tested and code style compliant

* Added a hotkey repeat check to avoid bugs

* Support dynamic sort of extra networks

* lint fixes

* Cross attention optimization

Cross attention optimization

cross attention optimization

* remove redundant call list_optimizers()

* remove redundant

* Simplify a bunch of `len(x) > 0`/`len(x) == 0` style expressions

* fallback version info from CHANGELOG.md

* Made tooltip optional.

You can disable it in the settings.
Enabled by default

* Added support for workarounds on external GPU.

lspci detects VGA for main/integrated videocards and Display
for external videocards.

This commit should apply workarounds on computers with more than
one GPU. Useful for most laptops using weak iGPU and good dGPU.

Signed-off-by: Pablo Cholaky <[email protected]>

* Apply suggestions from code review

Co-authored-by: Aarni Koskela <[email protected]>

* Added the ability to swap the zoom hotkeys and resize the brush

* small ui fix

In the error the user will see R instead of KeyR

* Update modules/launch_utils.py

Co-authored-by: Aarni Koskela <[email protected]>

* fallback version info from CHANGELOG.md

* a yet another method to restart webui

* Added sysinfo tab to settings

* lint

* Round upscaled dimensions only when not divisible by 8

* Use a more concise calculation for dest dims

* Fix missing ext_filter kwarg

* Made the applyZoomAndPan function global for other extensions

* torch.cuda.is_available() check for SdOptimizationXformers

* fix conds caching with extra network

* simplify self.extra_network_data

* remove redone compare

* Fixed the redmask bug

* Made a function applyZoomAndPan isolated each instance

Isolated each instance of applyZoomAndPan, now if you add another element to the page, they will work correctly

* Fixed visual bugs

* Correct definition zoom level

I changed the regular expression and now I always have to select scale from style.transfo

* Update ui_tempdir.py

Make override function have the same input parameters with original function

* infer styles from prompts, and an option to control the behavior

* add whitelist for environment in the report
add extra link to view the report instead of downloading it

* fix the broken line for AUTOMATIC1111#10990

* fix for conds of second hires fix pass being calculated using first pass's networks, and add an option to revert to old behavior

* prevent calculating conds for second pass of hires fix when they are the same as for the first pass

* Add endpoint to get latent_upscale_modes for hires fix

* Zoom and Pan: move helpers into its namespace to avoid littering global scope

* Zoom and Pan: use elementIDs from closure scope

* Zoom and Pan: simplify getElements (it's not actually async)

* Zoom and Pan: use for instead of forEach

* Zoom and Pan: simplify waitForOpts

* revert the message to how it was

* rework-disable-autolaunch

* Restart: only do restart if running via the wrapper script

* restore old disable --autolaunch

* SD_WEBUI_RESTARTING

* print error and continue

print error and continue

* Forcing Torch Version to 1.13.1 for Navi and Renoir GPUs

* Fix error in webui.sh

* Force python1 for Navi1 only, use python_cmd for python

* Check python version for Navi 1 only

* Write "RX 5000 Series" instead of "Navi" in err

* link footer API to Wiki when API is not active

* Skip force python and pytorch ver if TORCH_COMMAND already set

* Fix upcast attention dtype error.

Without this fix, enabling the "Upcast cross attention layer to float32" option while also using `--opt-sdp-attention` breaks generation with an error:

```
  File "/ext3/automatic1111/stable-diffusion-webui/modules/sd_hijack_optimizations.py", line 612, in sdp_attnblock_forward
    out = torch.nn.functional.scaled_dot_product_attention(q, k, v, dropout_p=0.0, is_causal=False)
RuntimeError: Expected query, key, and value to have the same dtype, but got query.dtype: float key.dtype: float and value.dtype: c10::Half instead.
```

The fix is to make sure to upcast the value tensor too.

* persistent conds cache

Update shared.py

* Generate Forever during generation

Generate Forever during generation

* Split mask blur into X and Y components

Prerequisite to fixing Outpainting MK2 mask blur bug.

* Split Outpainting MK2 mask blur into X and Y components

Fixes unexpected noise in non-outpainted borders when using MK2 script.

* Don't die when a LoRA is a broken symlink

Fixes AUTOMATIC1111#11098

* linter

* add changelog for 1.4.0

* fixed typos

* Improved error output, improved settings menu

* remove console.log

* Use os.makedirs(..., exist_ok=True)

* Reworked the disabling of functions, refactored part of the code

* Formatting code with Prettier

* Fix Typo of hints.js

* Strip whitespaces from URL and dirname prior to extension installation

This avoid some cryptic errors brought by accidental spaces around urls

* add missing infotext entry for the pad cond/uncond option

---------

Signed-off-by: Pablo Cholaky <[email protected]>
Co-authored-by: AUTOMATIC1111 <[email protected]>
Co-authored-by: Aarni Koskela <[email protected]>
Co-authored-by: Kohaku-Blueleaf <[email protected]>
Co-authored-by: Monty Anderson <[email protected]>
Co-authored-by: catboxanon <[email protected]>
Co-authored-by: ArthurHeitmann <[email protected]>
Co-authored-by: fumitaka.yano <[email protected]>
Co-authored-by: strelokhalfer <[email protected]>
Co-authored-by: kernelmethod <[email protected]>
Co-authored-by: Roman Beltiukov <[email protected]>
Co-authored-by: linkoid <[email protected]>
Co-authored-by: Danil Boldyrev <[email protected]>
Co-authored-by: Sakura-Luna <[email protected]>
Co-authored-by: nyqui <[email protected]>
Co-authored-by: yoinked <[email protected]>
Co-authored-by: ramyma <[email protected]>
Co-authored-by: klimaleksus <[email protected]>
Co-authored-by: w-e-w <[email protected]>
Co-authored-by: missionfloyd <[email protected]>
Co-authored-by: Artem Kotov <[email protected]>
Co-authored-by: James <[email protected]>
Co-authored-by: David Chuang <[email protected]>
Co-authored-by: Will Frey <[email protected]>
Co-authored-by: Pablo Cholaky <[email protected]>
Co-authored-by: Chanchana Sornsoontorn <[email protected]>
Co-authored-by: Vivek K. Vasishtha <[email protected]>
Co-authored-by: Vesnica <[email protected]>
Co-authored-by: DGdev91 <[email protected]>
Co-authored-by: Alexander Ljungberg <[email protected]>
Co-authored-by: Splendide Imaginarius <119545140+Splendide-Imaginarius@users.noreply.github.com>
Co-authored-by: arch-fan <[email protected]>
Co-authored-by: zhtttylz <[email protected]>
Co-authored-by: Jabasukuriputo Wang <[email protected]>
9 participants