P+: Extended Textual Conditioning in Text-to-Image Generation #327
How many images did you use, how long did training take, and what GPU did you use?
~150 images, ~60 minutes on a 4080.
Thank you for the great PR! This is very interesting. It will take me a little time to check it out, so I will merge the other PRs and then get to it.
Did you take a look at https://github.com/cloneofsimo/promptplusplus? This seems to bring a few improvements to the original paper.
Hi, @jakaline-dev ! I've tested the PR, and the training and the image generation seem to work fine! However, the sample generation during the training raises an error. I think the sample generation may require some modification like Do you have any idea?
Looking at the code right now, it seems that the sample generation part uses a different pipeline from gen_img_diffusers.py.
Fixing sampling during training is too complicated, so I went the quick-and-dirty route and disabled sampling for now. If it's not urgent, it can be fixed in the future.
Meanwhile, here is some code for mixing layers (just as the paper did)
Thank you for taking a look. I think disabling sampling is OK. And sorry to bother you with changing the file format. I know I will have to update Diffusers at some point, but I have many other hacks in addition to TI in this repo. Diffusers changes quickly and messily, and is not backward compatible, so updating it is a bit of a pain... I will review and merge after work!
I've made some modifications after merging. Please let me know if you notice anything. Thank you for this great work!
@jakaline-dev can prompt+ be used for standard finetuning?
@kgonia You can finetune from a pretrained p+ embedding, but usually finetuning is done with frozen embeddings. Although it kinda would be possible if you tweak the code for text encoder finetuning (haven't seen anyone doing it with just plain TI)
Implemented from https://prompt-plus.github.io/
(Comparison image; top: TI, bottom: XTI)
This method trains a separate TI embedding for each cross-attention layer.
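To make the difference concrete, here is a toy sketch in plain Python (not the code from this PR). It only contrasts the parameter layout: TI keeps one learnable vector for the placeholder token, while XTI keeps one per cross-attention layer. The names `init_vector` and `token_vector_for_layer` and the tiny `EMBED_DIM` are made up for illustration; the layer count of 16 matches the number of entries this PR saves.

```python
import random

NUM_LAYERS = 16  # this PR saves 16 per-layer entries
EMBED_DIM = 4    # toy size (assumption); real text-encoder embeddings are much larger


def init_vector(rng, dim=EMBED_DIM):
    """Hypothetical helper: a small random vector standing in for a learnable embedding."""
    return [rng.gauss(0.0, 0.02) for _ in range(dim)]


rng = random.Random(0)

# TI: a single learnable vector, shared by every cross-attention layer.
ti_embedding = init_vector(rng)

# XTI: one learnable vector per cross-attention layer.
xti_embeddings = [init_vector(rng) for _ in range(NUM_LAYERS)]


def token_vector_for_layer(layer_idx, ti=None, xti=None):
    """Return the placeholder-token vector that layer `layer_idx` is conditioned on."""
    if xti is not None:
        return xti[layer_idx]  # layer-specific vector
    return ti                  # same vector for every layer


ti_params = len(ti_embedding)
xti_params = sum(len(v) for v in xti_embeddings)
print(f"TI learnable values: {ti_params}, XTI learnable values: {xti_params}")
```

So XTI optimizes 16 times as many values for the same placeholder token, which is what lets different layers pick up different aspects of the concept.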
The paper says the optimal training parameters for XTI are 500 steps with lr=0.005. They report that it converges faster than the original TI at 5000 steps, but for me it took the same time.
I hardcoded the saved XTI safetensors file to have 16 keys: ['IN01', 'IN02', 'IN04', 'IN05', 'IN07', 'IN08', 'MID', 'OUT03', 'OUT04', 'OUT05', 'OUT06', 'OUT07', 'OUT08', 'OUT09', 'OUT10', 'OUT11'].
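As a rough illustration of how this key layout could be used, here is a minimal sketch that validates the 16 keys and mixes two XTI embeddings layer by layer. It is not the mixing code mentioned in the conversation: `check_xti_keys` and `mix_xti` are hypothetical names, the choice of which layers come from the second embedding is arbitrary, and plain lists stand in for tensors. With real files, the dicts would presumably come from something like `safetensors.torch.load_file`.

```python
XTI_KEYS = [
    'IN01', 'IN02', 'IN04', 'IN05', 'IN07', 'IN08', 'MID',
    'OUT03', 'OUT04', 'OUT05', 'OUT06', 'OUT07', 'OUT08',
    'OUT09', 'OUT10', 'OUT11',
]


def check_xti_keys(state):
    """Raise ValueError unless `state` has exactly the 16 expected layer keys."""
    missing = [k for k in XTI_KEYS if k not in state]
    extra = [k for k in state if k not in XTI_KEYS]
    if missing or extra:
        raise ValueError(f"bad XTI keys: missing={missing}, extra={extra}")


def mix_xti(state_a, state_b, keys_from_b):
    """Build a new XTI dict: layers in `keys_from_b` come from b, the rest from a."""
    check_xti_keys(state_a)
    check_xti_keys(state_b)
    unknown = set(keys_from_b) - set(XTI_KEYS)
    if unknown:
        raise ValueError(f"unknown layer keys: {sorted(unknown)}")
    return {k: (state_b[k] if k in keys_from_b else state_a[k]) for k in XTI_KEYS}


# Toy data: lists stand in for the per-layer embedding tensors.
embedding_a = {k: [1.0] for k in XTI_KEYS}
embedding_b = {k: [2.0] for k in XTI_KEYS}

# Arbitrary split for illustration: take the inner layers from b, the outer ones from a.
inner_layers = {'IN07', 'IN08', 'MID', 'OUT03', 'OUT04', 'OUT05'}
mixed = mix_xti(embedding_a, embedding_b, inner_layers)
print({k: v[0] for k, v in mixed.items()})
```

Which layers to swap is a matter of experimentation; the sketch only shows that, with a fixed key set, mixing reduces to picking each layer's entry from one file or the other.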
To train, use train_textual_inversion_XTI.py. The training args are the same as for the original TI.
For inference, use '--XTI_embeddings' the same way as '--textual_inversion_embeddings'.
Comparable to LoRA? We will have to see.