Support for DyLoRA #24
Comments
We are always looking for examples where it doesn't work so that we can improve the algorithm. Is there a particular open source codebase you use?
I tried two DyLoRA implementations, and neither works: https://github.com/kohya-ss/sd-scripts/blob/main/networks/dylora.py https://github.com/KohakuBlueleaf/LyCORIS/blob/main/lycoris/dylora.py
I have also been using D-Adaptation extensively with the same codebase, using it to make Stable Diffusion LoRAs. I am mostly pleased with its results, but I quite frequently find myself having to change the learning rate to something other than 1 or use a rather restrictive growth rate.

As an example, I am using DAdaptAdanIP. My best training run so far used a growth rate of 1.06, which caused D-Adaptation to settle on a d*lr close to 0.001 (typical for Adan, and it produces results of good quality). If I don't set a growth rate, it settles on a d*lr of around 0.0027, which tends to be bad at prior preservation.

Another thing I have noticed is that I have to make further manual learning rate adjustments when I adjust the network rank/dim, which is something I would normally expect to be at least somewhat accounted for.

Please note that my problem includes the Dreambooth regularization technique described in https://arxiv.org/abs/2208.12242 , and I have also been using the min-SNR-gamma technique described in https://arxiv.org/pdf/2303.09556.pdf . I have seen similar problems with relying on D-Adaptation's learning rate estimate when not using the min-SNR-gamma technique, as well as when using D-Adaptation Adam.

If there is any further information you would like from me, any tests you'd like me to run, or if I seem to be misunderstanding the best practices for this optimizer, please reach out! The prospect of learning-rate-free training for the problems I am working on is simply too appealing for me not to try my hardest to make it work.
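For concreteness, here is a minimal sketch of the setup above. The optimizer and its growth_rate argument come from the dadaptation package (the experimental AdanIP variant is constructed the same way); the linear layer and the effective_lr helper are placeholders of my own, not part of the library.

```python
import torch
from dadaptation import DAdaptAdan

model = torch.nn.Linear(768, 768)  # stand-in for the real LoRA parameters

# lr stays at 1.0 so the optimizer's internal estimate d sets the step size;
# growth_rate caps how quickly d is allowed to grow away from d0.
optimizer = DAdaptAdan(model.parameters(), lr=1.0, growth_rate=1.06)

def effective_lr(opt):
    # The d*lr quantity discussed above.
    group = opt.param_groups[0]
    return group["d"] * group["lr"]
```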
Thank you for sharing this useful experience with LoRA training.
Thanks for the detailed information, I will investigate. The NeurIPS submission deadline is approaching (May 17th), so I probably won't have time to investigate until then.

One thing you can try is the Adam version in the v3 pull request I've put up. It's a major change to the method and could help in your setting. It can give LR values about half as big on some unstable problems, which could be exactly what you need. I ran it through the full suite of experiments in my paper and it works as well as or better than the previous Adam version on everything.

DAdaptAdan is very experimental and I haven't experimented enough with it yet to trust it.
Alright, I will patiently await your message. In the meantime, I'll give DAdaptLion and the new Adam method a try.
Thanks for the quick response -- I have just had a chance to compare how the v3 Adam performs compared to the old version. I have run both of them on

I will have to perform more tests to see if I can find a good configuration for the new Adam implementation without manual learning rate adjustments, and try to test it on other datasets, because it looks promising; I will be sure to compare with the old Adam implementation to see what has changed. I will hopefully have something by the time you are able to look into this further. I wish you luck with your NeurIPS submission.
In my opinion, Dreambooth regularization has a significant impact on the final results and loss; judging by image results alone may not be very accurate.
Thank you very much for sharing the feedback! We are still working on improving the method, and the challenging instances are especially useful to us. One thing that could be helpful is a description of the training setting that you have, including the dataset and the batch size. If you could share a training script that includes these things together with any regularization that you use, that would be amazing.

Our new version of Adam has a different estimation of D, which should be somewhat smoother and more stable. However, we are also working on other variants, so having examples where D-Adaptation fails is extremely valuable.
I have done some experiments on the dog dataset from the original Dreambooth paper and have found that it exhibits the problems much more clearly.

A known working configuration for AdamW8bit was to generate 200 regularization images, use a network dim of 128, run for 800 steps, use a learning rate of 4e-6 for both the unet and text encoder, and use a cosine LR scheduler -- this deviates from some examples but is much closer to my normal training conditions. Doubling the learning rate for the unet to 8e-6 also works and seems to produce slightly better results, and is closer to my typical use cases -- training the text encoder at half the rate of the unet produces better results in most cases in my experience and others'. I have also run into a few circumstances where I have gotten better results by controlling the learning rate and dims of individual blocks of the network, which D-Adaptation unfortunately doesn't seem to support, at least as it is implemented in the scripts I am using right now.

When using D-Adaptation with the learning rates set to 1.0 and the optimizer arguments described below, D-Adaptation definitely responds the wrong way to changes in the dimensions of a LoRA model. Adjusting the network dim from 128 to 64 causes D-Adaptation to double its learning rate estimate in response. Normally, changes in network dim call for a proportional change to the learning rate, but D-Adaptation's response is inversely proportional. Most interestingly, the learning rate estimate is close to reasonable near the maximum possible network dim (768), but it still overtrains quickly. The net effect of decreasing the network dim is different depending on the beta2 value -- at the default of 0.999, the model lost all editability and learned no distinction of the trained subject. At a beta2 of 0.901, I got much better results at a network dim of 64 than at 128.

Overall, I was able to achieve results that were better in most respects with D-Adaptation after tuning some hyperparameters, but the optimizer responds in very counterintuitive ways at times, and it is overall unclear which hyperparameters should be tuned to best take advantage of the adaptive optimizer.

I have attached the set of test scripts that I use, as well as the dog dataset, in this zipfile. Since the regularization images take up quite a bit of space, I have not included them, and you will have to generate them yourself (I have included a script that will generate them for you in the correct location). I have been using this script with bmaltais' fork of kohya_ss on Linux, which has install instructions here: https://github.com/bmaltais/kohya_ss#linux-and-macos

You will need the Stable Diffusion v1.5 checkpoint located here: https://huggingface.co/runwayml/stable-diffusion-v1-5/blob/main/v1-5-pruned-emaonly.safetensors (this is a safetensors version and my script points to the .ckpt version, so be sure to correct that).

If you unzip this file in the root directory of the kohya_ss repo, you shouldn't have to edit any paths other than the path to the Stable Diffusion base model (which will have to be added to both scripts), and you should be able to simply run ./make_dog_class_imgs.sh once and then run ./testlora.sh, which is currently configured to use DAdaptAdam with decoupled weight decay. Let me know if there are any issues getting my script to run or if you need any more testing on my end.
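For reference, here is a rough sketch of the two optimizer setups being compared. The parameter lists and the weight decay value are placeholders; the actual runs go through kohya's train_network.py via the attached scripts, but the optimizers themselves come from the bitsandbytes and dadaptation packages.

```python
import torch
import bitsandbytes as bnb
from dadaptation import DAdaptAdam

unet_params = [torch.nn.Parameter(torch.zeros(128, 320))]          # placeholder
text_encoder_params = [torch.nn.Parameter(torch.zeros(128, 768))]  # placeholder

# Known-working baseline: hand-tuned rates, unet at twice the text encoder rate.
baseline = bnb.optim.AdamW8bit([
    {"params": unet_params, "lr": 8e-6},
    {"params": text_encoder_params, "lr": 4e-6},
])

# Adaptive run: lr left at 1.0 so D-Adaptation chooses the step size;
# decouple=True gives AdamW-style decoupled weight decay, and beta2 is the
# knob varied above (0.999 default vs. ~0.9).
adaptive = DAdaptAdam(
    unet_params + text_encoder_params,
    lr=1.0,
    betas=(0.9, 0.999),
    decouple=True,
    weight_decay=0.01,  # illustrative value only
)
```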
Thanks for the scripts and the detailed feedback! Here are some of my thoughts:
We have made some progress on the method and released a new version called Prodigy. I haven't tested it on the dog dataset, so I'm not sure if it'd help. All in all, it seems like we are not done yet with finding the right method that works for all applications, but we're still working on it.
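For anyone who wants to try it, a minimal sketch of dropping it in, assuming the prodigyopt package; as with D-Adaptation, lr is left at 1.0 so the method estimates the step size itself, and the model and weight decay value are placeholders.

```python
import torch
from prodigyopt import Prodigy

model = torch.nn.Linear(768, 768)  # stand-in for the network being trained

# Keep lr at 1.0 and let the optimizer adapt its internal estimate d.
optimizer = Prodigy(model.parameters(), lr=1.0, weight_decay=0.01)
```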
Thank you for the great work! I will test it when I have free time.
Thank you! It works well!
Thank you for the great work!
I often use D-Adaptation for model training, but it seems to be ineffective with this algorithm.
DyLoRA: Parameter Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation
https://arxiv.org/abs/2210.07558
All the D-Adaptation optimizers stay at the default d0 (e.g. 1e-6), and it never changes.
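A toy diagnostic for this, assuming the dadaptation package: attach DAdaptAdam to the trainable parameters and print d every few steps to confirm whether it ever moves off d0 = 1e-6. The parameters and loss below are placeholders; the real DyLoRA modules are in the sd-scripts / LyCORIS code linked earlier in the thread.

```python
import torch
from dadaptation import DAdaptAdam

params = [torch.nn.Parameter(torch.randn(4, 768))]  # stand-in for DyLoRA weights
optimizer = DAdaptAdam(params, lr=1.0, d0=1e-6)

for step in range(100):
    loss = (params[0] ** 2).mean()  # placeholder loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    if step % 10 == 0:
        # With the DyLoRA networks this reportedly stays at 1e-6 for the whole run.
        print(f"step {step}: d = {optimizer.param_groups[0]['d']:.3e}")
```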