I'm now at 41.2 it/s, up from 39.2 #7860
Replies: 11 comments 21 replies
-
Actually closer to 41.4: `100%|███████████| 20/20 [00:00<00:00, 41.53it/s]`
-
Even better, with `--opt-channelslast` I get 42.1 it/s.
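For anyone wondering what `--opt-channelslast` does under the hood: it switches tensors and conv weights to NHWC (channels-last) memory layout, which cuDNN kernels on recent GPUs often prefer for convolutions. A minimal sketch of the underlying PyTorch mechanism (the toy model and sizes here are illustrative, not A1111's actual code):

```python
import torch
import torch.nn as nn

# Toy conv model; channels_last mainly helps convolution-heavy nets like SD's UNet.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())

# Convert both the weights and the input to NHWC (channels-last) layout.
model = model.to(memory_format=torch.channels_last)
x = torch.randn(1, 3, 64, 64).to(memory_format=torch.channels_last)

# Convolution propagates the memory format, so the output stays channels-last.
out = model(x)
print(out.is_contiguous(memory_format=torch.channels_last))  # True
```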
-
Where did you add that?
-
Thank you, this increased my speed as well on a 4080.
-
@CHollman82 These are the numbers I get with `torch.compile()`, but I had to make fixes to the PyTorch code to get it to work.
-
How did you get this crazy speed? My 4090 only reaches a maximum of 23 it/s with the full default settings in the webui, even though I have replaced the cuDNN files with the latest version and am using the latest xformers.
-
24.) Automatic1111 Web UI - PC - Free
-
I get 1.51 it/s with my 4070. Ha. Even with your addition to `main.py` and the command args.
-
Cool, looking into that now. Just spent two days (after it had been working fine) fighting with Automatic1111: instant CUDA memory errors, "Torch not compiled with CUDA" errors, NoneType errors, mat1/mat2 errors. Hahaha!
-
Is anyone running SD.NEXT with a 4070 Ti 12GB and cu121? I checked the published benchmarks and couldn't find any that matched my setup. My benchmark is below (timestamp 2024-01-14 04:33:03.298575). I feel like I should be getting better speeds but can't seem to find the right settings; what I have now is at least stable, and trying some changes leads to crashes.
-
I only get 12-14 s/it running SD A1111 (v1.10.1) on an RTX A6000. Loaded an XL checkpoint (realvisxlv40_v20Bakedvae) and ControlNet XL (ip-adapter-plus-sdxl-vit-h) to process 1080 x 1920 images. Using `--xformers` and `--medvram`, live preview disabled. Batch size = 4, batch count = 1 (I do have "Batch Loopback" enabled, but since my batch count is 1, it shouldn't do anything?). Not using Loopback in Scripts. GPU temp is 81 °C. Thanks!
++++++++ Run Report ++++++++
To create a public link, set
Inpaint batch is enabled. 3 masks found.
-
In addition to #6954, which tripled my performance, I just found that adding:
`torch.backends.cudnn.benchmark = True`
to the main py file in A1111 got me up to 41.2 it/s.
I had seen this option before but hadn't noticed any speedup; I was less savvy at the time, though.
A question for our NN experts here: do input sizes change often in any of A1111's many functional pieces? If they do, this option can hurt performance, depending on how often the sizes change.
FYI, today I helped another Windows user go from 10 it/s to about 30 it/s. Upgrading the cuDNN DLLs to v8.7 only got him to 16 it/s, but then I discovered his bat file was setting `--precision=full`; once that was removed, it went to 30.
Is `--precision=full` needed for anything other than training, or for preventing black images when using SD v2?
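To make the trade-off concrete: with `benchmark = True`, cuDNN times several convolution algorithms the first time it sees a given input shape and caches the winner, so the flag pays off only when shapes stay stable. A minimal sketch (the shape values are illustrative):

```python
import torch

# Let cuDNN autotune convolution algorithms per input shape.
torch.backends.cudnn.benchmark = True

# Each *new* (batch, channels, H, W) combination triggers a fresh timing pass,
# so frequently changing resolutions or batch sizes can erase the gains.
# With a fixed shape, autotuning happens once and every later step reuses
# the cached algorithm choice.
x = torch.randn(4, 3, 512, 512)  # e.g. a fixed 512x512 batch, tuned once
```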