GPU memory usage during training? #45

ssvicnent · 2024-06-11T01:26:46Z

Hello, roughly how much GPU memory does this model use for training and inference with the default parameters?

lewandofskee · 2024-10-11T13:08:59Z

I remember that 24 GB of GPU memory might not be enough for training, and that 32 GB of GPU memory would be fine for training.

Karma1628 · 2024-10-17T05:38:57Z

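Hello. If the batch_size of 12 in the config file can be trained on 32 GB, why does training still run out of memory on a 24 GB 4090 after I reduce batch_size to 4? Could you tell me what causes this?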

hhhJB · 2024-10-24T12:06:52Z

Me too.

hhhJB · 2024-10-25T06:58:11Z

Have you figured it out yet?

Karma1628 · 2024-10-25T13:08:58Z

No. Even with batch_size set to 1 it still ran out of memory, haha, so I gave up.

Karma1628 · 2024-10-28T17:13:59Z

@lewandofskee Hoping you can reply (☆▽☆)

zengyanjia · 2024-11-13T11:41:20Z

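With 24 GB and the original configuration unchanged, there is not enough memory. The cause is not batch_size; the model simply has too many parameters. If you have no larger GPU and must stay on 24 GB, the only option is to change the optimizer: switching from AdamW to SGD avoids the out-of-memory error, but training will be very slow.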
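To make the optimizer point concrete, here is a rough back-of-the-envelope sketch, not a measurement of this repo. It assumes every parameter is trainable and stored in fp32, and it counts only the memory that does not shrink when batch_size is reduced: weights, gradients, and per-parameter optimizer state. AdamW keeps two extra buffers per parameter (first and second moments), SGD with momentum keeps one, and plain SGD keeps none. The function name `fixed_memory_gib` and the parameter count are hypothetical, chosen only for illustration.

```python
# Rough estimate of the training memory that does NOT depend on batch_size.
# Assumptions (not taken from this repo): all parameters are trainable,
# everything is stored in fp32, and activations are ignored.

BYTES_FP32 = 4

# Extra fp32 buffers each optimizer keeps per trainable parameter.
OPTIMIZER_STATE_BUFFERS = {
    "adamw": 2,         # first-moment and second-moment estimates
    "sgd_momentum": 1,  # momentum buffer
    "sgd": 0,           # plain SGD keeps no per-parameter state
}


def fixed_memory_gib(num_params: int, optimizer: str) -> float:
    """Return weights + gradients + optimizer state in GiB (fp32)."""
    copies = 2 + OPTIMIZER_STATE_BUFFERS[optimizer]  # weights + grads + state
    return num_params * BYTES_FP32 * copies / 1024**3


if __name__ == "__main__":
    num_params = 1_000_000_000  # hypothetical parameter count, for illustration only
    for name in OPTIMIZER_STATE_BUFFERS:
        print(f"{name:>12}: {fixed_memory_gib(num_params, name):.1f} GiB")
```

Under these assumptions, plain SGD needs half the fixed memory of AdamW, which is consistent with batch_size reductions alone not helping. Frozen parameters need no gradients or optimizer state, so the real figure depends on which parts of the model are actually trained.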

Karma1628 · 2024-11-14T03:50:15Z

OK, thanks. I had not considered the optimizer. Much appreciated~

lewandofskee · 2024-12-06T08:49:15Z

Thanks to @zengyanjia for the comment. Indeed, the issue is not due to batch size but rather to the large number of model parameters.

battle1king · 2024-12-09T10:37:25Z

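Did any of you run into this problem before you got training running? This looks like a dataset error. Is my path specified incorrectly, or is there something else I need to change? My data should all be arranged as required.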

lewandofskee · 2024-12-09T15:01:14Z

Judging from your screenshots so far, the problem is not with the dataset but with CLIP.

pluto-8 · 2024-12-18T06:46:55Z

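How long does it take to get a trained model after switching to SGD? After nearly 200 epochs the results were still poor, and even after all 1000 epochs SGD did not get there. It was worse than AdamW after 5 epochs.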
