I tried shrinking this code to a ~120M-parameter model; why does the model reply with ",,,," in conversation after training? What is going on? #16
Comments
I don't understand what you mean. Can you be more specific?
I took your code, reduced the number of layers, and retrained; when chatting with the model, it replies ",,,,,,".
Do you mean the output quality is bad?
from glob import glob
from modeling_qwen import QWenLMHeadModel

max_seq_len = 128
model_path = r'D:\Workspace\models\Qwen-1_8B\qwen_test\qwen20M_v1\checkpoint-4908'

dataset = load_dataset("csv", data_files={'train': train_file_list, 'valid': test_file_list})
# Tokenize and encode the raw dataset, producing the processed tokenized dataset

data_collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

class MyTrainerCallback(TrainerCallback):
    ...

my_trainer_callback = MyTrainerCallback()

# Compute perplexity
trainer.train()
trainer.save_model(args.output_dir)
Hi, in config.json I changed "hidden_size": 2048 to 512. The model did get smaller, but training now errors out. What else needs to be modified? The error (condensed):

File "/home/anaconda3/envs/park/lib/python3.10/site-packages/accelerate/utils/operations.py", line 810, in __call__
File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1523, in forward
File "/home/llm/tlm/qwen/modeling_qwen.py", line 691, in forward
ValueError: too many values to unpack (expected 3)
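That unpack error comes from the QKV split in the attention layer: `Tensor.split(split_size, dim)` returns as many chunks of size `split_size` as fit along that dimension, and the code unpacks exactly three. Assuming (as in GPT-2-style attention, which Qwen's modeling code follows) that `self.split_size` tracks `hidden_size` while the `c_attn` projection is `3 * kv_channels * num_attention_heads` wide, shrinking `hidden_size` alone means the split no longer yields three chunks. A minimal pure-Python sketch of the failure mode (plain list slicing stands in for `torch.Tensor.split`; the sizes mirror the default 2048-wide config and are illustrative):

```python
def split(seq, split_size):
    # Mimic torch.Tensor.split along one dimension: consecutive
    # chunks of length split_size.
    return [seq[i:i + split_size] for i in range(0, len(seq), split_size)]

kv_channels, num_attention_heads = 128, 16          # illustrative defaults
projection_size = kv_channels * num_attention_heads  # 2048
qkv_width = 3 * projection_size                      # c_attn output: 6144

# Original config: hidden_size == projection_size, so splitting the
# 6144-wide projection into chunks of 2048 gives exactly three parts.
q, k, v = split(list(range(qkv_width)), 2048)        # unpacks fine

# After changing only hidden_size to 512, split_size becomes 512 but
# the projection is still 6144 wide, so the split yields 12 chunks:
try:
    q, k, v = split(list(range(qkv_width)), 512)
except ValueError as e:
    print(e)  # too many values to unpack (expected 3)
```

This is why the fix has to touch `kv_channels` (or the head count), not just `hidden_size`.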
You need to modify the kv_channels variable as well; it must satisfy hidden_size = kv_channels * num_attention_heads.
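Concretely, when shrinking hidden_size in config.json you also have to rescale kv_channels so the product constraint above holds. A small illustrative check (field names taken from the Qwen config; the values are just this thread's example):

```python
# Shrunk config from this thread: hidden_size went 2048 -> 512 while
# num_attention_heads stayed 16, so kv_channels must go 128 -> 32.
config = {
    "hidden_size": 512,
    "num_attention_heads": 16,
    "kv_channels": 128,  # stale value copied from the 2048-wide config
}

# Recompute kv_channels so hidden_size == kv_channels * num_attention_heads.
assert config["hidden_size"] % config["num_attention_heads"] == 0
config["kv_channels"] = config["hidden_size"] // config["num_attention_heads"]

assert config["hidden_size"] == config["kv_channels"] * config["num_attention_heads"]
print(config["kv_channels"])  # 32
```

Running a check like this before launching training catches the mismatch immediately, instead of failing inside the attention forward pass.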
Thanks, I've taken note of that.
Hi, in config.json I changed "hidden_size": 2048 to 512. The model did get smaller, but training fails. What else needs to be modified? The log:

Dataset({ features: ['input_ids'], num_rows: 379743 })
Dataset({ features: ['input_ids'], num_rows: 8792 })
Try importing flash-attention for faster inference...
Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
QWen size: 358.1M parameters
Detected kernel version 4.19.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
Using auto half precision backend
***** Running training *****
  Num examples = 379,743
  Num Epochs = 1
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 640
  Gradient Accumulation steps = 10
  Total optimization steps = 593
  Number of trainable parameters = 358,072,832
  0%|          | 0/593 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/llm/tlm/pre_train.py", line 264, in <module>
    trainer.train(  # 'model_save/pre/checkpoint-3400'
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/transformers/trainer.py", line 1859, in train
    return inner_training_loop(
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/transformers/trainer.py", line 2203, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/transformers/trainer.py", line 3138, in training_step
    loss = self.compute_loss(model, inputs)
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/transformers/trainer.py", line 3161, in compute_loss
    outputs = model(**inputs)
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1523, in forward
    else self._run_ddp_forward(*inputs, **kwargs)
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1359, in _run_ddp_forward
    return self.module(*inputs, **kwargs)  # type: ignore[index]
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/accelerate/utils/operations.py", line 822, in forward
    return model_forward(*args, **kwargs)
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/accelerate/utils/operations.py", line 810, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "/home/llm/tlm/qwen/modeling_qwen.py", line 1135, in forward
    transformer_outputs = self.transformer(
  File "/home/llm/tlm/qwen/modeling_qwen.py", line 971, in forward
    outputs = block(
  File "/home/llm/tlm/qwen/modeling_qwen.py", line 691, in forward
    attn_outputs = self.attn(
  File "/home/llm/tlm/qwen/modeling_qwen.py", line 484, in forward
    query, key, value = mixed_x_layer.split(self.split_size, dim=2)
ValueError: too many values to unpack (expected 3)
[2024-05-08 18:01:42,890] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 3367112 closing signal SIGTERM
[2024-05-08 18:01:42,891] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 3367117 closing signal SIGTERM
[2024-05-08 18:01:42,891] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 3367119 closing signal SIGTERM
You also need to change the kv_channels variable: it must satisfy hidden_size = kv_channels * num_attention_heads.
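A quick sanity check for the shrunk config, using the constraint stated above. The concrete numbers below are illustrative (substitute the `num_attention_heads` from your own config.json):

```python
# Constraint from the reply above: hidden_size == kv_channels * num_attention_heads.
hidden_size = 512          # the shrunk value you set in config.json
num_attention_heads = 16   # assumed example value; read it from your config.json

# Derive a consistent kv_channels instead of keeping the one from the 2048 config.
kv_channels = hidden_size // num_attention_heads

assert kv_channels * num_attention_heads == hidden_size, \
    "hidden_size must be divisible into num_attention_heads heads"
print(kv_channels)  # 32
```

If `hidden_size` is not divisible by `num_attention_heads`, shrink the head count as well rather than leaving a remainder.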