
Bro, I tried shrinking this code down to a ~120M model. Why does it reply with ",,,," during chat after training? What is going on? #16

Open
kingpingyue opened this issue Mar 23, 2024 · 7 comments

Comments

@kingpingyue

No description provided.

@jiahe7ay
Owner

I don't understand what you mean. Can you be more specific?

@kingpingyue
Author

kingpingyue commented Mar 25, 2024 via email

@jiahe7ay
Owner

Do you mean the results are poor?

@kingpingyue
Author

kingpingyue commented Apr 15, 2024

from glob import glob

import numpy as np
import torch
from datasets import load_dataset
from transformers import (
    DataCollatorForLanguageModeling,
    TrainingArguments,
    Trainer,
    TrainerCallback,
    TrainerState,
    TrainerControl,
    AutoConfig,
)

from modeling_qwen import QWenLMHeadModel
from tokenization_qwen import QWenTokenizer

max_seq_len = 128

model_path = r'D:\Workspace\models\Qwen-1_8B\qwen_test\qwen20M_v1\checkpoint-4908'
config_path = r'D:\Workspace\models\Qwen-1_8B\qwen_test'
# Initialize the model configuration
config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
# Initialize the tokenizer
tokenizer = QWenTokenizer.from_pretrained(config_path)
tokenizer.pad_token_id = tokenizer.im_end_id

all_file_list = glob(pathname=r"E:\BaiduNetdiskDownload\gpt2_data\baike2018qa\*.csv")
train_file_list = all_file_list[:6]
test_file_list = all_file_list[:3]

dataset = load_dataset(
    "csv",
    data_files={'train': train_file_list, 'valid': test_file_list},
    cache_dir="cache_data",
)


def tokenize(element):
    outputs = tokenizer(element["content"])
    input_batch = []
    for input_ids in outputs["input_ids"]:
        # Truncate to max_seq_len, then right-pad with the pad token
        token_ids = input_ids[:max_seq_len] + [tokenizer.pad_token_id] * (max_seq_len - len(input_ids))
        input_batch.append(token_ids)
    return {"input_ids": input_batch}


# Tokenize and encode the raw dataset into fixed-length sequences
tokenized_datasets = dataset.map(
    tokenize, batched=True, batch_size=128, remove_columns=dataset["train"].column_names
)

data_collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)
# Build the model from the configuration
model = QWenLMHeadModel(config)


class MyTrainerCallback(TrainerCallback):
    log_cnt = 0

    def on_log(self, args: TrainingArguments, state: TrainerState, control: TrainerControl, **kwargs):
        '''
        Clear the CUDA cache every n logging steps; useful on low-VRAM devices to prevent OOM.
        '''
        self.log_cnt += 1
        if self.log_cnt % 2 == 0:
            torch.cuda.empty_cache()

    def on_epoch_end(self, args: TrainingArguments, state: TrainerState, control: TrainerControl, **kwargs):
        '''
        Save the model once at the end of each epoch.
        TrainingArguments' save_strategy cannot combine 'epoch' and 'steps'. To also save a
        checkpoint every save_steps steps while respecting disk space, only the most recent
        checkpoints are kept (save_total_limit below).
        '''
        # Setting should_save=True and returning the control object is enough
        control.should_save = True
        return control


my_trainer_callback = MyTrainerCallback()
args = TrainingArguments(
    output_dir='qwen20M_v1',
    per_device_train_batch_size=1,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=5,
    num_train_epochs=4,
    weight_decay=0.1,
    ddp_find_unused_parameters=False,
    warmup_steps=0,
    learning_rate=1e-6,
    evaluation_strategy='steps',
    eval_steps=1000,
    save_steps=1000,
    save_strategy='steps',
    save_total_limit=2,
    report_to='tensorboard',
    optim="adamw_torch",
    lr_scheduler_type='cosine',
    logging_steps=100,
    log_level='info',
    logging_first_step=True,
    fp16=True,
    # use_cpu=True,
    # group_by_length=True,
    # deepspeed='./ds_config_one_gpu.json',
)

trainer = Trainer(
    model=model,
    tokenizer=tokenizer,
    args=args,
    data_collator=data_collator,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["valid"],
    callbacks=[my_trainer_callback],
)

trainer.train()
# Compute perplexity from the eval loss
eval_results = trainer.evaluate()
print(f"Perplexity: {np.exp(eval_results['eval_loss']):.2f}")

trainer.save_model(args.output_dir)
Bro, this is my training code. Please take a look and give me some pointers when you have time.
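The fixed-length packing in the tokenize function above boils down to truncate-then-pad. A minimal standalone sketch (pad id 0 here is just a placeholder for tokenizer.pad_token_id):

```python
def pad_or_truncate(ids, max_len=128, pad_id=0):
    # Truncate to max_len, then right-pad with pad_id up to max_len
    ids = ids[:max_len]
    return ids + [pad_id] * (max_len - len(ids))

print(len(pad_or_truncate(list(range(200)))))  # 128
print(pad_or_truncate([1, 2, 3])[:5])          # [1, 2, 3, 0, 0]
```

Note that every padded position still contributes to the causal-LM loss unless the pad token is masked out, which may be related to the model learning to emit filler tokens.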

@HelixPark

I took your code, reduced the number of layers, and retrained; when chatting with it, the model replies ",,,,,,".


Boss, I changed "hidden_size": 2048 to 512 in config.json. The model did get smaller, but training now errors out. What else did you modify?

The error is as follows (identical across the DDP ranks):
Dataset({
    features: ['input_ids'],
    num_rows: 379743
}) Dataset({
    features: ['input_ids'],
    num_rows: 8792
})
Try importing flash-attention for faster inference...
Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
QWen size: 358.1M parameters
Detected kernel version 4.19.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
Using auto half precision backend
***** Running training *****
  Num examples = 379,743
  Num Epochs = 1
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 640
  Gradient Accumulation steps = 10
  Total optimization steps = 593
  Number of trainable parameters = 358,072,832
  0%|          | 0/593 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/llm/tlm/pre_train.py", line 264, in <module>
    trainer.train( #'model_save/pre/checkpoint-3400'
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/transformers/trainer.py", line 1859, in train
    return inner_training_loop(
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/transformers/trainer.py", line 2203, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/transformers/trainer.py", line 3138, in training_step
    loss = self.compute_loss(model, inputs)
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/transformers/trainer.py", line 3161, in compute_loss
    outputs = model(**inputs)
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1523, in forward
    else self._run_ddp_forward(*inputs, **kwargs)
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1359, in _run_ddp_forward
    return self.module(*inputs, **kwargs)  # type: ignore[index]
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/accelerate/utils/operations.py", line 822, in forward
    return model_forward(*args, **kwargs)
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/accelerate/utils/operations.py", line 810, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "/home/llm/tlm/qwen/modeling_qwen.py", line 1135, in forward
    transformer_outputs = self.transformer(
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/llm/tlm/qwen/modeling_qwen.py", line 971, in forward
    outputs = block(
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/llm/tlm/qwen/modeling_qwen.py", line 691, in forward
    attn_outputs = self.attn(
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/llm/tlm/qwen/modeling_qwen.py", line 484, in forward
    query, key, value = mixed_x_layer.split(self.split_size, dim=2)
ValueError: too many values to unpack (expected 3)
return forward_call(*args, **kwargs)
File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1523, in forward
else self._run_ddp_forward(*inputs, **kwargs)
File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1359, in _run_ddp_forward
return self.module(*inputs, **kwargs) # type: ignore[index]
File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/anaconda3/envs/park/lib/python3.10/site-packages/accelerate/utils/operations.py", line 822, in forward
return model_forward(*args, **kwargs)
File "/home/anaconda3/envs/park/lib/python3.10/site-packages/accelerate/utils/operations.py", line 810, in call
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
return func(*args, **kwargs)
File "/home/llm/tlm/qwen/modeling_qwen.py", line 1135, in forward
transformer_outputs = self.transformer(
File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/llm/tlm/qwen/modeling_qwen.py", line 971, in forward
outputs = block(
File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/llm/tlm/qwen/modeling_qwen.py", line 691, in forward
attn_outputs = self.attn(
File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/llm/tlm/qwen/modeling_qwen.py", line 484, in forward
query, key, value = mixed_x_layer.split(self.split_size, dim=2)
ValueError: too many values to unpack (expected 3)
0%| | 0/593 [00:00<?, ?it/s]
[2024-05-08 18:01:42,890] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 3367112 closing signal SIGTERM
[2024-05-08 18:01:42,891] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 3367117 closing signal SIGTERM
[2024-05-08 18:01:42,891] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 3367119 closing signal SIGTERM

@2279072142

I took your code, reduced the number of layers, and retrained. When I chat with the model, it only replies ",,,,,,".


I changed "hidden_size": 2048 in config.json to 512. The model did get smaller, but training now errors out. Which other settings need to be changed?

The error output is as follows (each DDP worker printed the same warnings and traceback; shown once here):

Dataset({ features: ['input_ids'], num_rows: 379743 })
Dataset({ features: ['input_ids'], num_rows: 8792 })
Try importing flash-attention for faster inference...
Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
QWen size: 358.1M parameters
Detected kernel version 4.19.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
Using auto half precision backend
***** Running training *****
  Num examples = 379,743
  Num Epochs = 1
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 640
  Gradient Accumulation steps = 10
  Total optimization steps = 593
  Number of trainable parameters = 358,072,832
  0%| | 0/593 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/llm/tlm/pre_train.py", line 264, in <module>
    trainer.train( #'model_save/pre/checkpoint-3400'
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/transformers/trainer.py", line 1859, in train
    return inner_training_loop(
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/transformers/trainer.py", line 2203, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/transformers/trainer.py", line 3138, in training_step
    loss = self.compute_loss(model, inputs)
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/transformers/trainer.py", line 3161, in compute_loss
    outputs = model(**inputs)
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1523, in forward
    else self._run_ddp_forward(*inputs, **kwargs)
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1359, in _run_ddp_forward
    return self.module(*inputs, **kwargs)  # type: ignore[index]
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/accelerate/utils/operations.py", line 822, in forward
    return model_forward(*args, **kwargs)
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/accelerate/utils/operations.py", line 810, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "/home/llm/tlm/qwen/modeling_qwen.py", line 1135, in forward
    transformer_outputs = self.transformer(
  File "/home/llm/tlm/qwen/modeling_qwen.py", line 971, in forward
    outputs = block(
  File "/home/llm/tlm/qwen/modeling_qwen.py", line 691, in forward
    attn_outputs = self.attn(
  File "/home/llm/tlm/qwen/modeling_qwen.py", line 484, in forward
    query, key, value = mixed_x_layer.split(self.split_size, dim=2)
ValueError: too many values to unpack (expected 3)
"/home/llm/tlm/qwen/modeling_qwen.py", line 1135, in forward transformer_outputs = self.transformer( File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl return self._call_impl(*args, **kwargs) File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl return forward_call(*args, **kwargs) File "/home/llm/tlm/qwen/modeling_qwen.py", line 971, in forward outputs = block( File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl return self._call_impl(*args, **kwargs) File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl return forward_call(*args, **kwargs) File "/home/llm/tlm/qwen/modeling_qwen.py", line 691, in forward attn_outputs = self.attn( File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl return self._call_impl(*args, **kwargs) File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl return forward_call(*args, **kwargs) File "/home/llm/tlm/qwen/modeling_qwen.py", line 484, in forward query, key, value = mixed_x_layer.split(self.split_size, dim=2) ValueError: too many values to unpack (expected 3) 0%| | 0/593 [00:00<?, ?it/s] [2024-05-08 18:01:42,890] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 3367112 closing signal SIGTERM [2024-05-08 18:01:42,891] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 3367117 closing signal SIGTERM [2024-05-08 18:01:42,891] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 3367119 closing signal SIGTERM

You need to modify the kv_channels variable so that hidden_size = kv_channels * num_attention_heads holds.
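This constraint can be checked before launching training. A minimal sketch, assuming the config exposes `hidden_size`, `num_attention_heads`, and `kv_channels` as Qwen's config.json does (the ~120M numbers below are illustrative, not a recommendation):

```python
def attention_shapes_consistent(hidden_size: int,
                                num_attention_heads: int,
                                kv_channels: int) -> bool:
    """The 3-way query/key/value split only works when the per-head
    width times the head count equals the model width."""
    return hidden_size == kv_channels * num_attention_heads

# Example shrink toward a small model, keeping the constraint satisfied:
hidden_size = 768
num_attention_heads = 12
kv_channels = hidden_size // num_attention_heads  # 64

assert attention_shapes_consistent(hidden_size, num_attention_heads, kv_channels)
# A stale kv_channels left at the 1.8B default would fail the check:
assert not attention_shapes_consistent(768, 12, 128)
```

Recomputing `kv_channels` from `hidden_size // num_attention_heads` whenever you shrink the model keeps the QKV split valid automatically.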

@kingpingyue
Author

kingpingyue commented Jul 15, 2024 via email
