fix t5 and mistral model load from config meta tensor bug #42

daisyden · 2024-01-04T05:34:12Z

Verified that performance and accuracy are the same as when loading from pretrained.

t5 acc

Task	Version	Metric	Value		Stderr
lambada_openai	0	ppl	4.1306	±	0.1503
		acc	0.7233	±	0.0062

Task	Version	Metric	Value		Stderr
lambada_openai	0	ppl	4.1306	±	0.1503
		acc	0.7233	±	0.0062

mistral acc

Task	Version	Metric	Value		Stderr
lambada_openai	0	ppl	3.1808	±	0.0583
		acc	0.7553	±	0.0060

Task	Version	Metric	Value		Stderr
lambada_openai	0	ppl	3.1808	±	0.0583
		acc	0.7553	±	0.0060

t5 perf on SPR of diamond with 2 ranks, single node:
Inference latency: 0.733 sec.

mistral:
Inference latency: 1.416 sec.

delock

LGTM

delock · 2024-01-04T05:57:14Z

@daisyden can you also submit a PR to DeepSpeed upstream?

fix t5 and mistral model load from config meta tensor bug

6b0ff1e

delock approved these changes Jan 4, 2024


delock merged commit 94873fe into delock:gma/run-opt-branch Jan 4, 2024
