
Resize embeds (with Deepspeed) is still not fixed in version 4.43.3 #32287

Closed
seokhyunan opened this issue Jul 29, 2024 · 14 comments

seokhyunan commented Jul 29, 2024

System Info

  • Hardware used: NVIDIA A6000 48G, A100 80G
  • Base models used: Mistral-7B-v0.3, Llama-3.0/1-8B
  • accelerate: 0.32.0
  • deepspeed: 0.14.4

Who can help?

@ArthurZucker @LysandreJik

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

resize_token_embeddings still does not work in version 4.43.3. Although PR #32214 resolves the issue, the 4.43.3 patch does not actually include it, even though it is mentioned in the latest patch notes. I confirmed that the test scripts below still fail on 4.43.3 but work on the main branch, which includes the PR.

Relevant PR | Issue Resolved | Mentioned in Patch Notes | Actually Included in Patch
#32192      |                | ✔️ (4.43.2)              | ✔️ (4.43.2)
#32214      | ✔️             | ✔️ (4.43.3)              | ✘ (4.43.3; comparison link (vs 4.43.2))

If I resize the token embeddings to a size greater than or equal to the original vocab size, vocab_size is set to zero. Otherwise, a different error occurs: RuntimeError: start (0) + length (525336576) exceeds dimension size (524943360).
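
For reference, here is a quick back-of-the-envelope check (my own sketch, not part of the scripts below) of the vocab_size the resize should produce, assuming pad_to_multiple_of=8 rounds the requested size up to the next multiple of 8:

import math

old_vocab = 128256                        # Meta-Llama-3.1-8B vocabulary size
requested = old_vocab + 100               # same request as in test.py below
expected = math.ceil(requested / 8) * 8   # round up to a multiple of 8

print(expected)  # 128360 -- the vocab_size the resized config should report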

test.sh:

CUDA_VISIBLE_DEVICES=0 accelerate launch \
    --mixed_precision bf16 \
    --num_machines 1 \
    --num_processes 1 \
    --use_deepspeed \
    --deepspeed_config_file test_ds_config.conf \
    test.py

test.py:

from transformers import AutoModelForCausalLM
from accelerate import Accelerator

accelerator = Accelerator()
model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3.1-8B")

print(f"Model Config 1: {model.config}")
model.resize_token_embeddings(model.vocab_size + 100, pad_to_multiple_of=8)
print(f"Model Config 2: {model.config}")

test_ds_config.conf:

{
    "bf16": {
        "enabled": "auto"
    },
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": true,
        "contiguous_gradients": true,
        "sub_group_size": 1e9,
        "reduce_bucket_size": "auto",
        "stage3_prefetch_bucket_size": "auto",
        "stage3_param_persistence_threshold": "auto",
        "stage3_max_live_parameters": 1e9,
        "stage3_max_reuse_distance": 1e9,
        "stage3_gather_16bit_weights_on_model_save": true
    },
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "steps_per_print": 1e5,
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "wall_clock_breakdown": false
}

output:

Model Config 1: LlamaConfig {
  "_name_or_path": "meta-llama/Meta-Llama-3.1-8B",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128001,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 131072,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": {
    "factor": 8.0,
    "high_freq_factor": 4.0,
    "low_freq_factor": 1.0,
    "original_max_position_embeddings": 8192,
    "rope_type": "llama3"
  },
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.43.3",
  "use_cache": true,
  "vocab_size": 128256
}

Model Config 2: LlamaConfig {
  "_name_or_path": "meta-llama/Meta-Llama-3.1-8B",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128001,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 131072,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": {
    "factor": 8.0,
    "high_freq_factor": 4.0,
    "low_freq_factor": 1.0,
    "original_max_position_embeddings": 8192,
    "rope_type": "llama3"
  },
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.43.3",
  "use_cache": true,
  "vocab_size": 0
}

Expected behavior

vocab_size should be updated correctly to the new (padded) embedding size instead of being set to 0.
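
A minimal check of that expectation (my own sketch, not part of the original test.py; it assumes resize_token_embeddings returns the new input embedding module, whose num_embeddings should match the updated config):

# After resizing, config.vocab_size should track the new embedding size
# instead of being reset to 0.
new_embeddings = model.resize_token_embeddings(model.vocab_size + 100, pad_to_multiple_of=8)
assert model.config.vocab_size == new_embeddings.num_embeddings  # expected 128360, not 0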

@seokhyunan seokhyunan added the bug label Jul 29, 2024
@seokhyunan seokhyunan changed the title from "Resize embeds (with DeepSpeed) is still not fixed in version 4.43.3" to "Resize embeds (with Deepspeed) is still not fixed in version 4.43.3" Jul 29, 2024
@LysandreJik
Member

cc @ArthurZucker, I can confirm that the previous PR is included in both v4.43.2 and v4.43.3; could you see what's acting up here?

@seokhyunan
Author

seokhyunan commented Jul 30, 2024

@LysandreJik @ArthurZucker The issue is resolved by PR #32214, but this PR was not included in the patch (4.43.3). Please note that the patch notes mention this PR, but the actual patch does not include it. Could you please do some additional verification?

Relevant PRs: #32192, #32214 (continuation of #32192)

PR     | Issue Resolved | Mentioned in Patch Notes | Actually Included in Patch
#32192 |                | ✔️ (4.43.2)              | ✔️ (4.43.2)
#32214 | ✔️             | ✔️ (4.43.3)              | ✘ (4.43.3; comparison link (vs 4.43.2))

@seokhyunan
Author

May I ask for further follow-up on this issue? Any additional validation or assistance would be greatly appreciated.

cc @ArthurZucker @amyeroberts

@ArthurZucker
Collaborator

Hey! Yes, either I forgot it, or the fix was introduced on main of transformers but was not included in the releases.

@ArthurZucker
Collaborator

There will be a release tomorrow!

@ArthurZucker
Collaborator

Unless you need this urgently today!

@seokhyunan
Author

seokhyunan commented Jul 31, 2024

@ArthurZucker I think the sooner, the better! The issue has been active for two weeks in the official releases, despite the patch notes indicating it is resolved. I also prefer the official release due to reproducibility concerns.

@seokhyunan
Author

I just wanted to send a friendly reminder! If @ArthurZucker doesn't have the bandwidth to handle this issue, any additional assistance from other maintainers would be greatly appreciated.

cc @LysandreJik @amyeroberts

@ArthurZucker
Collaborator

Sorry! It was my bad, I should have done a patch immediately. I was supposed to do a release on Friday, but pushed it back to today / Wednesday, so I'll patch in a bit!

@seokhyunan
Author

Thank you so much for your consistent follow-up and for patching the issue, @ArthurZucker! I confirmed that the test script works as expected in the newer version 😄

@ArthurZucker
Collaborator

Thanks for being thorough on your side! 🤗

@RupasaiR

RupasaiR commented Oct 9, 2024

Still getting this issue in version 4.45.2, any update on this?

@ArthurZucker
Collaborator

ArthurZucker commented Oct 10, 2024

Can you open a new issue with a new script? The one shared by @seokhyunan does not produce the error. You might have a different setting?
