
Resize embeds (with Deepspeed) is still not fixed in version 4.43.3 #32287

Closed
seokhyunan opened this issue Jul 29, 2024 · 14 comments

seokhyunan commented Jul 29, 2024

System Info

  • Hardware used: NVIDIA A6000 48G, A100 80G
  • Base models used: Mistral-7B-v0.3, Llama-3.0/1-8B
  • accelerate: 0.32.0
  • deepspeed: 0.14.4

Who can help?

@ArthurZucker @LysandreJik

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

resize_token_embeddings still does not work in version 4.43.3. Although PR #32214 resolves the issue, the 4.43.3 patch does not actually include it, even though it is mentioned in the latest patch notes. I confirmed that the test scripts below still fail on 4.43.3 but work on the main branch, which includes the PR.

Relevant PR | Issue Resolved | Mentioned in Patch Notes | Actually Included in Patch
#32192      |                | ✔️ (4.43.2)              | ✔️ (4.43.2)
#32214      | ✔️             | ✔️ (4.43.3)              | ✘ (4.43.3; comparison link (vs 4.43.2))

If I resize the token embeddings to a size greater than or equal to the original vocab size, vocab_size is set to zero. Otherwise, a different error occurs: RuntimeError: start (0) + length (525336576) exceeds dimension size (524943360).
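
For reference, here is a quick back-of-the-envelope check (my own sketch, not part of the scripts below) of the vocab_size the resize should produce, assuming pad_to_multiple_of=8 rounds the requested size up to the next multiple of 8:

import math

old_vocab = 128256                        # Meta-Llama-3.1-8B vocabulary size
requested = old_vocab + 100               # same request as in test.py below
expected = math.ceil(requested / 8) * 8   # round up to a multiple of 8

print(expected)  # 128360 -- the vocab_size the resized config should report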

test.sh:

CUDA_VISIBLE_DEVICES=0 accelerate launch \
    --mixed_precision bf16 \
    --num_machines 1 \
    --num_processes 1 \
    --use_deepspeed \
    --deepspeed_config_file test_ds_config.conf \
    test.py

test.py:

from transformers import AutoModelForCausalLM
from accelerate import Accelerator

accelerator = Accelerator()
model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3.1-8B")

print(f"Model Config 1: {model.config}")
model.resize_token_embeddings(model.vocab_size + 100, pad_to_multiple_of=8)
print(f"Model Config 2: {model.config}")

test_ds_config.conf:

{
    "bf16": {
        "enabled": "auto"
    },
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": true,
        "contiguous_gradients": true,
        "sub_group_size": 1e9,
        "reduce_bucket_size": "auto",
        "stage3_prefetch_bucket_size": "auto",
        "stage3_param_persistence_threshold": "auto",
        "stage3_max_live_parameters": 1e9,
        "stage3_max_reuse_distance": 1e9,
        "stage3_gather_16bit_weights_on_model_save": true
    },
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "steps_per_print": 1e5,
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "wall_clock_breakdown": false
}

output:

Model Config 1: LlamaConfig {
  "_name_or_path": "meta-llama/Meta-Llama-3.1-8B",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128001,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 131072,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": {
    "factor": 8.0,
    "high_freq_factor": 4.0,
    "low_freq_factor": 1.0,
    "original_max_position_embeddings": 8192,
    "rope_type": "llama3"
  },
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.43.3",
  "use_cache": true,
  "vocab_size": 128256
}

Model Config 2: LlamaConfig {
  "_name_or_path": "meta-llama/Meta-Llama-3.1-8B",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128001,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 131072,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": {
    "factor": 8.0,
    "high_freq_factor": 4.0,
    "low_freq_factor": 1.0,
    "original_max_position_embeddings": 8192,
    "rope_type": "llama3"
  },
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.43.3",
  "use_cache": true,
  "vocab_size": 0
}

Expected behavior

vocab_size should be updated correctly to the new (padded) embedding size instead of being set to 0.
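
A minimal check of that expectation (my own sketch, not part of the original test.py; it assumes resize_token_embeddings returns the new input embedding module, whose num_embeddings should match the updated config):

# After resizing, config.vocab_size should track the new embedding size
# instead of being reset to 0.
new_embeddings = model.resize_token_embeddings(model.vocab_size + 100, pad_to_multiple_of=8)
assert model.config.vocab_size == new_embeddings.num_embeddings  # expected 128360, not 0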

@seokhyunan seokhyunan added the bug label Jul 29, 2024
@seokhyunan seokhyunan changed the title from "Resize embeds (with DeepSpeed) is still not fixed in version 4.43.3" to "Resize embeds (with Deepspeed) is still not fixed in version 4.43.3" Jul 29, 2024
@LysandreJik
Member

cc @ArthurZucker, I can confirm that the previous PR is included in both v4.43.2 and v4.43.3; could you see what's acting up here?

@seokhyunan
Author

seokhyunan commented Jul 30, 2024

@LysandreJik @ArthurZucker The issue is resolved by PR #32214, but this PR was not included in the patch (4.43.3). Please note that the patch notes mention this PR, but the actual patch does not include it. Could you please do some additional verification?

Relevant PRs: #32192, #32214 (continuation of #32192)

PR     | Issue Resolved | Mentioned in Patch Notes | Actually Included in Patch
#32192 |                | ✔️ (4.43.2)              | ✔️ (4.43.2)
#32214 | ✔️             | ✔️ (4.43.3)              | ✘ (4.43.3; comparison link (vs 4.43.2))

@seokhyunan
Author

May I ask for further follow-up on this issue? Any additional validation or assistance would be greatly appreciated.

cc @ArthurZucker @amyeroberts

@ArthurZucker
Collaborator

Hey! Yes, either I forgot it, or the fix was introduced on main of transformers but was not included in the releases.

@ArthurZucker
Collaborator

There will be a release tomorrow!

@ArthurZucker
Collaborator

Unless you need this urgently today!

@seokhyunan
Author

seokhyunan commented Jul 31, 2024

@ArthurZucker I think the sooner, the better! The issue has been active for two weeks in the official releases, despite the patch notes indicating it is resolved. I also prefer the official release due to reproducibility concerns.

@seokhyunan
Author

I just wanted to send a friendly reminder! If @ArthurZucker doesn't have the bandwidth to handle this issue, any additional assistance from other maintainers would be greatly appreciated.

cc @LysandreJik @amyeroberts

@ArthurZucker
Collaborator

Sorry! It was my bad, I should have done a patch immediately. I was supposed to do a release on Friday, but pushed it back to today / Wednesday, so I'll patch in a bit!

@seokhyunan
Author

Thank you so much for your consistent follow-up and for patching the issue, @ArthurZucker! I confirmed that the test script works as expected in the newer version 😄

@ArthurZucker
Collaborator

Thanks for being thorough on your side! 🤗

@RupasaiR

RupasaiR commented Oct 9, 2024

Still getting this issue in version 4.45.2, any update on this?

@ArthurZucker
Collaborator

ArthurZucker commented Oct 10, 2024

Can you open a new issue with a new script? The one shared by @seokhyunan does not produce the error. You might have a different setting?
