Adding support for Context Parallelism using DeepSpeed's DistributedAttention #1501
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Force-pushed from c494ea6 to 3cfc93f
Force-pushed from 3cfc93f to 30d808e
This cannot be merged before SynapseAI v1.19 is released, right?
optimum/habana/parallel_state.py (Outdated)
Why not put this file in optimum/habana.distributed?
We can put it there.
Can I do the restructuring later in a separate commit?
@@ -234,6 +234,7 @@ def to_test(
     "codellama/CodeLlama-13b-Instruct-hf",
     "MIT/ast-finetuned-speech-commands-v2",
     "meta-llama/LlamaGuard-7b",
+    "huggyllama/llama-7b",
Can we perform the test with a more recent version of Llama? This one is Llama v1.
We can add the test for any version of Llama.
We added v1 because, for Llama 2 and 3, we need Hugging Face token authorization to access the model; without it, the test will fail.
Is there a way to pass the token, or what do you suggest?
For instance, I use this script for the DeepSpeed CI: https://github.com/huggingface/optimum-habana/blob/main/tests/ci/slow_tests_deepspeed.sh
It takes a token as an argument to log in, but I'm not sure you have to do it that way.
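For illustration, a minimal sketch of how a test could log in programmatically before touching a gated Llama checkpoint; the HF_TOKEN variable name is an assumption, not something this PR or the CI scripts define:

```python
import os

from huggingface_hub import login

# Read the token from the environment (HF_TOKEN is an assumed variable name)
# and log in before the test tries to download a gated model.
token = os.environ.get("HF_TOKEN")
if token:
    login(token=token)
```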
We added this test to check that the context parallelism (CP) feature works.
For Llama 2 and 3.1, we can add one more test with a long sequence length.
Can I add that later along with the restructuring?
Yes, sure, but I would like to do that before the release.
@regisss I think we should add a token to https://github.com/huggingface/optimum-habana/blob/main/tests/ci/slow_tests_8x.sh as well, similar to https://github.com/huggingface/optimum-habana/blob/main/tests/ci/slow_tests_deepspeed.sh.
I will update the test to Llama 3.1. Can you add the token to slow_tests_8x.sh?
I will add it along with the restructuring in a separate commit.
If you agree with that, can you get this merged?
Okay, let's do that 👍
@bhargaveede I just pushed a commit to add HF login to the 8x slow tests: 9f9b41e
Can you please also run all of the Llama CI tests to make sure this doesn't affect the current numbers?
The code quality check failed, please run the code style check.
@regisss The style check is failing in a different file (not part of this PR).
Yeah, that was happening because a PR was merged yesterday without passing the style check. I rebased your branch, so everything should be fine now.
Adding support for Context Parallelism using DeepSpeed's DistributedAttention
This PR adds support for enabling Context Parallelism for Llama models using DeepSpeed.
The feature is enabled with the context_parallel_size flag.
It lets us train/evaluate with longer context lengths by parallelizing the inputs across the context parallel group.
For attention, it uses DeepSpeed's DistributedAttention, which gathers the sequence for all heads and distributes the heads across the context parallel group, so that attention for each head sees the entire context while the heads are split within the group.
Once attention is done, the outputs are scattered back along the sequence dimension across the group.
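For context, a rough sketch of how a local attention callable can be wrapped with DeepSpeed's DistributedAttention over a context-parallel process group. This is not the exact code of this PR; the helper name and the use of scaled_dot_product_attention as the local attention are assumptions:

```python
import torch.nn.functional as F
from deepspeed.sequence.layer import DistributedAttention


def build_context_parallel_attention(cp_group):
    """Wrap a plain attention callable so that, inside the all-to-all,
    heads are scattered and the sequence is gathered across cp_group."""

    def local_attention(query, key, value, attention_mask=None):
        # Stand-in for the model's usual attention (e.g. FusedSDPA on Gaudi).
        return F.scaled_dot_product_attention(query, key, value, attn_mask=attention_mask)

    # Llama Q/K/V states are laid out as [B, N, S, H]: heads live on dim 1
    # (scatter_idx) and the sequence on dim 2 (gather_idx).
    return DistributedAttention(local_attention, cp_group, scatter_idx=1, gather_idx=2)
```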
Verified Llama 3.1 8B and Llama 3.1 70B fine-tuning with a 32K sequence length on 8 ranks using this feature.
Llama 3.1 8B command:
HL_DS_DISTRIBUTED_ATTENTION_SEQ_DIM=1 MASTER_ADDR=127.0.0.1 MASTER_PORT=12345 python3 ./optimum-habana-fork/examples/gaudi_spawn.py \
    --world_size 8 --use_deepspeed ./optimum-habana-fork/examples/language-modeling/run_lora_clm.py \
    --dataset_name tatsu-lab/alpaca \
    --bf16 True \
    --output_dir /tmp/lora_out \
    --max_seq_len 32768 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --save_strategy no \
    --learning_rate 0.0004 \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "constant" \
    --logging_steps 1 \
    --dataset_concatenation \
    --do_train \
    --use_habana \
    --throughput_warmup_steps 3 \
    --lora_rank 8 \
    --lora_target_modules "q_proj" "v_proj" "k_proj" "o_proj" \
    --attn_softmax_bf16 True \
    --validation_split_percentage 4 \
    --flash_attention_causal_mask True \
    --evaluation_strategy epoch \
    --pipelining_fwd_bwd \
    --use_lazy_mode \
    --use_flash_attention True \
    --deepspeed ./optimum-habana-fork/examples/language-modeling/llama3_ds_zero1_config.json \
    --num_train_epochs 3 \
    --eval_delay 3 \
    --do_eval \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --gradient_accumulation_steps 4 \
    --flash_attention_recompute True \
    --context_parallel_size 4 \
    --model_name_or_path meta-llama/Llama-3.1-8B
Test added to verify the context parallelism:
https://github.com/huggingface/optimum-habana/pull/1501/files#diff-0741a50beca4b08d354933485499f735f9b5493841e8f3af0e89b16ae1e04af4R978
Note:
The DistributedAttention indices (https://github.com/huggingface/optimum-habana/pull/1501/files#diff-30aeee6868dd1de34878aca0583f57bb5b0dd9a2a8511a80e9a6b2645f39ce6bR490) are initialized as scatter_idx=1 and gather_idx=2 because, for Llama, the query states have shape [B, N, S, H] (batch size, num heads, sequence length, head dim).
We want to gather along the sequence length (dim 2 of [B, N, S, H]) and scatter the heads (dim 1 of [B, N, S, H]).
Other models that want to integrate DistributedAttention have to adjust the indices based on their tensor shapes.
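As a small illustration of that index choice (the dimensions below are toy values, not taken from the PR):

```python
import torch

# Toy dimensions: batch, heads, full sequence length, head dim, CP group size.
B, N, S, H, cp_size = 2, 8, 4096, 128, 4

# Each rank starts with all heads but only a 1/cp_size shard of the sequence.
local_q = torch.empty(B, N, S // cp_size, H)
print("before all-to-all:", tuple(local_q.shape))     # (2, 8, 1024, 128)

# The all-to-all with scatter_idx=1 (heads) and gather_idx=2 (sequence)
# hands each rank the full sequence for a subset of heads.
print("after all-to-all: ", (B, N // cp_size, S, H))  # (2, 2, 4096, 128)

# After local attention, the reverse all-to-all scatters the outputs back
# along the sequence dimension, restoring the [B, N, S // cp_size, H] shard.
```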