Enable roberta embedding #786
base: habana_main
Conversation
Force-pushed from 4053ca7 to 9751682
@yeonsily thank you for the PR. Please rebase on the latest habana_main branch to fix known issues in CI, then ask for review again. Thank you in advance!
Force-pushed from 9751682 to 55aacad
Done. Thank you!
I see that it failed to run tests/lora/test_llama_hpu.py::test_llama_lora_1x with a graph compile error. The other failed tests are because of hccl or device acquisition errors.
vllm/worker/hpu_model_runner.py (Outdated)
```python
*args, **kwargs)
HpuModelAdapter(*args, **kwargs),
disable_tensor_cache=True,
dry_run=False) if htorch.utils.internal.is_lazy(
```
This is rather a major change if we are setting dry run to False now, without giving any configurability. Let's talk offline on how that affects performance.
Force-pushed from 55aacad to 4366a22
We set position_ids and input_ids as [batch_size, bucket_size] on HPU, so we need to modify the current roberta embedding forward function.
e.g. position_id on GPU:
[0,1,2,3,4,5,6,7]
but position_id on HPU:
[0,1,2,3,4,5,6,7,0,0,0,.....0,0], whose size is 128 with padding in this case (see the sketch below).
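To make the layout concrete, here is a minimal sketch (the bucket size of 128 and the zero padding value come from the example above; variable names are illustrative, not the identifiers used in this PR):

```python
import torch

# Sketch of the GPU vs. HPU position_id layouts described above.
# bucket_size=128 and zero padding are assumptions taken from the example.
seq_len, bucket_size = 8, 128

# GPU: one position per real token.
positions_gpu = torch.arange(seq_len)                      # [0, 1, ..., 7]

# HPU: same prefix, zero-padded out to the bucket size, so a
# "positions == arange(N)" style check no longer holds as-is.
positions_hpu = torch.zeros(bucket_size, dtype=torch.long)
positions_hpu[:seq_len] = torch.arange(seq_len)            # [0..7, 0, 0, ..., 0]
```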
One thing I noticed is that torch.equal() on HPU is not working properly, and I had to run it on CPU. That code does nothing but check a precondition, but we will need to investigate it (a sketch of the workaround is below).
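A minimal sketch of that workaround, assuming the precondition is a comparison of the position prefix against an arange tensor (the exact check in the code may differ):

```python
import torch

def check_positions(positions: torch.Tensor, seq_len: int) -> None:
    # torch.equal() appeared unreliable on HPU here, so compare CPU copies;
    # only the unpadded prefix of the bucketed tensor is checked.
    expected = torch.arange(seq_len, dtype=positions.dtype)
    assert torch.equal(positions[:seq_len].cpu(), expected.cpu()), \
        "position_ids prefix is not 0..seq_len-1"
```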
This PR has a dependency on #758
Update: We found that the issue comes from dry_run=True in hpu_graph. I think we should disable it.
From the pt-integration team:
Update 2: 'disable_tensor_cache' is configurable via PT_HPUGRAPH_DISABLE_TENSOR_CACHE. I reverted my change and documented it in the README.
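For reference, a rough sketch of relying on the environment variable rather than the hard-coded argument (the accepted value strings are an assumption here, not verified against the Habana documentation):

```python
import os

# Hedged sketch: control the HPU graph tensor-cache behaviour through the
# environment variable mentioned above instead of hard-coding
# disable_tensor_cache in the graph wrapping call.
# The "true"/"false" value format is an assumption.
os.environ.setdefault("PT_HPUGRAPH_DISABLE_TENSOR_CACHE", "true")
```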