Enable roberta embedding #786

Open
wants to merge 6 commits into habana_main from dev/enable_roberta_embedding

Conversation

@yeonsily commented Feb 5, 2025

On HPU we set position_ids and input_ids to shape [batch_size, bucket_size], so the current RoBERTa embedding forward function needs to be modified.

e.g. position_ids on GPU:
[0, 1, 2, 3, 4, 5, 6, 7]
but position_ids on HPU:
[0, 1, 2, 3, 4, 5, 6, 7, 0, 0, 0, ..., 0, 0], which is padded to size 128 in this case.
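To make the shape difference concrete, here is a tiny illustrative snippet (not code from this PR; the function name is made up) that reproduces the example above by right-padding a GPU-style position_ids vector to the HPU bucket size:

```python
import torch
import torch.nn.functional as F

def pad_position_ids(seq_len: int, bucket_size: int) -> torch.Tensor:
    # GPU-style position_ids for a sequence of length seq_len: [0, 1, ..., seq_len - 1]
    positions = torch.arange(seq_len, dtype=torch.long)
    # HPU-style: right-pad with zeros up to the fixed bucket size (e.g. 128)
    return F.pad(positions, (0, bucket_size - seq_len), value=0)

print(pad_position_ids(8, 128))  # tensor([0, 1, ..., 7, 0, 0, ..., 0]) of length 128
```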

One thing I noticed is that torch.equal() does not work properly on HPU, and I had to run it on CPU instead.
That code does nothing but check a pre-condition, but we will need to investigate it.
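As a hedged sketch of the workaround (placeholder function name, not the PR's actual code), the pre-condition comparison is done on CPU copies:

```python
import torch

def tensors_equal_via_cpu(a: torch.Tensor, b: torch.Tensor) -> bool:
    # Workaround: torch.equal() returned wrong results on HPU in this case,
    # so move both tensors to CPU before comparing.
    return torch.equal(a.cpu(), b.cpu())
```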

This PR has a dependency on #758

Update:
We found that the issue comes from dry_run=True in hpu_graph. I think we should disable it.

From the pt-integration team:

`disable_tensor_cache=True` is an experimental feature and should be used with caution. While some models can work with this flag directly, others need to pass the cache_tensors_list alongside disable_tensor_cache=True to ensure that certain tensors are not freed.
When `disable_tensor_cache=True` is enabled, it will free all tensors except user outputs and view tensors. However, certain tensors, such as in-place tensors, cannot be identified by hpugraph and will also be freed. To prevent this, these tensors must be specified in the cache_tensors_list.
Also, when `disable_tensor_cache=True` is enabled, dry_run is enabled by default.
When dry run is enabled, the intermediate values/SingleHpugraphs are not evaluated until the full graph is captured, so the mark_steps/evaluation that need to happen internally will not update the values (e.g. local_scalar_dense).
So this can lead to accuracy issues.
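Put together, the guidance above suggests something like the following when wrapping the model (a sketch only; the placeholder module is invented, and the assumption that dry_run can be passed here comes from the PR's own diff below rather than from verified htorch documentation):

```python
import torch
import habana_frameworks.torch as htorch

model = torch.nn.Linear(16, 16)  # placeholder module standing in for the real model adapter

# Keep the experimental tensor-cache behavior but turn the implicit dry run off,
# since a dry run defers evaluation until the whole graph is captured and can
# cause accuracy issues. Per the guidance above, in-place tensors that must not
# be freed would additionally need to be listed via cache_tensors_list
# (signature not verified here).
wrapped = htorch.hpu.wrap_in_hpu_graph(model,
                                       disable_tensor_cache=True,
                                       dry_run=False)
```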

Update 2:
`disable_tensor_cache` is configurable via PT_HPUGRAPH_DISABLE_TENSOR_CACHE. I reverted my change and documented this in the README.
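For instance, the flag can then be toggled through the environment before the graph-wrapping code runs (only the variable name comes from the comment above; the accepted values are an assumption):

```python
import os

# Assumed usage: set the variable before the HPU graph wrapping happens.
os.environ["PT_HPUGRAPH_DISABLE_TENSOR_CACHE"] = "false"
```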

@afierka-intel

@yeonsily thank you for the PR.

Please rebase on the latest habana_main branch to fix the known issues in CI, then ask for review again.

Thank you in advance!

yeonsily force-pushed the dev/enable_roberta_embedding branch from 9751682 to 55aacad on February 20, 2025
@yeonsily (Author)

> @yeonsily thank you for the PR.
>
> Please rebase on the latest habana_main branch to fix the known issues in CI, then ask for review again.
>
> Thank you in advance!

Done. Thank you!

@yeonsily (Author)

I see that it failed to run tests/lora/test_llama_hpu.py::test_llama_lora_1x with a graph compile error.
But I tried the same test with the 1.19 docker on a local machine and it passed.
I'm not sure if that's a setup-specific issue.

The other failed tests are because of hccl or device acquisition errors.

The code hunk this review comment refers to:

```python
return htorch.hpu.wrap_in_hpu_graph(
    HpuModelAdapter(*args, **kwargs),
    disable_tensor_cache=True,
    dry_run=False) if htorch.utils.internal.is_lazy(
) else HpuModelAdapter(*args, **kwargs)
```


This is rather a major change if we are setting dry_run to False now without providing any configurability. Let's talk offline about how that affects performance.

yeonsily force-pushed the dev/enable_roberta_embedding branch from 55aacad to 4366a22 on February 25, 2025