Enable roberta embedding #786
base: habana_main
Conversation
Force-pushed from 4053ca7 to 9751682
@yeonsily thank you for the PR. Please rebase on the latest habana_main branch to fix known issues in CI, then ask for review again. Thank you in advance!
Force-pushed from 9751682 to 55aacad
Done. Thank you!
I see that it failed to run tests/lora/test_llama_hpu.py::test_llama_lora_1x with a graph compile error. The other failed tests are because of hccl or device acquisition errors.
vllm/worker/hpu_model_runner.py (Outdated)
```python
*args, **kwargs)
HpuModelAdapter(*args, **kwargs),
disable_tensor_cache=True,
dry_run=False) if htorch.utils.internal.is_lazy(
```
This is rather a major change if we are setting dry run to False now, without giving any configurability. Let's talk offline on how that affects performance.
Force-pushed from 55aacad to 4366a22
We set position_ids and input_ids as [batch_size, bucket_size] on HPU, so we need to modify the current roberta embedding forward function.
e.g. position_id on GPU:
[0,1,2,3,4,5,6,7]
but position_id on HPU:
[0,1,2,3,4,5,6,7,0,0,0,.....0,0], whose size is 128 with padding in this case (see the sketch below).
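To make the layout concrete, here is a minimal sketch (the bucket size of 128 and the zero padding value come from the example above; variable names are illustrative, not the identifiers used in this PR):

```python
import torch

# Sketch of the GPU vs. HPU position_id layouts described above.
# bucket_size=128 and zero padding are assumptions taken from the example.
seq_len, bucket_size = 8, 128

# GPU: one position per real token.
positions_gpu = torch.arange(seq_len)                      # [0, 1, ..., 7]

# HPU: same prefix, zero-padded out to the bucket size, so a
# "positions == arange(N)" style check no longer holds as-is.
positions_hpu = torch.zeros(bucket_size, dtype=torch.long)
positions_hpu[:seq_len] = torch.arange(seq_len)            # [0..7, 0, 0, ..., 0]
```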
One thing I noticed is that torch.equal() on HPU is not working properly, and I had to run it on CPU. That code does nothing but check a precondition, but we will need to investigate it (a sketch of the workaround is below).
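A minimal sketch of that workaround, assuming the precondition is a comparison of the position prefix against an arange tensor (the exact check in the code may differ):

```python
import torch

def check_positions(positions: torch.Tensor, seq_len: int) -> None:
    # torch.equal() appeared unreliable on HPU here, so compare CPU copies;
    # only the unpadded prefix of the bucketed tensor is checked.
    expected = torch.arange(seq_len, dtype=positions.dtype)
    assert torch.equal(positions[:seq_len].cpu(), expected.cpu()), \
        "position_ids prefix is not 0..seq_len-1"
```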
This PR has a dependency on #758
Update: We found that the issue comes from dry_run=True in hpu_graph. I think we should disable it.
From the pt-integration team:
Update 2: 'disable_tensor_cache' is configurable via PT_HPUGRAPH_DISABLE_TENSOR_CACHE. I reverted my change and documented it in the README.
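For reference, a rough sketch of relying on the environment variable rather than the hard-coded argument (the accepted value strings are an assumption here, not verified against the Habana documentation):

```python
import os

# Hedged sketch: control the HPU graph tensor-cache behaviour through the
# environment variable mentioned above instead of hard-coding
# disable_tensor_cache in the graph wrapping call.
# The "true"/"false" value format is an assumption.
os.environ.setdefault("PT_HPUGRAPH_DISABLE_TENSOR_CACHE", "true")
```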