
[Bug] HAP transform crashes when using a GPU #1047

Open
2 tasks done
burn2l opened this issue Feb 13, 2025 · 8 comments
Labels
bug Something isn't working

Comments

burn2l commented Feb 13, 2025

Search before asking

  • I searched the issues and found no similar issues.

Component

Transforms/Other

What happened + What you expected to happen

When the HAP transform is run on a machine with a GPU, it crashes with:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)
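
This message is PyTorch's generic complaint when a single op receives tensors from two devices. A minimal sketch, independent of the HAP code, that raises the same class of error (the embedding lookup below goes through the index_select named in the traceback; exact wording may differ by PyTorch version):

import torch

emb = torch.nn.Embedding(10, 4)            # weights live on the CPU by default
idx = torch.tensor([1, 2], device="cuda")  # indices live on the GPU
emb(idx)  # RuntimeError: Expected all tensors to be on the same device ...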

Reproduction script

Run pytest test/test_hap.py on a machine with a GPU

Anything else

Can be fixed by changing line 41 in dpk_hap/transform.py to:

self.model = AutoModelForSequenceClassification.from_pretrained(self.model_name_or_path).to(device)
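
For context, a self-contained sketch of the fixed pattern, assuming a public sequence-classification checkpoint as a stand-in for the HAP model (the model name and input text are illustrative; the surrounding class in dpk_hap/transform.py is omitted):

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name_or_path = "distilbert-base-uncased-finetuned-sst-2-english"  # stand-in checkpoint
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
# Without the trailing .to(device), the weights stay on the CPU while the
# tokenized inputs below move to the GPU, reproducing the crash above.
model = AutoModelForSequenceClassification.from_pretrained(model_name_or_path).to(device)

inputs = tokenizer("some text to score", return_tensors="pt").to(device)
with torch.no_grad():
    logits = model(**inputs).logits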

OS

Red Hat Enterprise Linux (RHEL)

Python

3.11.x

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
burn2l added the bug label on Feb 13, 2025
shahrokhDaijavad (Member) commented

cc: @klwuibm

klwuibm commented Feb 14, 2025

I think we can add the following block at line 42:


if torch.cuda.is_available():
    device = torch.device("cuda")
    self.model.to(device)

cc: @shahrokhDaijavad

shahrokhDaijavad (Member) commented

Thank you, @klwuibm.

@ian-cho Can you please try what @klwuibm suggests? There is also issue #1048 and a suggested fix by @burn2l on that one, so if you feel comfortable, please submit a PR with these 2 changes. Thanks.
cc: @agoyal26 @touma-I

burn2l (Author) commented Feb 14, 2025

I tested the fix of always executing .to(device), and it worked for both CPU and GPU, matching what is done for inputs on line 54.

klwuibm commented Feb 17, 2025

@burn2l Thanks very much for raising this issue. I believe that one can always execute model.to(device), if device is defined properly beforehand. Namely,

if torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")
self.model.to(device)

cc: @shahrokhDaijavad @touma-I
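
A general PyTorch point behind "one can always execute model.to(device)" (framework semantics, not specific to this repo): nn.Module.to() moves the module's parameters in place and returns the module itself, so self.model.to(device) needs no reassignment, whereas Tensor.to() returns a new tensor that must be reassigned:

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

model = torch.nn.Linear(4, 2)
model.to(device)   # in place: the parameters move; reassignment is optional

x = torch.randn(1, 4)
x.to(device)       # no effect on x: the moved copy is discarded
x = x.to(device)   # correct for tensors: reassign the result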

burn2l (Author) commented Feb 17, 2025

The code already sets 'device' on line 24 and uses it on 'inputs' with .to(device) at line 54, so for consistency .to(device) should be added to self.model. Having an if statement for just the model would be slightly confusing, since it would suggest that one style is more appropriate than the other. A minor point, but I think it helps readability.

klwuibm commented Feb 17, 2025

Yes, @burn2l, you are absolutely right. The device is already set on line 24, so we can safely add model.to(device).

ian-cho (Collaborator) commented Feb 18, 2025

> @ian-cho Can you please try what @klwuibm suggests? There is also issue #1048 and a suggested fix by @burn2l on that one, so if you feel comfortable, please submit a PR with these 2 changes. Thanks.

Thanks. I submitted a PR that added .to(device) to the model.
