Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[XLA:GPU]add sycl_gpu_runtime and implement it to py_clinet_gpu #15905

Closed

Conversation

mayuyuace
Copy link
Contributor

No description provided.

@penpornk penpornk self-requested a review August 9, 2024 09:10
@penpornk penpornk added the kokoro:force-run Forces CI to rerun label Aug 9, 2024
@kokoro-team kokoro-team removed the kokoro:force-run Forces CI to rerun label Aug 9, 2024
@mayuyuace
Copy link
Contributor Author

What this error means?: "ERROR: Failed to init auth credentials: The Application Default Credentials are not available. They are available if running in Google Compute Engine. Otherwise, the environment variable GOOGLE_APPLICATION_CREDENTIALS must be defined pointing to a file defining the credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information."

Copy link
Member

@penpornk penpornk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What this error means?: "ERROR: Failed to init auth credentials: The Application Default Credentials are not available. They are available if running in Google Compute Engine. Otherwise, the environment variable GOOGLE_APPLICATION_CREDENTIALS must be defined pointing to a file defining the credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information."

This is from JAX Linux CPU CI, right? It's just a CI infra error and is unrelated to this PR.

The XLA Linux CPU CI has related errors though. Other targets should be able to build, with or without --config=sycl.

@mayuyuace
Copy link
Contributor Author

What this error means?: "ERROR: Failed to init auth credentials: The Application Default Credentials are not available. They are available if running in Google Compute Engine. Otherwise, the environment variable GOOGLE_APPLICATION_CREDENTIALS must be defined pointing to a file defining the credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information."

This is from JAX Linux CPU CI, right? It's just a CI infra error and is unrelated to this PR.

The XLA Linux CPU CI has related errors though. Other targets should be able to build, with or without --config=sycl.

Thank you for your reply. I have added SYCL compilation constraints to the newly added compilation target.

@NaiyerRizz NaiyerRizz self-assigned this Aug 12, 2024
@Zantares
Copy link

There are some specific HW code in the original PR, and are simplified now. Please help to review it again, thanks!

@penpornk penpornk added the kokoro:force-run Forces CI to rerun label Sep 6, 2024
@kokoro-team kokoro-team removed the kokoro:force-run Forces CI to rerun label Sep 9, 2024
@mayuyuace
Copy link
Contributor Author

@penpornk
I have updated the code to fix conflicts. Please review this PR again, thank you.

@dimitar-asenov
Copy link
Member

@penpornk Could you please take a look at this PR?

@dimitar-asenov
Copy link
Member

@penpornk Could you please review this PR?

Copy link
Member

@penpornk penpornk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Posting my partial review from a while ago. Sorry for the delay!
Could you please also resolve conflicts?

We also have a stricter rules against #ifdefs, so I'd like to hold this PR back until I have restructured existing SYCL code to be as ifdef-free as possible.

int count = 0;
SYCLError_t error = SYCLGetDeviceCount(&count);
if (error != SYCL_SUCCESS) {
LOG(ERROR) << "Error to get the device count because " << ToString(error);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Grammar

Suggested change
LOG(ERROR) << "Error to get the device count because " << ToString(error);
LOG(ERROR) << "Error getting the device count because " << ToString(error);

auto device_list = platform.get_devices();
for (const auto& device : device_list) {
if (device.is_gpu()) {
root_devices.push_back(device);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we append to devices directly?

Suggested change
root_devices.push_back(device);
devices.push_back(device);

Comment on lines +121 to +126
size_t num_device = devices.size();

if (num_device <= 0) {
LOG(ERROR) << "Can not found any devices.";
}
assert((num_device > 0));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: We can just use CHECK.

Suggested change
size_t num_device = devices.size();
if (num_device <= 0) {
LOG(ERROR) << "Can not found any devices.";
}
assert((num_device > 0));
CHECK(devices.size() > 0) << "Cannot find any SYCL GPUs";

Comment on lines +52 to +56
if (ordinal > count) {
return false;
}

return true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This can be simplified to

Suggested change
if (ordinal > count) {
return false;
}
return true;
return (ordinal <= count);


bool isValidDevice(int ordinal) {
int count = 0;
SYCLGetDeviceCount(&count);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this number change during the runtime? If it doesn't change, maybe it's better to read just once and reuse the count?

}

static SYCLError_t getDevice(sycl::device** device, int device_ordinal) {
// absl::ReaderMutexLock lock(&mu_);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: If this is not needed, please remove.

if (it != devices.end()) {
*device_ordinal = it - devices.begin();
return SYCL_SUCCESS;
} else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: We don't need an else because the code inside if already exited the function.

@penpornk
Copy link
Member

Since there hasn't been a response in 2 weeks and I think this PR should wait until our infrastructure is ready, I'll close the PR for now. Please feel free to continue the discussion here or reopen the PR when it's ready.

@penpornk penpornk closed this Jan 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants