-
Notifications
You must be signed in to change notification settings - Fork 510
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[XLA:GPU]add sycl_gpu_runtime and implement it to py_clinet_gpu #15905
[XLA:GPU]add sycl_gpu_runtime and implement it to py_clinet_gpu #15905
Conversation
What this error means?: "ERROR: Failed to init auth credentials: The Application Default Credentials are not available. They are available if running in Google Compute Engine. Otherwise, the environment variable GOOGLE_APPLICATION_CREDENTIALS must be defined pointing to a file defining the credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What this error means?: "ERROR: Failed to init auth credentials: The Application Default Credentials are not available. They are available if running in Google Compute Engine. Otherwise, the environment variable GOOGLE_APPLICATION_CREDENTIALS must be defined pointing to a file defining the credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information."
This is from JAX Linux CPU CI, right? It's just a CI infra error and is unrelated to this PR.
The XLA Linux CPU CI has related errors though. Other targets should be able to build, with or without --config=sycl
.
Thank you for your reply. I have added SYCL compilation constraints to the newly added compilation target. |
There are some specific HW code in the original PR, and are simplified now. Please help to review it again, thanks! |
@penpornk |
@penpornk Could you please take a look at this PR? |
@penpornk Could you please review this PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Posting my partial review from a while ago. Sorry for the delay!
Could you please also resolve conflicts?
We also have a stricter rules against #ifdef
s, so I'd like to hold this PR back until I have restructured existing SYCL code to be as ifdef-free as possible.
int count = 0; | ||
SYCLError_t error = SYCLGetDeviceCount(&count); | ||
if (error != SYCL_SUCCESS) { | ||
LOG(ERROR) << "Error to get the device count because " << ToString(error); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Grammar
LOG(ERROR) << "Error to get the device count because " << ToString(error); | |
LOG(ERROR) << "Error getting the device count because " << ToString(error); |
auto device_list = platform.get_devices(); | ||
for (const auto& device : device_list) { | ||
if (device.is_gpu()) { | ||
root_devices.push_back(device); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't we append to devices
directly?
root_devices.push_back(device); | |
devices.push_back(device); |
size_t num_device = devices.size(); | ||
|
||
if (num_device <= 0) { | ||
LOG(ERROR) << "Can not found any devices."; | ||
} | ||
assert((num_device > 0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: We can just use CHECK
.
size_t num_device = devices.size(); | |
if (num_device <= 0) { | |
LOG(ERROR) << "Can not found any devices."; | |
} | |
assert((num_device > 0)); | |
CHECK(devices.size() > 0) << "Cannot find any SYCL GPUs"; |
if (ordinal > count) { | ||
return false; | ||
} | ||
|
||
return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: This can be simplified to
if (ordinal > count) { | |
return false; | |
} | |
return true; | |
return (ordinal <= count); |
|
||
bool isValidDevice(int ordinal) { | ||
int count = 0; | ||
SYCLGetDeviceCount(&count); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this number change during the runtime? If it doesn't change, maybe it's better to read just once and reuse the count
?
} | ||
|
||
static SYCLError_t getDevice(sycl::device** device, int device_ordinal) { | ||
// absl::ReaderMutexLock lock(&mu_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: If this is not needed, please remove.
if (it != devices.end()) { | ||
*device_ordinal = it - devices.begin(); | ||
return SYCL_SUCCESS; | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: We don't need an else because the code inside if
already exited the function.
Since there hasn't been a response in 2 weeks and I think this PR should wait until our infrastructure is ready, I'll close the PR for now. Please feel free to continue the discussion here or reopen the PR when it's ready. |
No description provided.