
[RFC] Intel GPU integration #99

Merged · 6 commits · Jan 15, 2024

Conversation

@Zantares commented Nov 9, 2023

This RFC is from Intel, following the XLA GPU roadmap to integrate Intel GPU as an in-tree device in OpenXLA. We hope to get your suggestions on this proposal, thanks!

## Motivation
Intel has experimentally released [Intel® Extension for OpenXLA*](https://github.com/intel/intel-extension-for-openxla) based on the `PJRT C API` to support running applications on Intel GPU with OpenXLA.
However, an in-tree build is a better way to maximize the capabilities of OpenXLA and improve the user experience,
so Intel would like to upstream the related changes inside **Intel® Extension for OpenXLA*** to OpenXLA to make Intel GPU an in-tree device.

Contributor:

I'm curious about the tradeoffs you see here: why in-tree?
I would think that having a monolithic repository for XLA isn't scalable, or desirable. TensorFlow paid the price for this and tried for a long time to split up its components into multiple repos.

Have you considered having this component inside OpenXLA as a separate repo? It would be great if we could use this work to make XLA more modular and keep the reusable "core" of XLA independent of the various platform-specific instantiation of XLA.

Contributor:

+1 - probably important to coordinate with the Jax team on this. Afaik, they've been skating the other direction, even looking to have the NVIDIA XLA backend be built and installable separately.

Maybe there are two axes being conflated here: having the code be in-tree is a policy decision that the XLA authors can make with the usual implications on ifdef'ey stuff with disjoint dependencies that never builds all at once. I wouldn't make that decision personally, but doing so is a matter of source code/project organization that could be decided either way.

But in terms of deployment, it seems that this will always be a separately built and deployed binary with its own distinct dependencies and pinned to the PJRT API version. If this were some other way, I could see the attraction of moving in-tree since that would provide some privileged position for features that couldn't be accessed elsewhere. Since that is not the way things are going, though, everything I know of the xla dev process, CI, etc tells me that this would be better off as its own repo with its own CI and deployment lifecycle. The development experience will be significantly degraded by moving in-tree in my experience, so I would treat it as something to be done if there is no other way.

Personally, I would do what @hawkinsp recommends.

Author:

Basically, we are doing this work to follow the new XLA GPU roadmap. This simple reason may not answer your question completely; I will give more details later, thanks.

@stellaraccident (Contributor) commented Nov 14, 2023:

That roadmap, which is very non-technical, does seem to call out the desire to support these things as out-of-tree builds.

I would treat that roadmap more as marketing or product guidance vs a basis for technical requirements. It was not authored with an abundance of technical collaboration from most of the people who are the primary stakeholders on the things it claims to cover.

@joker-eph (Contributor) commented Nov 14, 2023:

+1, I don't quite see how the link provides anything that supports the current proposal in this RFC; can you clarify what you are referring to?

We should decouple the interests of the project (OpenXLA), which definitely would want to grow support for more platforms, and the scalability of the development in the repo, which I'm concerned about right now.

@Jianhui-Li:

From the PluggableDevice experience, I actually think that we need the module maintainer's guidance on the technical direction. It is not practical to propose any structural changes if the module maintainer is planning something else. As OpenXLA has experienced multiple changes recently, it is even more important to get the maintainer's opinion.

For the low-level design to identify a good abstraction/interface (zero #ifdef), I agree that a new device backend has better visibility into what changes are needed. If the changes align with the module maintainer's vision, Intel could step up and co-own these modules and drive the changes.

Seeking a good abstraction doesn't mean the Intel GPU backend wants to live out-of-tree. Out-of-tree integration like PJRT is valuable for a plugin to mature and to experiment with new features. As a backend designed specifically for OpenXLA, pushing mature features upstream with a good interface is a better destiny: it helps make Intel GPU more accessible to users and helps OpenXLA cover more device targets.

Contributor:

Out-of-tree integration like PJRT is valuable for a plugin to mature and to experiment with new features. As a backend designed specifically for OpenXLA, pushing mature features upstream with a good interface is a better destiny: it helps make Intel GPU more accessible to users and helps OpenXLA cover more device targets.

I don't quite understand this part: why would living in the same repo change anything?
In particular you wrote that it "helps OpenXLA cover more device targets": it's not clear to me why being "in-tree" would relate in any way to what OpenXLA supports! As long as we organize our GitHub collection of repos together, I don't quite see the difference: we would still be able to say "OpenXLA supports Intel GPU" (as much as any other platform), and we should still be able to have JAX (for example) released with support for Intel GPU.
In-tree vs out-of-tree should really be orthogonal to that.

@Jianhui-Li:

From my observation, a backend living "out-of-tree" has a higher risk of developing features driven by short-term business needs that are known to be hard to upstream. The fact of living "out-of-tree" certainly impacts the "mind share" of the upstream module maintainers and how tightly the backend tracks upstream changes. So from Intel's side, there is strong motivation to get mature and compatible features upstreamed.

Although the RFC is nominally for Intel GPU, the proposed changes potentially benefit other devices. The RFC adds a SPIR-V target and a SYCL runtime to OpenXLA, and these are industry standards used by other devices. From that perspective, the effort helps OpenXLA cover more device targets.

Contributor:

Thanks for elaborating, much appreciated!

Considering individual "features" as separate reusable components in-tree (like the "SPIR-V target" or "SYCL runtime") makes a lot of sense to me; I didn't grasp this from the RFC.

I assume contributions would be separate for each new component, right?
Each would end up an independent library/component in-tree, with its own tests and documentation, and that decouples the question of the full backend support which assembles these components into a coherent PJRT-registered platform.

@Zantares (Author) commented Nov 30, 2023:

Thanks for the clarification from @Jianhui-Li; I have added the purpose of supporting the SPIR-V target and SYCL runtime to the RFC.

I assume contributions would be separate for each new component, right?
Each would end up an independent library/component in-tree, with its own tests and documentation, and that decouples the question of the full backend support which assembles these components into a coherent PJRT-registered platform.

Yes, in the original plan we already decided to upstream them separately if approved. The SPIR-V target (meaning the common changes mentioned above) will be upstreamed first, once the #ifdef concern is addressed.

### Code integration & binary release
We would like to introduce a new macro `INTEL_GPU` (Tentative) for code integration:
```c++
#ifndef INTEL_GPU
```

Contributor:

This seems contradictory with the stated goal of having another "in-tree build device". The typical way that XLA differentiates between multiple devices is such that they represent distinct source code that can co-exist in the same binary. I understand the desire for carrying such ifdefs, but it does mean that this is not a truly independent device in the same way (for example) that the XLA CPU device is different from the XLA GPU and TPU devices (i.e. all three of which can be enabled simultaneously in the same build invocation).

I think this reinforces that the integration point here has to be at the PJRT plugin layer, and this plugin just happens to use common code from XLA.
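
(For concreteness, a minimal sketch of the registration style this comment is pointing at. All names below are hypothetical stand-ins, not actual XLA or PJRT APIs: the point is that backends co-exist in one binary because enabling one means linking in its translation unit, not compiling shared code differently.)

```c++
#include <functional>
#include <map>
#include <memory>
#include <string>

// Toy backend interface; real XLA uses se::Platform and PJRT. This only
// illustrates why link-time registration composes where #ifdef does not.
struct Backend {
  virtual ~Backend() = default;
  virtual std::string Name() const = 0;
};

using BackendFactory = std::function<std::unique_ptr<Backend>()>;

// Shared code owns the registry and never mentions any vendor macro.
std::map<std::string, BackendFactory>& BackendRegistry() {
  static auto* registry = new std::map<std::string, BackendFactory>();
  return *registry;
}

// Lives in its own translation unit; a Bazel config flag decides whether
// this file is linked in at all.
struct SyclBackend : Backend {
  std::string Name() const override { return "sycl"; }
};

static const bool kSyclRegistered = [] {
  BackendRegistry()["sycl"] = [] { return std::make_unique<SyclBackend>(); };
  return true;
}();
```

With this shape, the CPU, CUDA, and SYCL backends can all be enabled in the same build invocation, which is the property the comment above calls out.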

Author:

This is because it's hard to integrate a new device into OpenXLA GPU and make it co-exist with the current in-tree GPU all at once without any bugs... We are just following what we did in TF before: oneDNN took a long time to become the default build of TF CPU, and it also started from a macro-guarded build like this.

Thanks for your question, I will update RFC to call this out.

Contributor:

The current situation, where you feel like you have to be in-tree and have to use #ifdef, looks more to me like a symptom of a problem to address rather than something that is intended, or that we should treat as untouchable.

I would really rather look at how to make it possible for platforms to be developed independently (by improving the pluggability/modularity in XLA) rather than gluing all of it monolithically with #ifdef everywhere.

Google already develops the entire support for the TPU platform out-of-tree and seems to be successful at doing so, so it should be within reach. I'm curious what's missing for the Intel platform, or what makes it so different from TPU that it couldn't be out-of-tree?

@Zantares (Author):

Hi all, thanks for all of your discussion, I'm glad to see so many suggestions!

Here are some updates/plans for future work after a brief talk with Google:

  • We agree to eliminate the #ifdefs in code and upstream the common changes first, since it's very simple and can help to parameterize the pass APIs to support more pluggable devices in the future (see the sketch after this list)
  • The runtime part still needs more discussion. We can continue to discuss it in this RFC, or wait to see if the maintainers can give more info
  • We have set up a public CI demo in the public repo Intel® Extension for OpenXLA*, and plan to hook it into OpenXLA CI for any possible testing. Welcome to check it!
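
(A rough illustration of what "parameterize the pass APIs" could mean, per the first bullet above. The names here are hypothetical, a sketch rather than the planned change: target-specific facts become constructor arguments, so the pass body stays free of device macros.)

```c++
#include <string>
#include <utility>

// Hypothetical target descriptor; the real parameterization will be
// whatever the upstreamed PRs define.
struct GpuTargetInfo {
  std::string llvm_triple;    // e.g. an NVPTX or SPIR-V triple
  int preferred_vector_bits;  // device-preferred vector width
};

class ExampleVectorizePass {
 public:
  explicit ExampleVectorizePass(GpuTargetInfo target)
      : target_(std::move(target)) {}

  // Decisions branch on data carried in GpuTargetInfo, so any backend
  // can instantiate the same pass with its own parameters -- no #ifdef.
  bool ShouldVectorize(int element_bits) const {
    return target_.preferred_vector_bits >= 2 * element_bits;
  }

 private:
  GpuTargetInfo target_;
};
```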

@penpornk (Member) commented Dec 1, 2023

Hi all,

Sorry for the delayed response! I somehow missed that the RFC was opened until now. Below is what I understood from the discussion:

Intel's motivations for upstreaming some parts of the Intel GPU PJRT plug-in:

  1. Integrating at the PJRT level gives no easy way to reuse XLA HLO optimization passes.
    • Intel has to hard-copy 90% of the code into the plug-in's repo and periodically resync. The actual changes to the passes are minor and are not worth this much maintenance effort.
  2. Some code isn't specific to Intel GPU. Upstreaming them would benefit more devices, i.e., SYCL and SPIR-V devices.
  3. Out of sight, out of mind. According to Intel:
    1. Users may be more aware of Intel GPU support if it is in-tree.
      • I think this is orthogonal. We could reach similar user-awareness if Intel GPU support is listed in XLA's main page/documentation/blogs, etc.
    2. In-tree developers may be more likely to factor in the Intel GPU programming model in their new feature development/design if the code is in-tree.
      • This seems addressable with a good shared interface, which is orthogonal to where the code lives.
    3. On the other hand, it's also easy for plug-in code to develop things that don't align with in-tree code in the long term, in favor of faster development velocity.

So far, we seem to all agree that:

  1. #ifdefs interspersed in code are bad and should be avoided as much as possible.
    • As @stellaraccident pointed out, they are also detrimental to motivation 3.2, as they make the code diverge and do nothing to push developers to think about the shared infrastructure. Having a good shared abstraction/interface is beneficial to motivation 3.2.
  2. XLA:GPU needs to be more modularized and have more integration points.
  3. It makes sense to upstream device-generic code.

There are also different levels of in-tree-ness/modularity that should be distinguished:

  • T1: In-tree, code interspersed with #ifdefs: A resounding no from everyone.
  • T2: In-tree, mostly modularized backend code:
    • Separate files implementing/extending a shared interface, with the module registration happening when built with some Bazel config flag.
  • T3: Out-of-tree as a separate repo, still a single binary:
    • XLA depends on this repo as a Bazel http_repository.
    • Builds as a single XLA binary with some Bazel config flag.
  • T4: Out-of-tree as a separate repo, separate plug-in binary to be installed alongside XLA:
    • This is what the current Intel GPU PJRT plug-in does.

If I understand correctly, Intel proposes upstreaming 3 separate components T2-style, and will have fast-moving, Intel-specific features (not mentioned in this RFC) remain in the plug-in (T4). @joker-eph initially proposed the 3 components should be done T3-style, but later on seemed to be okay with T2 as well, did I get this right?

Let's decide on each component separately?

  1. LLVM IR changes
    1. Integrate SPIR-V translator: I think this should be upstreamed since it benefits any SPIR-V-supported devices. (A minimal sketch of the translator call follows this list.)
      • @jpienaar do you have a requirement for it to be a pluggable pass at the end?
    2. The rest of the changes seem specific to Intel GPU and we'll likely need to look at the code to see how we can make it more parameterized/modularized.
  2. Lib call: The lib call rewrites happen in HLO optimization passes.
    • As mentioned in our CPU Strategy, we are defining a formal integration point for 3rd-party HLO/MLIR-based graph rewrite/optimization passes. The same work applies to XLA:GPU. We would also like to learn what improvements we can make to existing passes to make them more reusable, and whether we need to add more extension points (and where).
    • @Zantares Could you please give some code pointers to the changes you made to the existing HLO passes in your plug-in repo? We could schedule a public meeting call to walk through the code as well if it's easier.
  3. XLA:GPU Runtime: For calling SYCL-supported devices.
    • Based on the device-generic principle, it seems like it should be upstreamed, so other devices can build upon this interface.
    • 3,000 lines of code is nontrivial, but if it is in a separate backend folder that should be modular enough.
    • One concern is about maintenance, as Google doesn't have the expertise to maintain it. TensorFlow used to have in-tree SYCL support, but we deleted it [issue, commit] because we weren't aware of users and there had been no activity on that code in a while.
    • I don't have a preference between out-of-tree (T3/T4) vs in-tree (T2) and delete if things are inactive -- I'm assuming Intel will help maintain the backend. What do people think?
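
(Referring back to item 1.i: a minimal sketch of what calling the SPIR-V translator can look like, assuming the KhronosGroup SPIRV-LLVM-Translator's public `writeSpirv` entry point; the exact headers, options, and error plumbing in the actual PR may differ.)

```c++
#include <sstream>
#include <string>

#include "LLVMSPIRVLib.h"    // KhronosGroup/SPIRV-LLVM-Translator
#include "llvm/IR/Module.h"

// Serialize an LLVM module into a SPIR-V binary blob. Error handling is
// simplified; an in-tree version would report through XLA's status types.
std::string EmitSpirv(llvm::Module* module, std::string* error) {
  std::ostringstream spirv_stream;
  if (!llvm::writeSpirv(module, spirv_stream, *error)) {
    return "";  // Translation failed; *error carries the message.
  }
  return spirv_stream.str();
}
```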

@joker-eph (Contributor):

Thanks for the great summary @penpornk!

@joker-eph initially proposed the 3 components should be done T3-style, but later on seemed to be okay with T2 as well, did I get this right?

I think there is a nuance :)
To me, all the shared components/libraries probably make sense in-tree: that is T2 (even though things could also be modular enough to be their own repo as well: imagine the SPIR-V runtime in its own repo, with a clean enough API that any SPIR-V platform could just reuse it directly? The drawback is that a constellation of small components can become harder to manage/integrate/evolve).

On the other hand, for the "non-shared components", it's not clear to me why T2 is the best; it seems worth evaluating more closely whether every platform could have its own repo for the non-shared code (without duplicating code: that would defeat the point, of course). That is T4, I think? (Which is how the TPU compiler is set up, somehow.)

Now: it is possible that 90% of what Intel has here is in the "T2" category and the T4 part is very small, or it is possible that it's the opposite, or anywhere in the middle. I haven't tried to make an assessment of this. What seemed valuable was to start upstreaming the things that are clearly "T2 component" cleanly, and then see where we are?

@stellaraccident (Contributor) commented Dec 2, 2023

  • I don't have a preference between out-of-tree (T3/T4) vs in-tree (T2) and delete if things are inactive -- I'm assuming Intel will help maintain the backend. What do people think?

I think that for non-NVIDIA GPU targets, the desire to be in-tree is very non-abstract (and primarily not technically driven): in my experience, core developers of this kind of system -- which has grown in a non-modular fashion against one target -- benefit from a tangible itch to scratch. Putting the burden on those who come second or third to take the step to live outside of the peripheral vision of the primary authors is unfair and will not produce a good result with the style of development that the core authors do on this component. Case in point: it was three weeks before someone affiliated with primary stewardship of the project commented (and thank you for doing so, Penporn -- much appreciated). With that kind of dynamic, code locality is the only lever available to bias towards equal access to the platform, and the option to develop in-tree should be retained.

I think that the GPU vendors should be colocated at this point: if one is in tree, then they all should be as a matter of policy. That can be reevaluated if the primary authors publish a more detailed plan and level of support track record for this kind of situation. It would be different if this were the fourth or fifth implementation and/or there was a strong track record of multi targeting for the GPU component specifically. But in effect, this is the second. The only way to evolve the situation from this point is colocation, imo.

@Zantares (Author) commented Dec 4, 2023

Thanks for the summary @penpornk ! We will have some internal discussion first and give the feedback later.

@penpornk (Member) commented Dec 4, 2023

Thank you all for the quick replies! :)
The tally so far:

| Proposal | Shared code (multi-vendor) | Vendor-specific code |
| --- | --- | --- |
| Intel | T2 | T2 |
| @joker-eph | T2 (T3 if/when it makes sense) | T4 |
| @stellaraccident | T2 | T2 |

I also got feedback from JAX/PJRT. The code structure underneath doesn't matter to them; they only require that the device support for JAX still be built and packaged as a separate PJRT plug-in (even if the code lives in the same XLA tree).

Our team will look more into the Intel GPU plug-in code before making per-component suggestions too. The goal is to keep XLA in good health while finding reasonable work scopes that fit each party's high-level goals and timelines.

Re: @joker-eph:

Now: it is possible that 90% of what Intel has here is in the "T2" category and the T4 part is very small, or it is possible that it's the opposite, or anywhere in the middle. I haven't tried to make an assessment of this. What seemed valuable was to start upstreaming the things that are clearly "T2 component" cleanly, and then see where we are?

Great question. @Zantares, do you have a quick answer here?
Starting with clearly-T2 things makes sense to me. Maybe we can start with the LLVM-SPIRV translator PR?

@joker-eph Do you consider the SYCL runtime support "clearly T2"?

Re: @stellaraccident:

Putting the burden on those who come second or third to take the step to live outside of the peripheral vision of the primary authors is unfair and will not produce a good result with the style of development that the core authors do on this component.

I agree. Device vendors may help with core code however they can out of goodwill / necessity to make things happen, but it's not their responsibility to do so. Core maintainers need to be actively involved.

Also +1 to @Jianhui-Li's point:

From the PluggableDevice experience, I actually think that we need the module maintainer's guidance on the technical direction. It is not practical to propose any structural changes if the module maintainer is planning something else.

While Intel contributed most of the PluggableDevice design and implementation, it was done in close collaboration with TF core maintainers (Google). Some examples:

  • Google suggested targeting classic TF runtime instead of TFRT because of time constraints and code maturity.
  • Google decided not to rush a new, generic device C API and proposed the StreamExecutor C API RFC as a short-term solution.
  • Google implemented the StreamExecutor C API [1, 2, 3, etc.] which enabled Intel to start prototyping PluggableDevice.
  • Google consistently made design suggestions for Intel's PluggableDevice and Grappler C API RFCs.
  • Google refactored sharable accelerator code from //tensorflow/core/common_runtime/gpu to a generic //tensorflow/core/common_runtime/device [1, 2, 3, 4, etc] for the new pluggable device type to use.
  • etc.

We apologize for the delay in responding to this RFC and aim to have better response times from now on.

Re: @stellaraccident:

With that kind of dynamic, code locality is the only lever available to bias towards equal access to the platform, and the option to develop in tree should be retained.

I think that the GPU vendors should be colocated at this point: if one is in tree, then they all should be as a matter of policy. That can be reevaluated if the primary authors publish a more detailed plan and level of support track record for this kind of situation. It would be different if this were the fourth or fifth implementation and/or there was a strong track record of multi targeting for the GPU component specifically. But in effect, this is the second. The only way to evolve the situation from this point is colocation, imo.

Great point here! I'll bring this up to the team.

@ezhulenev (Member):

Regarding Intel GPU support in StreamExecutor, I'd love it to be in-tree as stream_executor/sycl (in addition to cuda and rocm) with minimal #ifdefs for Intel-specific parts. Today we don't run any tests for the rocm platform and rely on AMD to run them and send patches; we can use the same approach for sycl (until we figure out how to do CI as a part of OpenXLA checks).

Also, StreamExecutor today is really a hardware abstraction layer for XLA, so we are considering renaming the namespace to xla::hal, moving it to a hal folder, and doing a long-overdue cleanup. We don't have plans to replace it; we'll continue our investment in it.
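
(A loose sketch of the stream_executor/sycl shape being suggested. Class and method names below are hypothetical placeholders, not the eventual in-tree API; the real se::Platform interface is much larger.)

```c++
#include <string>

// Stand-in for stream_executor::Platform; the real interface also covers
// device enumeration, executor creation, and more.
class Platform {
 public:
  virtual ~Platform() = default;
  virtual std::string Name() const = 0;
  virtual int VisibleDeviceCount() const = 0;
};

// A hypothetical stream_executor/sycl/sycl_platform.h, sitting next to
// the cuda and rocm implementations.
class SyclPlatform : public Platform {
 public:
  std::string Name() const override { return "sycl"; }
  int VisibleDeviceCount() const override {
    // A real implementation would query the SYCL runtime, e.g.
    // sycl::device::get_devices(sycl::info::device_type::gpu).size().
    return 0;
  }
};
```

Registration would then mirror cuda/rocm: the platform registers itself with the platform manager during static initialization, so only the Bazel target selection differs per vendor.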

@joker-eph (Contributor):

@joker-eph Do you consider the SYCL runtime support "clearly T2"?

Seems like something that other SYCL vendors would want, instead of it being a specific component of the Intel GPU backend, right? If so, I would say yes here.

While Intel contributed most of the PluggableDevice design and implementation, it was done in close collaboration with TF core maintainers (Google).

Absolutely, and actually I alluded to this work (without the links you have!) before in one of my comments as an example of a "success story". It has to be collaborative, but it still implies "work" from the "new platforms" to help build their extension points.

@Zantares (Author) commented Dec 5, 2023

@penpornk @joker-eph Quick answer to your question:

Now: it is possible that 90% of what Intel has here is in the "T2" category and the T4 part is very small, or it is possible that it's the opposite, or anywhere in the middle. I haven't tried to make an assessment of this. What seemed valuable was to start upstreaming the things that are clearly "T2 component" cleanly, and then see where we are?

For the SPIR-V target part, it's the first situation: we only have a few code changes here (~250 LoC).

LLVM IR changes
i. Integrate SPIR-V translator: I think this should be upstreamed since it benefits any SPIR-V-supported devices.
@jpienaar do you have a requirement for it to be a pluggable pass at the end?
ii. The rest of the changes seem specific to Intel GPU and we'll likely need to look at the code to see how we can make it more parameterized/modularized.

In addition, I think there's a chance to parameterize the rest of the changes:

  1. Use generic SPIR-V intrinsics instead
  2. Add a device check (or a new PJRT API) to distinguish the address-space index for different devices instead of hard-coding it, if possible (a rough sketch follows below)

We can discuss it after preparing the code example, thanks!
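
(A rough sketch of point 2 above; the enum, values, and function name are illustrative only, and the real change would more likely flow through an existing target descriptor.)

```c++
// Hypothetical per-platform address-space lookup replacing a hard-coded
// constant in shared codegen.
enum class GpuPlatform { kCuda, kRocm, kSycl };

unsigned GlobalMemoryAddressSpace(GpuPlatform platform) {
  switch (platform) {
    case GpuPlatform::kCuda:
    case GpuPlatform::kRocm:
      return 1;  // Global address space on NVPTX/AMDGPU.
    case GpuPlatform::kSycl:
      return 1;  // CrossWorkgroup in the SPIR address-space mapping.
  }
  return 0;  // Generic address space as a fallback.
}
```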

@cheshire:

@Zantares There are a number of backend-specific contributions Intel could provide; is it possible to elaborate on the planned changes?

There are the following areas which might require changes:

  1. Outlining sections of HLO instructions into Intel library calls, and plumbing Intel library calls into SE
  2. Modifying pre-Triton codegen to support Intel intrinsics
  3. Modifying Triton itself/Triton tiling logic to support Intel GPUs
  4. Modifying the set of fusion heuristics to perform different logic for Intel if necessary.

Is the set of planned changes mainly based around (1)? Is it possible to see the preview of features-to-land?

@Zantares (Author):

Lib call: The lib call rewrites happen in HLO optimization passes.

  • As mentioned in our CPU Strategy, we are defining a formal integration point for 3rd-party HLO/MLIR-based graph rewrite/optimization passes. The same work applies to XLA:GPU. We would also like to learn what improvements we can make to existing passes to make them more reusable, and whether we need to add more extension points (and where).
  • @Zantares Could you please give some code pointers to the changes you made to the existing HLO passes in your plug-in repo? We could schedule a public meeting call to walk through the code as well if it's easier.

To @penpornk: sorry, I missed this point before. Please check the link below to see how we use oneDNN lib calls in the plug-in. We have also started checking the CPU RFC to see how to reuse the 3rd-party passes to complete this work.
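
(For readers skimming the thread: a lib-call rewrite conceptually replaces a matched HLO pattern with a custom call that dispatches to the vendor library. The toy types below are made up to show the shape of the rewrite; real passes operate on HloInstruction and must preserve shapes and layouts, and the custom-call target string is illustrative, not the plug-in's actual name.)

```c++
#include <string>
#include <vector>

// Miniature IR node, only to illustrate the rewrite.
struct Node {
  std::string op;                  // e.g. "dot" or "custom-call"
  std::string custom_call_target;  // set when op == "custom-call"
  std::vector<Node*> operands;
};

// Rewrite every "dot" into a custom call lowered to a oneDNN matmul.
void RewriteDotsToLibCalls(Node* node) {
  if (node->op == "dot") {
    node->op = "custom-call";
    node->custom_call_target = "onednn$matmul";
  }
  for (Node* operand : node->operands) {
    RewriteDotsToLibCalls(operand);
  }
}
```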

There are the following areas which might require changes:

  1. Outlining sections of HLO instructions into Intel library calls, and plumbing Intel library calls into SE
  2. Modifying pre-Triton codegen to support Intel intrinsics
  3. Modifying Triton itself/Triton tiling logic to support Intel GPUs
  4. Modifying the set of fusion heuristics to perform different logic for Intel if necessary.

Is the set of planned changes mainly based around (1)? Is it possible to see the preview of features-to-land?

To @cheshire: I assume Q1 is already answered above. For Q2 & Q3, we have already run some experiments to support Triton in our plug-in; it still needs lots of work. For Q4, right now the plug-in reuses most of the public code and uses a macro to add specific fusions; in the future we want to combine this work with the newly proposed Global Cost feature in OpenXLA.
In general, Q1 & Q4 are based around (1); you can check the plug-in first to see the preview: https://github.com/intel/intel-extension-for-openxla. The Triton-related part is another story that is not covered in this RFC, but I can say we really do plan to support Triton for Intel GPU.

@Zantares (Author):

Regarding Intel GPU support in StreamExecutor, I'd love it to be in-tree as stream_executor/sycl (in addition to cuda and rocm) with minimal #ifdefs for Intel-specific parts. Today we don't run any tests for the rocm platform and rely on AMD to run them and send patches; we can use the same approach for sycl (until we figure out how to do CI as a part of OpenXLA checks).

Hi @ezhulenev, does it mean that the OpenXLA repo will send the PR patch to AMD CI and get its feedback? Are these details publicly available? We also want to provide a similar check here.

@ezhulenev (Member):

No, currently it works as a post-submit: CI jobs run periodically on the latest commit and then we accept patches. @ddunl might know if we have any plans for pre-submit CI jobs for AMD, and whether we have plans for supporting other CI jobs. I'd love to have an option to include backend-specific CI jobs optionally for a PR. Not sure about enabling it by default, as even with the current set of jobs it takes a long time to get results back.

@Zantares (Author):

No, currently it works as a post-submit: CI jobs run periodically on the latest commit and then we accept patches. @ddunl might know if we have any plans for pre-submit CI jobs for AMD, and whether we have plans for supporting other CI jobs. I'd love to have an option to include backend-specific CI jobs optionally for a PR. Not sure about enabling it by default, as even with the current set of jobs it takes a long time to get results back.

Thanks for the feedback! It's a good idea to run backend-specific CI as post-submit to avoid possible blocking situations. We'll start from this. We also agree that backend-specific CI jobs should not be enabled by default; maybe a better way is to trigger them on demand via a GitHub label or a specific comment.

@ddunl (Member) commented Dec 19, 2023

I think for now we don't have specific plans for supporting external CI in general, just that we know we want to do it. Backend-specific CI with the ability to trigger on demand is definitely great though, and would be very helpful.

@Zantares (Author):

Hi all, the 1st PR for the SPIR-V target is submitted: openxla/xla#7940.

@jpienaar (Member) left a comment:

Stamping and submitting, already in progress so addressing missed step.

@jpienaar jpienaar merged commit 6e709d2 into openxla:main Jan 15, 2024
@Zantares (Author) commented Feb 1, 2024

The SYCL runtime is submitted: openxla/xla#9042

Note this big PR is for preview and CI testing; we will continue to split it into small PRs, and welcome your suggestions on this PR. Please refer to comment openxla/xla#9042 (comment) for more details.

@Zantares Zantares deleted the tenglu/intel_gpu_rfc branch February 1, 2024 11:41