Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Neuron PJRT Plugin enforces struct arguments struct size equality, breaking compatibility conventions #1095

Open
cerisier opened this issue Jan 24, 2025 · 2 comments

Comments

@cerisier
Copy link

This issue concerns the PJRT plugin that we (@zml) are loading directly.

Issue

The Neuron PJRT Plugin enforces strict equality checks on the size of the argument structs. This breaks the compatibility convention outlined in StableHLO ABI versioning and compatibility, where argument struct sizes should allow for backward-compatible extensions by permitting the struct size to be greater than or equal to the expected size.

This behavior also diverges from the approach followed in XLA, where the implementation accommodates structs with additional fields, as seen here.

When attempting to serialize using StableHLO 1.7.8, the Neuron PJRT Plugin fails due to the strict struct size equality check:

info(pjrt): Loaded library: libpjrt_neuron.so
error(pjrt): [PJRT_Client_Create] wrapper: error condition !(args->struct_size == ss):
@aws-bowencc
Copy link
Contributor

Hi @cerisier , thx for raising the issue. Based on the feedback from the team:

It is true that the Neuron PJRT plugin currently has a stricter compatibility check. The reason is https://docs.google.com/document/d/1TKB5NyGtdzrpgw5mpyFjVAhJjpSNdF31T6pjPl_UT2o/ is currently a guideline only and not enforced. For example, OpenXLA commit e28e6e9a2ea modified the PJRT_Client_BufferFromHostBuffer_Args struct by introducing a new memory field
in the middle of the struct definition, which isn't forward proof and would have introduced strange plugin behaviors if we didn't have the current strict compatibility check.

Our current understanding is we'll have to wait until the PJRT C-API version to become 1.x to see the guideline be fully established under the semantic versioning assumption, and unfortunately the Neuron PJRT plugin currently has to support a number of legacy PJRT C-API versions down to 0.12 - 0.23 range.

It looks like you are working with a PJRT C-API version that is higher than what we currently support. Calling libneuronxla.supported_clients() should provide a better sense on the current supported PJRT C-API versions.

Let us know if this would help.

@steeve
Copy link

steeve commented Jan 30, 2025

Hi @aws-bowencc, thanks for getting back to us.
I don't recall OpenXLA introducing fields in the middle of structs since we started using it a year ago. For instance, openxla/xla@e28e6e9a2ea is from August 2023. Right now, all platforms have moved away from jaxlib to PJRT.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants