Use Torch's current stream for ops #111

Merged on Feb 8, 2024 (1 commit)


@Yard1 (Contributor) commented Feb 8, 2024

This PR makes the PyTorch ops launch their kernels on the current Torch stream. This makes them compatible with Torch CUDA Graphs and lets users choose exactly which stream to use from Python code via the canonical PyTorch API.

Note: I believe I have found all cases where the stream should be set, but I might have missed something.
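
To illustrate what "the current Torch stream" means on the Python side, here is a minimal sketch using only standard PyTorch calls (`torch.cuda.Stream`, `torch.cuda.stream`, `torch.cuda.current_stream`). The helper name `run_on_side_stream` is hypothetical, and the plain tensor multiply is a stand-in for any op that launches on the current stream; it is not code from this PR.

```python
import importlib.util


def run_on_side_stream():
    """Hypothetical helper: queue work on a non-default CUDA stream.

    Returns True if the side stream was the current stream inside the
    context and the result is correct, or None when PyTorch or CUDA is
    not available on this machine.
    """
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    if not torch.cuda.is_available():
        return None

    side = torch.cuda.Stream()
    # Order the side stream after any work already queued on the current stream.
    side.wait_stream(torch.cuda.current_stream())

    with torch.cuda.stream(side):
        # Inside this context the side stream is the current stream, so ops
        # that launch on the current stream are queued on `side`.
        is_current = torch.cuda.current_stream() == side
        x = torch.ones(1024, device="cuda")
        y = x * 2  # stand-in for an op that honors the current stream

    # Make the original stream wait for the side stream before reading `y`.
    torch.cuda.current_stream().wait_stream(side)
    return is_current and bool((y == 2).all().item())
```

An op that always launched on the default stream would ignore the `with torch.cuda.stream(side)` block, which is the behavior this PR is meant to avoid.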

@yzh119 (Collaborator) left a comment

Thank you for doing this!

@yzh119 merged commit 6c6c44a into flashinfer-ai:main on Feb 8, 2024
@Yard1 deleted the torch_cuda_stream branch on February 8, 2024, 12:38
yzh119 added a commit that referenced this pull request Feb 16, 2024
This PR fixes #113, which was caused by #69 changing the
`BatchPrefillWithPagedKVCacheWrapperDispatched` signature without
updating `flashinfer_decl.h` accordingly.

It also fixes some minor formatting issues from #111.