Use Torch's current stream for ops #111

Yard1 · 2024-02-08T10:43:10Z

This PR makes the PyTorch ops launch their kernels on the current Torch stream. This makes the ops compatible with Torch CUDA Graphs and lets users choose the stream from Python code with the canonical PyTorch API.

Note: I believe I have found all cases where the stream should be set, but I might have missed something.
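
For illustration, here is a minimal sketch of what "setting the stream from Python" could look like on the caller side. It uses only the standard `torch.cuda` stream API. The element-wise multiply is a stand-in for an op from this library, and the helper name `run_on_side_stream` is hypothetical, not part of this PR. The function returns `None` when PyTorch or a CUDA device is not available.

```python
import importlib.util


def run_on_side_stream():
    """Run a stand-in op on a user-chosen stream.

    Returns True if the op saw the side stream as the current stream and
    produced the expected result, or None if torch/CUDA is unavailable.
    """
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    if not torch.cuda.is_available():
        return None

    side = torch.cuda.Stream()
    x = torch.randn(1024, device="cuda")

    # Order the side stream after the work that produced `x`.
    side.wait_stream(torch.cuda.current_stream())

    with torch.cuda.stream(side):
        # Inside this block the side stream is the current stream, so an op
        # that launches on the current stream runs here.
        on_side = torch.cuda.current_stream().cuda_stream == side.cuda_stream
        y = x * 2  # stand-in for an op from this library (assumption)

    # Order the default stream after the side stream before reading `y`.
    torch.cuda.current_stream().wait_stream(side)
    return bool(on_side and torch.equal(y, x * 2))


result = run_on_side_stream()
```

Before this change, a kernel launched on a fixed stream would ignore the `torch.cuda.stream(...)` context, so the explicit `wait_stream` calls above would not order it correctly against the surrounding work.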

yzh119

Thank you for doing this!

This PR fixes #113, which was caused by #69 changing the `BatchPrefillWithPagedKVCacheWrapperDispatched` signature without a matching update to `flashinfer_decl.h`. It also fixes some minor formatting issues in #111.

Use Torch's current stream for ops

85474ab

Yard1 mentioned this pull request Feb 8, 2024

flashinfer paged attention vllm-project/vllm#2772

Closed

yzh119 approved these changes Feb 8, 2024

yzh119 merged commit 6c6c44a into flashinfer-ai:main Feb 8, 2024

Yard1 deleted the torch_cuda_stream branch February 8, 2024 12:38

yzh119 mentioned this pull request Feb 16, 2024

bugfix: fix the compilation issue of pip wheels #115

Merged
