Commit

[Neuron][Kernel] NKI-based flash-attention kernel with paged KV cache (vllm-project#11277)

Signed-off-by: Liangfu Chen <[email protected]>
Co-authored-by: Jiangfei Duan <[email protected]>
2 people authored and tjtanaa committed Jan 28, 2025
1 parent 0ae8f3e commit 008891b
Showing 3 changed files with 1,126 additions and 1 deletion.
2 changes: 1 addition & 1 deletion .buildkite/run-neuron-test.sh
@@ -54,4 +54,4 @@ docker run --rm -it --device=/dev/neuron0 --device=/dev/neuron1 --network host \
 -e "NEURON_COMPILE_CACHE_URL=${NEURON_COMPILE_CACHE_MOUNT}" \
 --name "${container_name}" \
 ${image_name} \
-/bin/bash -c "python3 /workspace/vllm/examples/offline_inference/neuron.py"
+/bin/bash -c "python3 /workspace/vllm/examples/offline_inference/neuron.py && python3 -m pytest /workspace/vllm/tests/neuron/ -v --capture=tee-sys"