[Neuron][Kernel] NKI-based flash-attention kernel with paged KV cache (vllm-project#11277)

Signed-off-by: Liangfu Chen
Co-authored-by: Jiangfei Duan

EmbeddedLLM · Jan 28, 2025 · commit 008891b · 1 parent 0ae8f3e

Showing 3 changed files with 1,126 additions and 1 deletion.
diff --git a/.buildkite/run-neuron-test.sh b/.buildkite/run-neuron-test.sh
@@ -54,4 +54,4 @@ docker run --rm -it --device=/dev/neuron0 --device=/dev/neuron1 --network host \
        -e "NEURON_COMPILE_CACHE_URL=${NEURON_COMPILE_CACHE_MOUNT}" \
        --name "${container_name}" \
        ${image_name} \
-       /bin/bash -c "python3 /workspace/vllm/examples/offline_inference/neuron.py"
+       /bin/bash -c "python3 /workspace/vllm/examples/offline_inference/neuron.py && python3 -m pytest /workspace/vllm/tests/neuron/ -v --capture=tee-sys"
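The updated container command chains the two steps with `&&`. The Neuron test suite therefore runs only if the offline-inference example exits successfully, and the step's exit status is that of the last command executed. The sketch below models that behavior. It is a minimal illustration that assumes nothing about the real CI beyond the `&&` chaining. The names `run_example`, `run_neuron_tests`, and `ci_step` are hypothetical stand-ins, not part of the vLLM repository, and they simulate exit codes instead of invoking Python.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for the two commands in the CI step:
#   python3 /workspace/vllm/examples/offline_inference/neuron.py
#   python3 -m pytest /workspace/vllm/tests/neuron/ -v --capture=tee-sys
# Each prints a marker and returns the exit code passed to it.
run_example()      { echo "example"; return "$1"; }
run_neuron_tests() { echo "tests"; return "$1"; }

# Mirrors `example && tests`: the tests run only if the example exits 0,
# and the function returns the status of the last command that ran.
ci_step() {
  local example_rc=$1 tests_rc=$2
  run_example "$example_rc" && run_neuron_tests "$tests_rc"
}

# Capture the status with `|| rc=$?` so the demo also works under `set -e`.
for args in "0 0" "0 1" "1 0"; do
  rc=0
  ci_step $args || rc=$?   # unquoted on purpose: split into two arguments
  echo "exit=$rc"
done
# Prints: example, tests, exit=0 / example, tests, exit=1 / example, exit=1
```

The `-v` and `--capture=tee-sys` flags only affect pytest's reporting. The first lists test names verbosely, and the second captures test output while also echoing it to the console. Neither flag changes the exit-code behavior shown above.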