[Kernel] Pipe attn_logits_soft_cap through paged attention TPU kernels #12294

fenghuizhang · 2025-01-22T02:07:12Z

Pipe attn_logits_soft_cap through paged_attention. This will unblock adoption of some of our models.

Note that the changed code currently doesn't have unit tests. I will add one later.
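
For readers unfamiliar with the parameter, here is a minimal sketch of what a logit soft cap usually does, assuming the common `cap * tanh(logits / cap)` formulation used by Gemma-2-style models. This is not the TPU kernel code from this PR; the helper names `soft_cap` and `attention_weights` are hypothetical and exist only for illustration.

```python
import math


def soft_cap(logits, cap):
    """Squash each logit smoothly into [-cap, cap] via cap * tanh(x / cap).

    A cap of None means "no capping" and returns the logits unchanged.
    """
    if cap is None:
        return list(logits)
    return [cap * math.tanh(x / cap) for x in logits]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(q, keys, scale, attn_logits_soft_cap=None):
    """Attention weights for one query against a list of keys.

    The cap is applied to the scaled q.k logits before the softmax,
    which is where an attention kernel would need the value piped in.
    """
    logits = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    return softmax(soft_cap(logits, attn_logits_soft_cap))
```

Small logits pass through almost unchanged, while very large ones saturate near the cap, so a single outlier key cannot dominate the softmax as strongly as it would without capping.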

Signed-off-by: Fenghui Zhang [email protected]

github-actions · 2025-01-22T02:07:25Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

- Add the ready label to the PR.
- Enable auto-merge.

🚀

robertgshaw2-redhat · 2025-01-22T03:34:22Z

Can you add a correctness test for Gemma to the CI so this codepath is exercised? Thanks!

mgoin · 2025-01-22T16:58:23Z

Ditto, LGTM once a test case is added to CI.

fenghuizhang · 2025-01-22T18:02:37Z

Thanks for the quick turnaround.

I think CI (https://buildkite.com/vllm/fastcheck/builds/12156#01948bc4-58cd-4863-9eca-e2ea098879f9) exposed another issue in the pt/xla code. I sent pytorch/xla#8600 to fix it and will verify things there first.

Will come back to this PR afterwards.

fenghuizhang · 2025-01-23T19:30:51Z

Local tests with the latest pt/xla code pass now. We may need to update the nightly build label for CI to pass:

PYTHONPATH=/home/fhzhang/xla python3 examples/offline_inference/tpu.py

The error in https://buildkite.com/vllm/fastcheck/builds/12156#01948bc4-58cd-4863-9eca-e2ea098879f9 went away.

fenghuizhang changed the title ~~Pipe attn_logits_soft_cap through paged attention~~ [Kernel] Pipe attn_logits_soft_cap through paged attention Jan 22, 2025

fenghuizhang changed the title ~~[Kernel] Pipe attn_logits_soft_cap through paged attention~~ [Kernel] Pipe attn_logits_soft_cap through paged attention TPU kernels Jan 22, 2025

robertgshaw2-redhat approved these changes Jan 22, 2025


fenghuizhang mentioned this pull request Jan 22, 2025

Fix an issue when piping attn_logits_soft_cap through in vllm. pytorch/xla#8600

Merged

mergify bot added the ci/build label Jan 23, 2025

lsy323 mentioned this pull request Jan 24, 2025

[TPU][CI] Update torchxla version in requirement-tpu.txt #12422

Merged

fenghuizhang closed this Jan 27, 2025

fenghuizhang force-pushed the main branch from 72bef9c to 2bc3fbb January 27, 2025 19:09

fenghuizhang mentioned this pull request Jan 27, 2025

[Kernel] Pipe attn_logits_soft_cap through paged attention TPU kernels #12482

Merged
