Why does vLLM use eager-mode execution on CPUs? #10716
amd-lalithnc started this conversation in General
Looking through the code, it appears that vLLM runs models in eager mode on CPU.
I understand that most ops are rewritten and replaced in the source model. Are there any advantages or disadvantages to using eager mode instead of the Inductor path (torch.compile) or the TorchScript path?