[SLM] Batched Llama #1520
Merged
Conversation
1e811f6 → 2295999 (Compare)
junrushao reviewed on Jan 1, 2024 (six reviews)
090d6eb → 6eb18f4 (Compare)
747cd93 → 6d6357f (Compare)
junrushao reviewed on Jan 3, 2024 (six reviews)
e1cc139 → 08874b8 (Compare)
junrushao reviewed on Jan 4, 2024 (two reviews)
319ce79 → 4f2a69f (Compare)
junrushao approved these changes on Jan 4, 2024
LGTM!
This PR introduces batched Llama modeling with a paged KV cache in the SLM flow.
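The one-line description above is the only technical summary in the thread, so a minimal sketch of the paged-KV-cache idea it refers to may help. Everything below (the `PagedKVCache` class, `page_size`, the pool layout) is a hypothetical simplification for exposition, not MLC LLM's actual implementation or API: keys and values for all batched sequences live in a shared pool of fixed-size pages, and each sequence keeps a page table mapping its logical positions to physical pages, so sequences can grow without large contiguous reallocations.

```python
import numpy as np

class PagedKVCache:
    """Toy paged KV cache: a shared page pool plus per-sequence page tables.
    Illustrative only; not MLC LLM's actual data structure."""

    def __init__(self, num_pages, page_size, num_heads, head_dim):
        self.page_size = page_size
        # Shared physical storage for keys and values across all sequences.
        self.k_pool = np.zeros((num_pages, page_size, num_heads, head_dim),
                               dtype=np.float16)
        self.v_pool = np.zeros_like(self.k_pool)
        self.free_pages = list(range(num_pages))
        self.page_tables = {}  # seq_id -> list of physical page ids
        self.seq_lens = {}     # seq_id -> number of tokens written

    def add_sequence(self, seq_id):
        self.page_tables[seq_id] = []
        self.seq_lens[seq_id] = 0

    def append(self, seq_id, k, v):
        """Append one token's K/V, each of shape [num_heads, head_dim]."""
        pos = self.seq_lens[seq_id]
        if pos % self.page_size == 0:  # first token, or current page full: allocate
            self.page_tables[seq_id].append(self.free_pages.pop())
        page = self.page_tables[seq_id][pos // self.page_size]
        slot = pos % self.page_size
        self.k_pool[page, slot] = k
        self.v_pool[page, slot] = v
        self.seq_lens[seq_id] = pos + 1

    def free_sequence(self, seq_id):
        """Return a finished sequence's pages to the free pool."""
        self.free_pages.extend(self.page_tables.pop(seq_id))
        del self.seq_lens[seq_id]

    def gather(self, seq_id):
        """Materialize one sequence's K/V for attention. Real paged-attention
        kernels instead read the pages in place via the page table."""
        n = self.seq_lens[seq_id]
        pages = self.page_tables[seq_id]
        k = np.concatenate([self.k_pool[p] for p in pages])[:n]
        v = np.concatenate([self.v_pool[p] for p in pages])[:n]
        return k, v
```

During batched decode, each step would append one token's K/V per active sequence (e.g. `cache.append(seq_id, k, v)`), and finished sequences return their pages to the pool, which is what lets many sequences of different lengths share one fixed memory budget.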
9292f62 → 27cd520 (Compare)
tqchen approved these changes on Jan 4, 2024
MasterJH5574 added commits (and junrushao pushed a commit) referencing this pull request on Jan 4, 2024, all carrying the same message:

PR #1520 introduces batched llama to SLM. This PR updates the serving codebase and fully switches the model definition to the SLM flow. Most changes in this PR are trivial. Note that with this PR merged in, the previous batching model definition flow becomes outdated and no longer usable.
MasterJH5574 added commits (and tqchen pushed a commit) referencing this pull request on Jan 10 and Jan 12, 2024, including in MasterJH5574/mlc-llm, all with the same message as above.