Release KV cache memory occupation when not needed #2588

fzyzcjy · 2024-12-26T07:06:15Z

Motivation

This PR depends on #2586, so the diff panel shows much more code than this PR actually changes.

Please see #2542

Modifications

Checklist

Format your code according to the Contributor Guide.
Add unit tests as outlined in the Contributor Guide.
Update documentation as needed, including docstrings or example tutorials.

merrymercy

The design makes sense. The only downside is that it is not compatible with CUDA graphs. Do you have any ideas for making it compatible with CUDA graphs? Could we hack into the PyTorch memory allocator and reuse the buffer, or fix the CUDA pointers?
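
To make the compatibility concern concrete, here is a toy, pure-Python model of the problem. It is only an illustrative sketch, not SGLang or PyTorch code: `ToyAllocator`, `ToyKVCache`, and `ToyCapturedGraph` are hypothetical names, and the bump allocator is an assumption chosen so that a re-allocation always lands at a new address (a real allocator may or may not return the old one). The point it shows is that a captured graph records the buffer address at capture time, so releasing and re-allocating the KV cache can leave the graph pointing at stale memory.

```python
from typing import Optional


class ToyAllocator:
    """Hypothetical bump allocator: every malloc returns a fresh fake address."""

    def __init__(self, base: int = 0x1000) -> None:
        self._next = base

    def malloc(self, nbytes: int) -> int:
        addr = self._next
        self._next += nbytes
        return addr


class ToyKVCache:
    """Hypothetical KV cache buffer that can be released and re-allocated."""

    def __init__(self, allocator: ToyAllocator, nbytes: int) -> None:
        self.allocator = allocator
        self.nbytes = nbytes
        self.addr: Optional[int] = allocator.malloc(nbytes)

    def release(self) -> None:
        # Give the memory back while the cache is not needed.
        self.addr = None

    def resume(self) -> None:
        # Re-allocate; in this toy model the address always changes.
        self.addr = self.allocator.malloc(self.nbytes)


class ToyCapturedGraph:
    """Hypothetical stand-in for a captured graph: it remembers the address."""

    def __init__(self, kv_cache: ToyKVCache) -> None:
        self.captured_addr = kv_cache.addr

    def is_valid_for(self, kv_cache: ToyKVCache) -> bool:
        # Replay is only safe if the buffer still lives at the captured address.
        return kv_cache.addr is not None and kv_cache.addr == self.captured_addr


allocator = ToyAllocator()
kv_cache = ToyKVCache(allocator, nbytes=1024)
graph = ToyCapturedGraph(kv_cache)
valid_before = graph.is_valid_for(kv_cache)

kv_cache.release()
kv_cache.resume()
valid_after = graph.is_valid_for(kv_cache)

print(valid_before, valid_after)
```

In this model, forcing the re-allocation back to the original address would make `is_valid_for` return `True` again, which is the intuition behind reusing the buffer or keeping the pointers fixed.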

fzyzcjy · 2024-12-26T23:39:20Z

@merrymercy Thank you! I posted in #2542 (comment)

fzyzcjy · 2024-12-27T12:20:09Z

I will temporarily close this since #2542 (comment) seems more promising :)

fzyzcjy added 16 commits December 26, 2024 13:16

empty struct

a5061cc

more

5a5651b

more

1ccf84c

simp

6e55282

more

5edcf5a

fix typing

35eb3ad

more

211550e

Merge branch 'feat/code_cleanup' into feat/memory_optimization

619aa19

more

95a8db9

more

5650a75

more

ecd3d9a

more

53573cc

fix

8f8bc3d

more

eaa9808

more

f3c948c

more

94e9ec8

fzyzcjy mentioned this pull request Dec 26, 2024

[Feature] Proposal: Releasing SGLang memory when idle #2583

Open

fzyzcjy and others added 4 commits December 26, 2024 15:22

more

2317150

Merge branch 'feat/code_cleanup' into feat/memory_optimization

bc56193

more

8042494

Merge branch 'main' into feat/memory_optimization

711b7de

merrymercy reviewed Dec 26, 2024

fzyzcjy mentioned this pull request Dec 26, 2024

[Feature] (Willing to PR) Avoid KV cache occupying GPU memory when not used #2542

Open

2 tasks

fzyzcjy closed this Dec 27, 2024
