feat(cache): StaticCache uses index_copy_ to avoid useless copy #31857
Changes from all commits
d329ad2
53e99d1
8e622ec
aba28b5
02608dd
4c818f3
a76cfb7
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -862,8 +862,18 @@ def update( | |
k_out.copy_(key_states) | ||
v_out.copy_(value_states) | ||
else: | ||
k_out[:, :, cache_position] = key_states | ||
v_out[:, :, cache_position] = value_states | ||
# Note: here we use `tensor.index_copy_(dim, index, source)`, which is equivalent to | ||
# `tensor[:, :, index] = source`, but it is compile-friendly and explicitly in-place, | ||
# so it avoids extra copies and uses less memory. | ||
try: | ||
# If using several devices (e.g.: multiple GPUs), we need to ensure everything is on the same one | ||
cache_position = cache_position.to(device=k_out.device) | ||
k_out.index_copy_(2, cache_position, key_states) | ||
v_out.index_copy_(2, cache_position, value_states) | ||
except NotImplementedError: | ||
# The operator 'aten::index_copy.out' is not currently implemented for the MPS device. | ||
k_out[:, :, cache_position] = key_states | ||
v_out[:, :, cache_position] = value_states | ||
|
||
return k_out, v_out | ||
|
||
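As a rough illustration of the equivalence described in the code comment of this hunk, here is a minimal sketch on toy tensors. The helper name `write_cache`, the shapes, and the positions are assumptions made for the example and are not part of the PR. The sketch assigns the result of `.to(...)`, because `Tensor.to` returns a tensor rather than moving its input in place.

```python
import torch


def write_cache(cache: torch.Tensor, positions: torch.Tensor, new_states: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: write `new_states` into `cache` along dim 2 at `positions`."""
    # `Tensor.to` is not in-place, so the returned tensor must be kept.
    positions = positions.to(device=cache.device)
    try:
        # Explicit in-place write along the sequence dimension (dim 2).
        cache.index_copy_(2, positions, new_states)
    except NotImplementedError:
        # Fallback for backends that lack the operator (the diff mentions MPS).
        cache[:, :, positions] = new_states
    return cache


# Toy shapes (assumed): (batch, heads, max_seq_len, head_dim)
batch, heads, max_len, head_dim = 1, 2, 8, 4
positions = torch.tensor([3, 4])  # int64 indices, as index_copy_ requires
new_states = torch.randn(batch, heads, positions.numel(), head_dim)

cache_a = torch.zeros(batch, heads, max_len, head_dim)
cache_b = torch.zeros(batch, heads, max_len, head_dim)

ptr_before = cache_a.data_ptr()
write_cache(cache_a, positions, new_states)  # index_copy_ path
cache_b[:, :, positions] = new_states        # slice-assignment path

same_result = torch.equal(cache_a, cache_b)
same_storage = cache_a.data_ptr() == ptr_before
print(same_result, same_storage)
```

On a CPU build of PyTorch both values are `True`: the two writes produce identical caches, and the `index_copy_` path leaves the cache in its original storage.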
|
@@ -958,8 +968,14 @@ def update( | |
k_out = k_out[:, :, indices] | ||
v_out = v_out[:, :, indices] | ||
|
||
k_out[:, :, cache_position] = key_states | ||
v_out[:, :, cache_position] = value_states | ||
try: | ||
In the final commit, ensure that the two sets of changes are the same (in the current commit, this one is missing some logic).
You are right, I forgot it!
||
cache_position = cache_position.to(device=k_out.device) | ||
k_out.index_copy_(2, cache_position, key_states) | ||
v_out.index_copy_(2, cache_position, value_states) | ||
except NotImplementedError: | ||
# The operator 'aten::index_copy.out' is not currently implemented for the MPS device. | ||
k_out[:, :, cache_position] = key_states | ||
v_out[:, :, cache_position] = value_states | ||
|
||
# `zero_()` followed by `+=` is equivalent to `=`, but compile-friendly (without graph breaks due to assignment) | ||
self.key_cache[layer_idx].zero_() | ||
|
hope this is not slowing anything down!
I didn't notice anything when trying it...
then good to go once check quality is green!
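For reference, the `zero_()` followed by `+=` pattern from the last hunk can be sketched on a toy buffer. This is only an illustration under assumed shapes; the list `key_cache` stands in for `self.key_cache`, and `k_out` is random data rather than real key states. It shows that the pattern yields the same values as assignment while keeping the existing buffer, whereas plain assignment rebinds the list slot to a different tensor.

```python
import torch

# Toy stand-ins (assumed shapes): a one-layer cache and freshly computed key states.
key_cache = [torch.ones(1, 2, 8, 4)]
k_out = torch.randn(1, 2, 8, 4)
layer_idx = 0

ptr_before = key_cache[layer_idx].data_ptr()

# Pattern from the diff: clear the buffer in place, then accumulate into it.
key_cache[layer_idx].zero_()
key_cache[layer_idx] += k_out

same_values = torch.equal(key_cache[layer_idx], k_out)
same_storage = key_cache[layer_idx].data_ptr() == ptr_before

# Plain assignment, for contrast: the slot now points at k_out's storage.
rebound_cache = [torch.ones(1, 2, 8, 4)]
old_ptr = rebound_cache[layer_idx].data_ptr()
rebound_cache[layer_idx] = k_out
storage_replaced = rebound_cache[layer_idx].data_ptr() != old_ptr

print(same_values, same_storage, storage_replaced)
```

The sketch says nothing about speed; the thread above reports only that no slowdown was noticed when trying it.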