Simplify the cache decoding graph #780

mattdangerw · 2023-02-25T00:59:43Z

This updates our CachedMultiHeadAttention layer to avoid slicing the input to a dynamic length during generative decoding. Instead, our cache is always a fixed length, as are the keys and values after the cache is applied.

Overall this ends up being both a nice simplification and a speedup. After patching in a fix for #779, generating 25 sequences of length 256 with the base model goes from 52s to 33s.
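For readers skimming the thread, here is a rough illustration of the idea in the description, written in plain NumPy rather than the actual keras_nlp/TensorFlow code. It is only a sketch under assumptions: the function names (`attend_dynamic`, `attend_fixed`, `decode_demo`), the single-head layout, and the masking constant are all hypothetical and are not taken from this PR. It compares attention over a cache sliced to a dynamic length with attention over a fixed-length cache whose unwritten positions are masked out, and checks that both give the same result.

```python
import numpy as np


def softmax(x):
    # Numerically stable softmax over the last axis.
    x = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=-1, keepdims=True)


def attend_dynamic(query, key_cache, value_cache, step):
    # Dynamic-length variant: slice the cache down to the positions
    # written so far, so the shapes change at every decoding step.
    keys = key_cache[: step + 1]
    values = value_cache[: step + 1]
    scores = keys @ query / np.sqrt(query.shape[-1])
    return softmax(scores) @ values


def attend_fixed(query, key_cache, value_cache, step):
    # Fixed-length variant: always attend over the full cache and mask
    # out positions that have not been written yet.
    scores = key_cache @ query / np.sqrt(query.shape[-1])
    visible = np.arange(key_cache.shape[0]) <= step
    scores = np.where(visible, scores, -1e9)
    return softmax(scores) @ value_cache


def decode_demo(max_length=8, dim=4, seed=0):
    # Hypothetical single-head decoding loop with random projections
    # standing in for a real model's query/key/value outputs.
    rng = np.random.default_rng(seed)
    key_cache = np.zeros((max_length, dim))
    value_cache = np.zeros((max_length, dim))
    results = []
    for step in range(max_length):
        query, key, value = rng.normal(size=(3, dim))
        # Write the new key/value into its slot; the cache shape never changes.
        key_cache[step] = key
        value_cache[step] = value
        fixed = attend_fixed(query, key_cache, value_cache, step)
        dynamic = attend_dynamic(query, key_cache, value_cache, step)
        results.append((fixed, dynamic))
    return results


results = decode_demo()
```

In this toy version every step of the fixed-length path works on arrays of the same shape, which is the property the description relies on; whether that accounts for the whole speedup reported above is not something this sketch can show.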

keras_nlp/samplers/sampler.py

keras_nlp/layers/cached_multi_head_attention.py

chenmoneygithub

Thanks Matt! It works pretty well! Dropped a few comments on the code itself.

keras_nlp/layers/cached_multi_head_attention.py

keras_nlp/layers/transformer_decoder.py

keras_nlp/layers/transformer_decoder_test.py

keras_nlp/layers/transformer_layer_utils.py

keras_nlp/models/gpt2/gpt2_causal_lm.py

mattdangerw · 2023-03-01T19:32:03Z

Thanks for the review! Addressed all comments.

chenmoneygithub

Thanks Matt!

keras_nlp/layers/transformer_decoder_test.py

keras_nlp/models/gpt2/gpt2_causal_lm.py

mattdangerw requested a review from chenmoneygithub on February 25, 2023 00:59

mattdangerw commented Feb 25, 2023

keras_nlp/samplers/sampler.py (resolved)

mattdangerw force-pushed the simplify-cache branch from 8cd63c6 to ced9248 on February 25, 2023 01:28

mattdangerw commented Feb 25, 2023

keras_nlp/layers/cached_multi_head_attention.py (resolved)

chenmoneygithub suggested changes Feb 28, 2023

mattdangerw force-pushed the simplify-cache branch from ced9248 to c554b8f on March 1, 2023 19:31

chenmoneygithub approved these changes Mar 2, 2023

keras_nlp/layers/transformer_decoder_test.py (outdated, resolved)

keras_nlp/models/gpt2/gpt2_causal_lm.py (resolved)

Simplify the cache decoding graph

c2dd044

mattdangerw force-pushed the simplify-cache branch from c554b8f to c2dd044 on March 3, 2023 00:30

mattdangerw merged commit 23f06c0 into keras-team:master on Mar 3, 2023
