Support cross-attention key-value caches in rten-generate
Generalize the KV caching in rten-generate to support cross-attention KV-cache
inputs used by e.g. Optimum-exported decoder models that are part of an
encoder-decoder system. These models have two kinds of KV-cache input:
self-attention KV-caches, which are extended on each run, and cross-attention
KV-caches, which are computed on the first run and reused without modification
in subsequent runs.
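
The difference between the two cache kinds can be illustrated with a minimal sketch. This is not the rten-generate implementation: `CacheKind`, `KvCache`, and `update` are hypothetical names, and a flat `Vec<f32>` with one value per sequence position stands in for a real key or value tensor.

```rust
/// How a KV-cache input is updated between decoder runs.
/// (Hypothetical type for illustration, not part of rten-generate.)
#[derive(Debug, Clone, Copy, PartialEq)]
enum CacheKind {
    /// Extended with the new positions produced by every run.
    SelfAttention,
    /// Computed on the first run, then reused unchanged.
    CrossAttention,
}

/// Toy cache entry: one f32 per sequence position stands in for a tensor.
#[derive(Debug)]
struct KvCache {
    kind: CacheKind,
    values: Vec<f32>,
    /// Whether a cross-attention cache has received its first-run values.
    filled: bool,
}

impl KvCache {
    fn new(kind: CacheKind) -> Self {
        KvCache { kind, values: Vec::new(), filled: false }
    }

    /// Apply the output of one decoder run to this cache.
    fn update(&mut self, run_output: &[f32]) {
        match self.kind {
            // Self-attention: append the newly computed positions.
            CacheKind::SelfAttention => self.values.extend_from_slice(run_output),
            // Cross-attention: keep the first run's values, ignore later ones.
            CacheKind::CrossAttention => {
                if !self.filled {
                    self.values = run_output.to_vec();
                    self.filled = true;
                }
            }
        }
    }
}

fn main() {
    let mut self_attn = KvCache::new(CacheKind::SelfAttention);
    let mut cross_attn = KvCache::new(CacheKind::CrossAttention);

    // Assumed encoder output covering three positions.
    let encoder_kv = [0.1, 0.2, 0.3];

    for step in 0..4 {
        // Each run yields one new self-attention position.
        self_attn.update(&[step as f32]);
        cross_attn.update(&encoder_kv);
    }

    println!("self-attention cache length: {}", self_attn.values.len());
    println!("cross-attention cache length: {}", cross_attn.values.len());
}
```

In this sketch the self-attention cache grows with the number of runs, while the cross-attention cache keeps the length of the encoder output it received on the first run.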
robertknight committed Aug 23, 2024
1 parent 36a1444 commit 0f82720
Showing 1 changed file with 251 additions and 75 deletions.