Support cross-attention key-value caches in rten-generate · robertknight/rten@0f82720


Support cross-attention key-value caches in rten-generate

Generalize the KV caching in rten-generate to support cross-attention KV-cache
inputs, as used by e.g. Optimum-exported decoder models that are part of an
encoder-decoder system. These models have two kinds of KV-cache input:
self-attention KV-caches, which are extended on each run, and cross-attention
KV-caches, which are computed on the first run and reused without modification
in subsequent runs.
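The difference between the two cache kinds can be sketched in a few lines of Rust. This is a minimal, hypothetical illustration, not rten-generate's actual API: the names `CacheKind`, `KvCache` and `update` are invented here, only keys are tracked (values would be handled identically), and it assumes the model's per-run self-attention output contains only the entries for the newly processed positions.

```rust
/// Hypothetical kinds of KV-cache input (not rten-generate's real types).
#[derive(Debug, Clone, Copy, PartialEq)]
enum CacheKind {
    /// Grows along the sequence axis on every run.
    SelfAttention,
    /// Computed from the encoder output on the first run, then reused as-is.
    CrossAttention,
}

/// Hypothetical cache holding flattened `[seq_len, head_dim]` key data.
struct KvCache {
    kind: CacheKind,
    keys: Vec<f32>,
    head_dim: usize,
    initialized: bool,
}

impl KvCache {
    fn new(kind: CacheKind, head_dim: usize) -> Self {
        KvCache { kind, keys: Vec::new(), head_dim, initialized: false }
    }

    /// Number of sequence positions currently cached.
    fn seq_len(&self) -> usize {
        self.keys.len() / self.head_dim
    }

    /// Merge the model output for one run into the cache.
    fn update(&mut self, present: &[f32]) {
        match self.kind {
            // Assumption: `present` holds only the new positions, so append them.
            CacheKind::SelfAttention => self.keys.extend_from_slice(present),
            // Store the cross-attention cache once; ignore it on later runs.
            CacheKind::CrossAttention => {
                if !self.initialized {
                    self.keys = present.to_vec();
                }
            }
        }
        self.initialized = true;
    }
}

fn main() {
    let head_dim = 2;
    let mut self_attn = KvCache::new(CacheKind::SelfAttention, head_dim);
    let mut cross_attn = KvCache::new(CacheKind::CrossAttention, head_dim);

    // Run 1: a 3-token prompt and a 4-position encoder output.
    self_attn.update(&[0.0; 6]);
    cross_attn.update(&[1.0; 8]);

    // Runs 2 and 3: one new token each; cross-attention input is ignored.
    for _ in 0..2 {
        self_attn.update(&[0.5; 2]);
        cross_attn.update(&[9.0; 8]);
    }

    println!("self-attention seq_len = {}", self_attn.seq_len());
    println!("cross-attention seq_len = {}", cross_attn.seq_len());
}
```

Running this prints a self-attention length of 5 (3 prompt positions plus 2 generated) and a cross-attention length of 4, because the encoder-derived cache is never modified after the first run.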


robertknight committed Aug 23, 2024

1 parent 36a1444 commit 0f82720
