Support cross-attention key-value caches in rten-generate
Generalize the KV caching in rten-generate to support cross-attention KV-cache inputs, as used by e.g. Optimum-exported decoder models that form part of an encoder-decoder system. These models have two kinds of KV-cache input: self-attention KV-caches, which are extended on each run, and cross-attention KV-caches, which are computed on the first run and reused without modification on subsequent runs.
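The difference between the two cache kinds can be illustrated with a minimal, self-contained sketch. It does not use rten-generate's actual API: `CacheKind`, `KvCache`, `needs_update` and `update` are hypothetical names, each cache is flattened into a `Vec<f32>` with `dim` values per position, and each run is assumed to return only the new self-attention entries. The self-attention cache grows by one position per decoding step, while the cross-attention cache is filled once from the (simulated) encoder output and then left untouched.

```rust
/// Hypothetical kinds of KV-cache input a decoder model may declare.
#[derive(Debug, Clone, Copy, PartialEq)]
enum CacheKind {
    /// Extended by one position on every decoding step.
    SelfAttention,
    /// Computed on the first run, then reused without modification.
    CrossAttention,
}

/// Simplified cache: `data` holds `dim` values per cached position.
struct KvCache {
    kind: CacheKind,
    dim: usize,
    data: Vec<f32>,
}

impl KvCache {
    fn new(kind: CacheKind, dim: usize) -> Self {
        assert!(dim > 0, "dim must be non-zero");
        KvCache { kind, dim, data: Vec::new() }
    }

    /// Number of cached positions.
    fn len(&self) -> usize {
        self.data.len() / self.dim
    }

    /// Whether the current run must produce values for this cache.
    fn needs_update(&self) -> bool {
        match self.kind {
            CacheKind::SelfAttention => true,
            // Only the first run computes cross-attention K/V.
            CacheKind::CrossAttention => self.data.is_empty(),
        }
    }

    /// Apply the output of one model run to the cache (assumed simplified model).
    fn update(&mut self, run_output: &[f32]) {
        match self.kind {
            // Append the entries for the newly generated position(s).
            CacheKind::SelfAttention => self.data.extend_from_slice(run_output),
            CacheKind::CrossAttention => {
                if self.data.is_empty() {
                    self.data = run_output.to_vec();
                }
                // Later runs: keep the cached values unchanged.
            }
        }
    }
}

fn main() {
    let dim = 2;
    let mut self_attn = KvCache::new(CacheKind::SelfAttention, dim);
    let mut cross_attn = KvCache::new(CacheKind::CrossAttention, dim);

    // Pretend the encoder output spans 3 positions.
    let encoder_kv = vec![0.5f32; 3 * dim];
    let mut cross_computations = 0;

    for step in 0..4 {
        // One new self-attention entry per step.
        self_attn.update(&vec![step as f32; dim]);

        if cross_attn.needs_update() {
            cross_computations += 1;
            cross_attn.update(&encoder_kv);
        }

        println!(
            "step {}: self-attn len = {}, cross-attn len = {}",
            step,
            self_attn.len(),
            cross_attn.len()
        );
    }
    println!("cross-attention K/V computed {} time(s)", cross_computations);
}
```

Running this prints a self-attention length that increases from 1 to 4 while the cross-attention length stays at 3, and reports that the cross-attention values were computed once. Keying the behavior on the cache kind keeps a single generation loop for both kinds of input, which mirrors the generalization this commit describes.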