Updated Reformer to use caching during generation #8252

guillaume-be · 2020-11-03T09:30:00Z

What does this PR do?

The current reformer implementation supports caching of buckets and states, but this is not used during generation. Running a generation example in debugging mode, such as

from transformers import ReformerModelWithLMHead, ReformerTokenizer

model = ReformerModelWithLMHead.from_pretrained("google/reformer-crime-and-punishment").cuda()
tok = ReformerTokenizer.from_pretrained("google/reformer-crime-and-punishment")
output = tok.decode(
    model.generate(tok.encode("Notwithstanding", return_tensors="pt").cuda(),
                   do_sample=True,
                   temperature=0.7,
                   max_length=100,
                   use_cache=True)[0])

One can see that the past_buckets_states passed to the attention are always None (see past_buckets_states=None, at line 365 of transformers/src/transformers/modeling_reformer.py in 504ff7b).

This is because the name of the past states for the Reformer is neither past_key_values nor mems.
This PR adds the name of the past states to the generation past allocation.
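
As a rough illustration of the idea (not the library's actual code), the lookup that picks the cached state out of a model's outputs could be sketched as below. The helper `extract_past`, the `CACHE_KEYS` tuple, and the dictionary-shaped outputs are hypothetical names chosen for this example; only the three key names come from the description above.

```python
from typing import Any, Mapping, Optional

# Hypothetical list of names under which a model may return its cache.
# "past_buckets_states" is the name the Reformer uses, per this PR.
CACHE_KEYS = ("past_key_values", "mems", "past_buckets_states")


def extract_past(outputs: Mapping[str, Any]) -> Optional[Any]:
    """Return the first cached state found in `outputs`, or None if absent."""
    for key in CACHE_KEYS:
        if key in outputs:
            return outputs[key]
    return None


# A Reformer-like output: if "past_buckets_states" were missing from
# CACHE_KEYS, this lookup would return None and the cache would go unused.
reformer_like_outputs = {
    "logits": [0.1, 0.9],
    "past_buckets_states": [("buckets", "states")],
}
past = extract_past(reformer_like_outputs)
```

Keeping the names in a single tuple is one way to make the lookup easy to extend, although harmonizing the name across models would remove the need for it.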

More generally, it may make sense to harmonize the name of the past states across all models, so that the generate function generalizes better.

Who can review?

Text Generation: @patrickvonplaten, @TevenLeScao
Reformer: @patrickvonplaten

patrickvonplaten · 2020-11-03T11:33:16Z

Great catch!

patrickvonplaten · 2020-11-03T11:33:45Z

Let's merge that quickly so that I can integrate it into https://github.com/huggingface/transformers/pull/6949/files#diff-b7601d397d5d60326ce61a9c91beaa2afa026014141052b32b07e1d044fbbe17

patrickvonplaten · 2020-11-03T11:40:09Z

Actually, we would have to add it in two spots of this generate version. Considering that we will merge the big generate refactor today, I just added your fix quickly here: 12b54ec

Mentioned your PR at the fix - hope it's ok for you to close this PR to avoid any more merge conflicts.

Thanks a lot!

Added past_buckets_states to possible output cached states

339f967

guillaume-be changed the title ~~Added past_buckets_states to possible output cached states~~ Updated Reformer to use caching during generation Nov 3, 2020

patrickvonplaten mentioned this pull request Nov 3, 2020

Refactoring the generate() function #6949

Merged

7 tasks

patrickvonplaten closed this Nov 3, 2020
