
Script to easily train text generation models a la gpt-2-simple repo #39

Closed
timsoraro opened this issue Feb 18, 2020 · 22 comments
Labels: enhancement (New feature or request)

Comments

@timsoraro

Greetings! Your repository is a very welcome contribution. I tried to follow the examples in this repo but ran into some problems. Trying to modify the enwik8_simple example, I couldn't figure out how to:

  1. Load my custom data into the example (I have a poetry dataset).
  2. Generate output from a start prefix until an end token (a rough sketch of both is appended below).

Thanks a lot!
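
For concreteness, here is a minimal sketch of one way to do both by hand with the building blocks that already exist: a plain PyTorch Dataset over a poems file, and a greedy sampling loop that stops at an end token. The byte-level tokenization, the one-poem-per-line file, the special-token ids, and all hyperparameters below are assumptions for illustration, not anything this repo prescribes.

# Minimal sketch (assumptions: one poem per line in poems.txt, byte-level tokens,
# start/end/pad ids reserved at 256/257/258 -- none of this is prescribed by the repo)
import torch
import torch.nn.functional as F
from torch.utils.data import Dataset
from reformer_pytorch import ReformerLM

SEQ_LEN, START_ID, END_ID, PAD_ID = 1024, 256, 257, 258

class PoemDataset(Dataset):
    def __init__(self, path):
        with open(path, encoding = 'utf-8') as f:
            self.poems = [line.strip() for line in f if line.strip()]

    def __len__(self):
        return len(self.poems)

    def __getitem__(self, i):
        ids = [START_ID] + list(self.poems[i].encode('utf-8'))[:SEQ_LEN - 2] + [END_ID]
        ids += [PAD_ID] * (SEQ_LEN - len(ids))    # pad to a fixed length
        return torch.tensor(ids)

model = ReformerLM(num_tokens = 259, dim = 512, depth = 6, max_seq_len = SEQ_LEN, causal = True)

@torch.no_grad()
def generate(model, prime, max_len = SEQ_LEN):
    # greedy decoding that stops as soon as the end token is produced
    ids = [START_ID] + list(prime.encode('utf-8'))
    model.eval()
    while len(ids) < max_len:
        x = torch.tensor(ids)[None]
        x = F.pad(x, (0, SEQ_LEN - x.shape[1]), value = PAD_ID)
        logits = model(x)                         # (1, SEQ_LEN, num_tokens)
        next_id = logits[0, len(ids) - 1].argmax().item()
        if next_id == END_ID:
            break
        ids.append(next_id)
    return bytes(i for i in ids[1:] if i < 256).decode('utf-8', errors = 'ignore')

Training would then just iterate a DataLoader over PoemDataset and apply the usual cross-entropy between the logits and the targets shifted by one position.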

@lucidrains
Owner

@timsoraro that's the most popular request (see #33)! I'll think about how I would do it.

@lucidrains
Owner

@timsoraro have you seen https://www.gwern.net/GPT-2

@timsoraro
Author

Indeed! I wanted to see how the performance of the Reformer compares with GPT-2. Later I would love to train it on PG-19. I want to compare Reformer's perplexity with that of the Compressive Transformer and to test its generation capabilities on such long texts.

@lucidrains
Owner

@timsoraro omg! we are on the same wavelength! yes, I am very excited about compressive transformer. I may integrate that concept with Reformer at some point, but I need to figure out how to do relative positional encoding in the context of LSH attention.

I think GPT-2 has a context length long enough for most poetry. Also, Reformer sadly doesn't have any pretrained models out yet, so there's nothing to fine-tune. Whatever you train from scratch is likely not as good as just fine-tuning a GPT-2 model (for now)

@timsoraro
Author

> @timsoraro omg! we are on the same wavelength! yes, I am very excited about compressive transformer. I may integrate that concept with Reformer at some point, but I need to figure out how to do relative positional encoding in the context of LSH attention.

That would be incredible. I always wondered when we'd be able to train on such long sequences. It seems like we're just getting started.

> I think GPT-2 has a context length long enough for most poetry. Also, Reformer sadly doesn't have any pretrained models out yet, so there's nothing to fine-tune. Whatever you train from scratch is likely not as good as just fine-tuning a GPT-2 model (for now)

True... Still worth giving it a shot, imo. Also, we can make much bigger/deeper models with Reformer on the same hardware and sequence length, no? That's also exciting.

@lucidrains
Owner

Yes, I was surprised that LSH could work, and there could be better algorithms out there yet to be explored: https://medium.com/@jaiyamsharma/efficient-nearest-neighbors-inspired-by-the-fruit-fly-brain-6ef8fed416ee

To be totally transparent, there may be an issue with multi-GPU training and the reversible net. It seems like there is no reversible neural net implementation without noted memory issues: RobinBruegger/RevTorch#10 and silvandeleemput/memcnn#37. Both are ongoing and could use all the eyes they can get.

@lucidrains
Owner

But on a single GPU, if you are willing to wait longer, it seems to approximate full attention at impossible lengths!

@timsoraro
Author

> Yes, I was surprised that LSH could work, and there could be better algorithms out there yet to be explored: https://medium.com/@jaiyamsharma/efficient-nearest-neighbors-inspired-by-the-fruit-fly-brain-6ef8fed416ee

Very interesting! Thanks for sharing.

> To be totally transparent, there may be an issue with multi-GPU training and the reversible net. It seems like there is no reversible neural net implementation without noted memory issues: RobinBruegger/RevTorch#10 and silvandeleemput/memcnn#37. Both are ongoing and could use all the eyes they can get.

I'm not very proficient with DL myself (I only have a programming background), but solving the multi-GPU problem is definitely necessary in order to train on big datasets like PG-19. But didn't #19 manage to solve this (apart from pytorch-lightning)?

> But on a single GPU, if you are willing to wait longer, it seems to approximate full attention at impossible lengths!

:)

@lucidrains
Owner

#19 solved an issue with using Apex for half precision, and allowed for more complicated architectures where a reversible net undergoes multiple backward passes (encoder -> decoder), but it didn't solve the multi-GPU memory issue yet.

@timsoraro
Author

Oh, okay (: 👍

@lucidrains
Owner

@timsoraro hey I'm back, and I think I have a plan for how to abstract this cleanly! I was wondering, for your poetry dataset, how is each poem separated? Is each poem in a separate file?

@timsoraro
Author

@lucidrains Hi! At first, I thought about formatting it as gwern did here (albeit adding a start prefix), but I decided to train on a conversational dataset formatted this way instead: https://pastebin.com/AYshKbgM.

What's your plan? (:

@lucidrains
Owner

lucidrains commented Mar 3, 2020

I was thinking of an interface like this -

# proposed interface (sketch) -- not implemented yet
from reformer_pytorch import ReformerLM
from reformer_pytorch.generative_tools import Trainer

r = ReformerLM(**config)  # `config` is the usual dict of ReformerLM keyword arguments

trainer = Trainer(
    data_path = './folder',                 # folder of raw text files
    start_token = '<|startoftext|>',
    end_token = '<|endoftext|>',
    tokenizer_model_path = './model.model'  # will tokenize all the text and create the file if it does not exist
)

trainer.set_model(r)
trainer.train()  # wait for some time

sample = trainer.generate('some initial text', 1024)

That should be one step removed from a command-line tool!
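
Since tokenizer_model_path points at a .model file, a sentencepiece tokenizer seems to be the intent, though the comment above doesn't confirm it. A hedged sketch of what the "tokenize and create the file if it does not exist" step could look like (get_tokenizer and every parameter here are hypothetical):

# Hypothetical tokenizer step (assumes sentencepiece; not part of the repo)
import os
import sentencepiece as spm

def get_tokenizer(data_path, model_path, vocab_size = 8000,
                  start_token = '<|startoftext|>', end_token = '<|endoftext|>'):
    if not os.path.exists(model_path):
        files = [os.path.join(data_path, f) for f in os.listdir(data_path)]
        spm.SentencePieceTrainer.train(
            input = files,
            model_prefix = os.path.splitext(model_path)[0],
            vocab_size = vocab_size,
            user_defined_symbols = [start_token, end_token]  # keep the markers as single tokens
        )
    sp = spm.SentencePieceProcessor()
    sp.load(model_path)
    return sp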

@lucidrains
Owner

lucidrains commented Mar 3, 2020

@timsoraro I'm also deliberating whether this should be in a separate repository, something like reformer-pytorch-textgen

@timsoraro
Author

Looks good to me! (: I'll be the first to test.

> I'm also deliberating whether this should be in a separate repository, something like reformer-pytorch-textgen

I'll let you decide on that one.

@lucidrains
Owner

@timsoraro I got Microsoft's DeepSpeed working with Reformer! I've tested it with multiple GPUs on a local setup. One step closer to closing this issue!
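
For anyone wanting to reproduce this, the usual DeepSpeed wiring looks roughly like the sketch below; the model hyperparameters and the ds_config.json name are placeholders, and the actual example script in the repo may differ.

# Rough sketch of the standard DeepSpeed setup (illustrative; the repo's example may differ)
import argparse
import deepspeed
from reformer_pytorch import ReformerLM

parser = argparse.ArgumentParser()
parser.add_argument('--local_rank', type = int, default = -1)
parser = deepspeed.add_config_arguments(parser)   # adds --deepspeed, --deepspeed_config, ...
args = parser.parse_args()

model = ReformerLM(num_tokens = 256, dim = 512, depth = 6, max_seq_len = 4096, causal = True)

# the engine handles the optimizer, fp16 and data parallelism per the JSON config
model_engine, optimizer, _, _ = deepspeed.initialize(
    args = args,
    model = model,
    model_parameters = model.parameters()
)

# in the training loop: loss = ...; model_engine.backward(loss); model_engine.step()

Launched with something like: deepspeed train.py --deepspeed --deepspeed_config ds_config.json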

@zbloss
Contributor

zbloss commented Mar 6, 2020

Fantastic!!!

@lucidrains added the enhancement (New feature or request) label on Mar 6, 2020
@timsoraro
Author

I think you can close this one. BTW, any update on whether, after the caching and other improvements, Reformer's performance is on par with a regular Transformer?

@lucidrains
Owner

@timsoraro hello again! :) Reformer hasn't really worked out as well as I thought, so I moved on to another architecture that @AranKomat clued me in on. You should try it out at https://github.com/lucidrains/sinkhorn-transformer ! Currently, it always needs to be padded to the maximum sequence length, but I intend to fix that this week!

@timsoraro
Author

That's interesting! I'll check it out. Do you have an idea of how it performs on text?

@lucidrains
Owner

[attached training curve: "sinkhorn + local"]

If you mix it with local attention heads, it performs really well! I will try training it on the PG-19 dataset soon.
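
To make "sinkhorn + local" concrete, here is a sketch along the lines of the sinkhorn-transformer README; the argument names (in particular n_local_attn_heads and bucket_size) are as I recall them and should be checked against that repo, and the sizes are arbitrary.

# Sketch of mixing Sinkhorn attention with local attention heads
# (argument names as recalled from the sinkhorn-transformer README; verify before use)
import torch
from sinkhorn_transformer import SinkhornTransformerLM

model = SinkhornTransformerLM(
    num_tokens = 20000,
    dim = 512,
    depth = 6,
    heads = 8,
    n_local_attn_heads = 4,   # half the heads attend locally, the rest use Sinkhorn sorting
    bucket_size = 64,
    max_seq_len = 4096,
    causal = True
)

x = torch.randint(0, 20000, (1, 4096))   # currently must be padded to the full max_seq_len
logits = model(x)                        # (1, 4096, 20000)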

@timsoraro
Author

@lucidrains Very cool! I'm gonna give it a try and report back. Keep up the good work.
