
Script to easily train text generation models a la gpt-2-simple repo #39

Closed
timsoraro opened this issue Feb 18, 2020 · 22 comments
Labels: enhancement (New feature or request)

Comments

@timsoraro

Greetings! Your repository is a very welcome contribution. I tried to follow the examples in this repo but ran into some problems. Trying to modify the enwik8_simple example, I couldn't figure out how to:

  1. Load my custom data into the example (I have a poetry dataset).
  2. Generate output from a start prefix until an end token (a rough sketch of both is appended below).

Thanks a lot!
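
For concreteness, here is a minimal sketch of one way to do both by hand with the building blocks that already exist: a plain PyTorch Dataset over a poems file, and a greedy sampling loop that stops at an end token. The byte-level tokenization, the one-poem-per-line file, the special-token ids, and all hyperparameters below are assumptions for illustration, not anything this repo prescribes.

# Minimal sketch (assumptions: one poem per line in poems.txt, byte-level tokens,
# start/end/pad ids reserved at 256/257/258 -- none of this is prescribed by the repo)
import torch
import torch.nn.functional as F
from torch.utils.data import Dataset
from reformer_pytorch import ReformerLM

SEQ_LEN, START_ID, END_ID, PAD_ID = 1024, 256, 257, 258

class PoemDataset(Dataset):
    def __init__(self, path):
        with open(path, encoding = 'utf-8') as f:
            self.poems = [line.strip() for line in f if line.strip()]

    def __len__(self):
        return len(self.poems)

    def __getitem__(self, i):
        ids = [START_ID] + list(self.poems[i].encode('utf-8'))[:SEQ_LEN - 2] + [END_ID]
        ids += [PAD_ID] * (SEQ_LEN - len(ids))    # pad to a fixed length
        return torch.tensor(ids)

model = ReformerLM(num_tokens = 259, dim = 512, depth = 6, max_seq_len = SEQ_LEN, causal = True)

@torch.no_grad()
def generate(model, prime, max_len = SEQ_LEN):
    # greedy decoding that stops as soon as the end token is produced
    ids = [START_ID] + list(prime.encode('utf-8'))
    model.eval()
    while len(ids) < max_len:
        x = torch.tensor(ids)[None]
        x = F.pad(x, (0, SEQ_LEN - x.shape[1]), value = PAD_ID)
        logits = model(x)                         # (1, SEQ_LEN, num_tokens)
        next_id = logits[0, len(ids) - 1].argmax().item()
        if next_id == END_ID:
            break
        ids.append(next_id)
    return bytes(i for i in ids[1:] if i < 256).decode('utf-8', errors = 'ignore')

Training would then just iterate a DataLoader over PoemDataset and apply the usual cross-entropy between the logits and the targets shifted by one position.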

@lucidrains
Owner

@timsoraro that's the most popular request (see #33)! I'll think about how I would do it.

@lucidrains
Owner

@timsoraro have you seen https://www.gwern.net/GPT-2

@timsoraro
Author

Indeed! I wanted to see how the performance of the Reformer compares with GPT-2. Later I would love to train it on PG-19. I want to compare Reformer's perplexity with that of the Compressive Transformer and to test its generation capabilities on such long texts.

@lucidrains
Owner

@timsoraro omg! we are on the same wavelength! yes, I am very excited about compressive transformer. I may integrate that concept with Reformer at some point, but I need to figure out how to do relative positional encoding in the context of LSH attention.

I think GPT-2 has a context length long enough for most poetry. Also, Reformer sadly doesn't have any pretrained models out yet, so there's nothing to fine-tune. Whatever you train from scratch is likely not as good as just fine-tuning a GPT-2 model (for now)

@timsoraro
Author

> @timsoraro omg! we are on the same wavelength! yes, I am very excited about compressive transformer. I may integrate that concept with Reformer at some point, but I need to figure out how to do relative positional encoding in the context of LSH attention.

That would be incredible. I always wondered when we'd be able to train on such long sequences. It seems like we're just getting started.

> I think GPT-2 has a context length long enough for most poetry. Also, Reformer sadly doesn't have any pretrained models out yet, so there's nothing to fine-tune. Whatever you train from scratch is likely not as good as just fine-tuning a GPT-2 model (for now)

True... Still worth giving it a shot, imo. Also, we can make much bigger/deeper models with Reformer on the same hardware and sequence length, no? That's also exciting.

@lucidrains
Owner

Yes, I was surprised that LSH could work, and there could be better algorithms out there yet to be explored: https://medium.com/@jaiyamsharma/efficient-nearest-neighbors-inspired-by-the-fruit-fly-brain-6ef8fed416ee

To be totally transparent, there may be an issue with multi-GPU training and the reversible net. It seems like there is no reversible neural net implementation without noted memory issues: RobinBruegger/RevTorch#10 and silvandeleemput/memcnn#37. Both are ongoing and could use all the eyes they can get.

@lucidrains
Owner

But on a single GPU, if you are willing to wait longer, it seems to approximate full attention at impossible lengths!

@timsoraro
Author

> Yes, I was surprised that LSH could work, and there could be better algorithms out there yet to be explored: https://medium.com/@jaiyamsharma/efficient-nearest-neighbors-inspired-by-the-fruit-fly-brain-6ef8fed416ee

Very interesting! Thanks for sharing.

> To be totally transparent, there may be an issue with multi-GPU training and the reversible net. It seems like there is no reversible neural net implementation without noted memory issues: RobinBruegger/RevTorch#10 and silvandeleemput/memcnn#37. Both are ongoing and could use all the eyes they can get.

I'm not very proficient with DL myself (I only have a programming background), but solving the multi-GPU problem is definitely necessary in order to train on big datasets like PG-19. But didn't #19 manage to solve this (apart from pytorch-lightning)?

> But on a single GPU, if you are willing to wait longer, it seems to approximate full attention at impossible lengths!

:)

@lucidrains
Owner

#19 solved an issue with using Apex for half precision, and allowed for more complicated architectures where a reversible net undergoes multiple backward passes (encoder -> decoder), but it didn't solve the multi-GPU memory issue yet.

@timsoraro
Author

Oh, okay (: 👍

@lucidrains
Owner

@timsoraro hey I'm back, and I think I have a plan for how to abstract this cleanly! I was wondering, for your poetry dataset, how is each poem separated? Is each poem in a separate file?

@timsoraro
Author

@lucidrains Hi! At first, I thought about formatting it as gwern did here (albeit adding a start prefix), but I decided to train on a conversational dataset formatted this way instead: https://pastebin.com/AYshKbgM.

What's your plan? (:

@lucidrains
Owner

lucidrains commented Mar 3, 2020

I was thinking of an interface like this -

# proposed interface (sketch) -- not implemented yet
from reformer_pytorch import ReformerLM
from reformer_pytorch.generative_tools import Trainer

r = ReformerLM(**config)  # `config` is the usual dict of ReformerLM keyword arguments

trainer = Trainer(
    data_path = './folder',                 # folder of raw text files
    start_token = '<|startoftext|>',
    end_token = '<|endoftext|>',
    tokenizer_model_path = './model.model'  # will tokenize all the text and create the file if it does not exist
)

trainer.set_model(r)
trainer.train()  # wait for some time

sample = trainer.generate('some initial text', 1024)

That should be one step removed from a command-line tool!
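
Since tokenizer_model_path points at a .model file, a sentencepiece tokenizer seems to be the intent, though the comment above doesn't confirm it. A hedged sketch of what the "tokenize and create the file if it does not exist" step could look like (get_tokenizer and every parameter here are hypothetical):

# Hypothetical tokenizer step (assumes sentencepiece; not part of the repo)
import os
import sentencepiece as spm

def get_tokenizer(data_path, model_path, vocab_size = 8000,
                  start_token = '<|startoftext|>', end_token = '<|endoftext|>'):
    if not os.path.exists(model_path):
        files = [os.path.join(data_path, f) for f in os.listdir(data_path)]
        spm.SentencePieceTrainer.train(
            input = files,
            model_prefix = os.path.splitext(model_path)[0],
            vocab_size = vocab_size,
            user_defined_symbols = [start_token, end_token]  # keep the markers as single tokens
        )
    sp = spm.SentencePieceProcessor()
    sp.load(model_path)
    return sp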

@lucidrains
Owner

lucidrains commented Mar 3, 2020

@timsoraro I'm also deliberating whether this should be in a separate repository, something like reformer-pytorch-textgen

@timsoraro
Author

Looks good to me! (: I'll be the first to test.

> I'm also deliberating whether this should be in a separate repository, something like reformer-pytorch-textgen

I'll let you decide on that one.

@lucidrains
Owner

@timsoraro I got Microsoft's DeepSpeed working with Reformer! I've tested it with multiple GPUs on a local setup. One step closer to closing this issue!
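
For anyone wanting to reproduce this, the usual DeepSpeed wiring looks roughly like the sketch below; the model hyperparameters and the ds_config.json name are placeholders, and the actual example script in the repo may differ.

# Rough sketch of the standard DeepSpeed setup (illustrative; the repo's example may differ)
import argparse
import deepspeed
from reformer_pytorch import ReformerLM

parser = argparse.ArgumentParser()
parser.add_argument('--local_rank', type = int, default = -1)
parser = deepspeed.add_config_arguments(parser)   # adds --deepspeed, --deepspeed_config, ...
args = parser.parse_args()

model = ReformerLM(num_tokens = 256, dim = 512, depth = 6, max_seq_len = 4096, causal = True)

# the engine handles the optimizer, fp16 and data parallelism per the JSON config
model_engine, optimizer, _, _ = deepspeed.initialize(
    args = args,
    model = model,
    model_parameters = model.parameters()
)

# in the training loop: loss = ...; model_engine.backward(loss); model_engine.step()

Launched with something like: deepspeed train.py --deepspeed --deepspeed_config ds_config.json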

@zbloss
Contributor

zbloss commented Mar 6, 2020

Fantastic!!!

@lucidrains added the enhancement (New feature or request) label on Mar 6, 2020
@timsoraro
Author

I think you can close this one. BTW, any update on whether, after the caching and other improvements, Reformer's performance is on par with a regular Transformer?

@lucidrains
Owner

@timsoraro hello again! :) Reformer hasn't really worked out as well as I thought, so I moved on to another architecture that @AranKomat clued me in on. You should try it out at https://github.com/lucidrains/sinkhorn-transformer ! Currently, it always needs to be padded to the maximum sequence length, but I intend to fix that this week!

@timsoraro
Author

That's interesting! I'll check it out. Do you have an idea of how it performs on text?

@lucidrains
Owner

[attached training curve: "sinkhorn + local"]

If you mix it with local attention heads, it performs really well! I will try training it on the PG-19 dataset soon.
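
To make "sinkhorn + local" concrete, here is a sketch along the lines of the sinkhorn-transformer README; the argument names (in particular n_local_attn_heads and bucket_size) are as I recall them and should be checked against that repo, and the sizes are arbitrary.

# Sketch of mixing Sinkhorn attention with local attention heads
# (argument names as recalled from the sinkhorn-transformer README; verify before use)
import torch
from sinkhorn_transformer import SinkhornTransformerLM

model = SinkhornTransformerLM(
    num_tokens = 20000,
    dim = 512,
    depth = 6,
    heads = 8,
    n_local_attn_heads = 4,   # half the heads attend locally, the rest use Sinkhorn sorting
    bucket_size = 64,
    max_seq_len = 4096,
    causal = True
)

x = torch.randint(0, 20000, (1, 4096))   # currently must be padded to the full max_seq_len
logits = model(x)                        # (1, 4096, 20000)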

@timsoraro
Author

@lucidrains Very cool! I'm gonna give it a try and report back. Keep up the good work.
