Script to easily train text generation models a la gpt-2-simple repo #39
@timsoraro that's the most popular request! #33 I'll think about how I would do it
@timsoraro have you seen https://www.gwern.net/GPT-2
Indeed! I wanted to see how the performance of the Reformer compares with GPT-2. Later I would love to train it on PG-19. I want to compare Reformer's perplexity with that of the Compressive Transformer and to test its generation capabilities on such long text.
@timsoraro omg! we are on the same wavelength! yes, I am very excited about the compressive transformer. I may integrate that concept with Reformer at some point, but I need to figure out how to do relative positional encoding in the context of LSH attention. I think GPT-2 has a context length long enough for most poetry. Also, Reformer sadly doesn't have any pretrained models out yet, so there's nothing to fine-tune. Whatever you train from scratch is likely not as good as just fine-tuning a GPT-2 model (for now).
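(For readers arriving from the issue title: the gpt-2-simple workflow being used as the benchmark here is roughly the following, a sketch assuming a plain-text dataset.txt; the file name, step count, and model size are only illustrative.)

import gpt_2_simple as gpt2

# download a pretrained checkpoint once (released sizes include "124M", "355M", ...)
gpt2.download_gpt2(model_name="124M")

sess = gpt2.start_tf_sess()

# fine-tune on a plain-text corpus; checkpoints are written under ./checkpoint/
gpt2.finetune(sess,
              dataset="dataset.txt",   # hypothetical path to your corpus
              model_name="124M",
              steps=1000)

# sample from the fine-tuned model
gpt2.generate(sess, prefix="<|startoftext|>", length=256)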
That would be incredible. I always wondered when we'd be able to train on such long sequences. It seems like we're just getting started.
True... Still worth giving it a shot, imo. Also, we can make much bigger/deeper models with Reformer on the same hardware and sequence length, no? That's also exciting.
Yes, I was surprised that LSH could work, and there could be better algorithms out there yet to be explored: https://medium.com/@jaiyamsharma/efficient-nearest-neighbors-inspired-by-the-fruit-fly-brain-6ef8fed416ee To be totally transparent, there may be an issue with multi-GPU training with the reversible net. It seems like there is no reversible neural net implementation without noted memory issues. RobinBruegger/RevTorch#10 and silvandeleemput/memcnn#37 are both ongoing and could use all the eyes they can get.
But on a single GPU, if you are willing to wait longer, it seems to approximate full attention at otherwise impossible lengths!
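(For context on why the reversible net matters here: the memory saving comes from the coupling structure itself, which lets inputs be recomputed from outputs instead of cached for the backward pass. A minimal sketch of that idea, not the repo's or RevTorch/MemCNN's actual implementation:)

import torch
from torch import nn

class ReversibleCouplingSketch(nn.Module):
    # illustrative only: y1 = x1 + F(x2), y2 = x2 + G(y1)
    # because (x1, x2) can be recovered from (y1, y2), activations need not be
    # stored, which is what lets deep/long-sequence models fit on a single GPU
    def __init__(self, f, g):
        super().__init__()
        self.f = f
        self.g = g

    def forward(self, x1, x2):
        y1 = x1 + self.f(x2)
        y2 = x2 + self.g(y1)
        return y1, y2

    def invert(self, y1, y2):
        # reconstruct the inputs from the outputs (the step that RevTorch/MemCNN
        # perform inside a custom backward pass, which is where the multi-GPU
        # interactions get tricky)
        x2 = y2 - self.g(y1)
        x1 = y1 - self.f(x2)
        return x1, x2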
Very interesting! Thanks for sharing.
I'm not very proficient with DL myself (I have only a programming background), but solving the multi-GPU problem is definitely necessary in order to train on big datasets like PG-19. But didn't #19 manage to solve this (apart from pytorch-lightning)?
:)
#19 solved an issue with using Apex for half precision, as well as allowing for more complicated architectures where a reversible net undergoes multiple backward passes (encoder -> decoder), but it didn't solve the multi-GPU memory issue yet.
Oh, okay (: 👍
@timsoraro hey, I'm back, and I think I have a plan for how to abstract this cleanly! I was wondering, for your poetry dataset, how is each poem separated? Is it per separate file?
@lucidrains Hi! At first, I thought about formatting it as gwern did here (albeit adding a start prefix), but I decided to train on a conversational dataset formatted this way instead: https://pastebin.com/AYshKbgM. What's your plan? (:
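(As an aside, the start-prefix/end-token framing described above amounts to a preprocessing step along these lines; the sample data, tokens, and output path are purely illustrative.)

# wrap each sample with start/end markers and concatenate into one training file
samples = ["first poem or dialogue turn...", "second sample..."]  # hypothetical data

with open("dataset.txt", "w") as f:  # hypothetical output path
    for text in samples:
        f.write("<|startoftext|>" + text + "<|endoftext|>\n")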
I was thinking of an interface like this:

from reformer_pytorch.generative_tools import Trainer

r = ReformerLM(**config)

trainer = Trainer(
    data_path='./folder',
    start_token='<|startoftext|>',
    end_token='<|endoftext|>',
    tokenizer_model_path='./model.model'  # will tokenize all the text and create the file if it does not exist
)

trainer.set_model(r)
trainer.train()  # wait for some time
sample = trainer.generate('some initial text', 1024)

That should be one step removed from a command line tool!
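(The "one step removed" part could be as small as an argparse wrapper around that interface. A hypothetical sketch only: the Trainer class above is the proposed, not-yet-existing API, and the ReformerLM hyperparameters are placeholders.)

import argparse
from reformer_pytorch import ReformerLM
# mirrors the proposed (not yet implemented) interface from the comment above
from reformer_pytorch.generative_tools import Trainer

parser = argparse.ArgumentParser(description="Train a ReformerLM on a folder of text")
parser.add_argument("--data_path", default="./folder")
parser.add_argument("--tokenizer_model_path", default="./model.model")
parser.add_argument("--prime", default="some initial text")
args = parser.parse_args()

# placeholder hyperparameters for illustration
model = ReformerLM(num_tokens=20000, dim=512, depth=6, max_seq_len=1024, causal=True)

trainer = Trainer(
    data_path=args.data_path,
    start_token="<|startoftext|>",
    end_token="<|endoftext|>",
    tokenizer_model_path=args.tokenizer_model_path,
)
trainer.set_model(model)
trainer.train()
print(trainer.generate(args.prime, 1024))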
@timsoraro I'm also deliberating whether this should be in a separate repository, something like
Looks good to me! (: Will be the first to test.
I'll let you decide on that one.
@timsoraro I got Microsoft's DeepSpeed working with Reformer! I've tested it with multiple GPUs on a local setup. One step closer to closing this issue!
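(For anyone wanting to reproduce a multi-GPU setup like this, the DeepSpeed wiring is roughly the standard initialize-and-launch pattern. A sketch only, not the exact script used here; it assumes a ds_config.json alongside the script and placeholder model hyperparameters.)

import argparse
import deepspeed
from reformer_pytorch import ReformerLM

parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=-1)  # filled in by the deepspeed launcher
parser = deepspeed.add_config_arguments(parser)
args = parser.parse_args()

model = ReformerLM(num_tokens=20000, dim=512, depth=6, max_seq_len=4096, causal=True)

# wraps the model and optimizer for distributed, multi-GPU training
model_engine, optimizer, _, _ = deepspeed.initialize(
    args=args,
    model=model,
    model_parameters=model.parameters(),
)

# inside the training loop:
#   loss = ...  (compute loss on a batch moved to model_engine.local_rank)
#   model_engine.backward(loss)
#   model_engine.step()

# launched with something like:
#   deepspeed train.py --deepspeed --deepspeed_config ds_config.json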
Fantastic!!!
I think you can close this one. BTW, any update on whether, after the caching and other improvements, the performance of Reformer is on par with a regular Transformer?
@timsoraro hello again! :) Reformer hasn't really worked out for me as well as I thought, so I moved on to another architecture that @AranKomat cued me in on. You should try it out at https://github.com/lucidrains/sinkhorn-transformer ! Currently, it always needs to be padded to the maximum sequence length, but I intend to fix that this week!
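(To illustrate the padding constraint mentioned above: shorter inputs would need to be right-padded up to max_seq_len before the forward pass. A sketch with parameter names recalled from the sinkhorn-transformer README; the hyperparameter values and pad token are assumptions, so double-check against the repo.)

import torch
import torch.nn.functional as F
from sinkhorn_transformer import SinkhornTransformerLM

MAX_SEQ_LEN = 2048  # illustrative; must be divisible by bucket_size

model = SinkhornTransformerLM(
    num_tokens=20000,
    dim=512,
    depth=6,
    heads=8,
    max_seq_len=MAX_SEQ_LEN,
    bucket_size=128,
    causal=True,
)

tokens = torch.randint(0, 20000, (1, 777))  # a shorter-than-max sequence
padded = F.pad(tokens, (0, MAX_SEQ_LEN - tokens.shape[-1]), value=0)  # right-pad to max_seq_len
logits = model(padded)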
That's interesting! I'll check it out. Do you have an idea of how it performs on text? |
@lucidrains Very cool! I'm gonna give it a try and report back. Keep up the good work. |
Greetings! Your repository is a very welcome contribution. I tried to follow the examples in this repo but faced some problems. Trying to modify enwik8_simple, I didn't understand how to:
Thanks a lot!