minimaxir/reddit-gpt-2-cloud-run

reddit-gpt-2-cloud-run

Code for running a Reddit title generator API using gpt-2-cloud-run. You can play with the API here.

The Reddit data was retrieved from BigQuery using the query in query.sql, which selects the Top 2000 posts on each of the Top 2500 subreddits from January 2017 to February 2019 (w/ miscellaneous quality filters).
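The "top N posts per subreddit" selection can be illustrated outside BigQuery. The following is a minimal Python sketch, not the contents of query.sql: the field names (`subreddit`, `title`, `score`) and ranking by score are assumptions, and the quality filters and date range are omitted.

```python
from collections import defaultdict
import heapq


def top_posts_per_subreddit(posts, n_posts=2000):
    """Return the n_posts highest-scoring posts for each subreddit.

    `posts` is an iterable of dicts with hypothetical keys
    "subreddit", "title", and "score".
    """
    by_subreddit = defaultdict(list)
    for post in posts:
        by_subreddit[post["subreddit"]].append(post)

    # heapq.nlargest returns the group's posts sorted by descending score.
    return {
        subreddit: heapq.nlargest(n_posts, group, key=lambda p: p["score"])
        for subreddit, group in by_subreddit.items()
    }


# Tiny made-up sample, for illustration only.
sample = [
    {"subreddit": "aww", "title": "Puppy", "score": 10},
    {"subreddit": "aww", "title": "Kitten", "score": 30},
    {"subreddit": "aww", "title": "Duckling", "score": 20},
    {"subreddit": "news", "title": "Headline", "score": 5},
]
top = top_posts_per_subreddit(sample, n_posts=2)
```

With `n_posts=2`, the lowest-scoring "aww" post is dropped and the single "news" post is kept.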

The resulting CSV was encoded using gpt-2-keyword-generation (on a 32 vCPU cloud machine, as it's a lot of data!), then pre-encoded for training using gpt-2-simple's encode_dataset() function (since otherwise it would take half an hour to start training!). Finally, GPT-2 117M was finetuned on the pre-encoded dataset using gpt-2-simple.
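The pre-encoding and finetuning steps might look roughly like the sketch below. It is an assumption-laden outline rather than the exact script used for this repo: the file names and step count are hypothetical, and newer gpt-2-simple releases call the smallest model `124M` instead of `117M`.

```python
def pre_encode_and_finetune(text_path,
                            encoded_path="reddit_encoded.npz",  # hypothetical file name
                            model_name="117M",
                            steps=1000):  # hypothetical step count
    """Pre-encode a keyword-encoded text file, then finetune GPT-2 on it."""
    # Imported here because gpt-2-simple pulls in TensorFlow on import.
    import gpt_2_simple as gpt2

    # Fetch the pretrained weights into ./models/<model_name>.
    gpt2.download_gpt2(model_name=model_name)

    # Tokenize the text once and save the tokens as a compressed .npz file,
    # so the tokenization cost is not paid again when training starts.
    gpt2.encode_dataset(text_path, out_path=encoded_path, model_name=model_name)

    # Finetune on the pre-encoded tokens instead of the raw text.
    sess = gpt2.start_tf_sess()
    gpt2.finetune(sess, dataset=encoded_path, model_name=model_name, steps=steps)
    return sess
```

A call such as `pre_encode_and_finetune("reddit_keywords_encoded.txt")` (again, a made-up file name) would run all three steps in order.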

Maintainer/Creator

Max Woolf (@minimaxir)

Max's open-source projects are supported by his Patreon. If you found this project helpful, any monetary contributions to the Patreon are appreciated and will be put to good creative use.

License

MIT

Disclaimer

This repo has no affiliation or relationship with OpenAI.
