Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Language Models from Transformers Lib #5187

Merged
merged 24 commits into from
Feb 12, 2020
Merged

Language Models from Transformers Lib #5187

merged 24 commits into from
Feb 12, 2020

Conversation

dakshvar22
Copy link
Contributor

@dakshvar22 dakshvar22 commented Feb 4, 2020

Proposed changes:

  • Create a new NLP component - HFTransformersNLP which tokenizes and featurizes incoming messages using the Transformers Library.
  • Create LanguageModelTokenizers and LanguageModelFeaturizers which use the information from HFTransformersNLP and sets them correctly for message object
  • Architectures supported: Bert, OpenAIGPT, GPT-2, XLNet, DistilBert, Roberta

Part of https://github.com/RasaHQ/research/issues/62

Status (please check what you already did):

  • added some tests for the functionality
  • updated the documentation
  • updated the changelog (please check changelog for instructions)
  • reformat files using black (please check Readme for instructions)

Copy link
Contributor

@Ghostvv Ghostvv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Have a couple of comments. Also, does it makes sense to create 3 files for hf? Why don't we put all these helpers into 1 file?

Copy link
Contributor

@tabergma tabergma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to add some tests for the components.

@tabergma
Copy link
Contributor

tabergma commented Feb 4, 2020

Can you also create a changelog entry and add some documentation? E.g. add the new components to https://rasa.com/docs/rasa/nlu/components/.

@dakshvar22
Copy link
Contributor Author

dakshvar22 commented Feb 4, 2020

@tabergma Yes, tests and documentation are to be added. That wasn't ready. :)
@Ghostvv I feel readability is better. From a maintenance perspective, we know that registry.py and transformers_pre_post_processors.py is where bulk of it would happen since new models can basically be added by editing those two files. IMO the three files help in that sense. What do you think?

Copy link
Contributor

@Ghostvv Ghostvv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we going to add when to use what in a separate PR?

Copy link
Contributor

@tabergma tabergma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from the comments I already made, it looks good 🚀 Great work!

@dakshvar22 dakshvar22 merged commit 3c30155 into tf2 Feb 12, 2020
@dakshvar22 dakshvar22 deleted the transformers_lm branch February 12, 2020 21:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants