This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

Update transformers requirement from <3.6,>=3.4 to >=3.4,<4.1 #4831

Merged
AkshitaB merged 6 commits into master from dependabot/pip/transformers-gte-3.4-and-lt-4.1 on Dec 11, 2020

Conversation

dependabot[bot]
Contributor

@dependabot dependabot bot commented on behalf of github Dec 1, 2020

Updates the requirements on transformers to permit the latest version.

Release notes

Sourced from transformers's releases.

Transformers v4.0.0: Fast tokenizers, model outputs, file reorganization

Transformers v4.0.0-rc-1: Fast tokenizers, model outputs, file reorganization

Breaking changes since v3.x

Version v4.0.0 introduces several breaking changes that were necessary.

1. AutoTokenizers and pipelines now use fast (rust) tokenizers by default.

The python and rust tokenizers have roughly the same API, but the rust tokenizers have a more complete feature set. The main breaking change is the handling of overflowing tokens between the python and rust tokenizers.

How to obtain the same behavior as v3.x in v4.x

In version v3.x:

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("xxx")

to obtain the same in version v4.x:

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("xxx", use_fast=False)
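
The overflowing-token difference mentioned above is easiest to see with return_overflowing_tokens. The following is a minimal sketch, not part of the quoted release notes; the model name and max_length are arbitrary, and it assumes a v4.x installation.

from transformers import AutoTokenizer

# v4.x default: a fast (Rust) tokenizer
fast = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = fast(
    "a sentence that is too long to fit into eight tokens",
    max_length=8,
    truncation=True,
    return_overflowing_tokens=True,
)
# Fast tokenizers return the overflow as additional full encodings,
# plus a mapping from each chunk back to its original sample.
print(len(enc["input_ids"]))              # > 1: one entry per chunk
print(enc["overflow_to_sample_mapping"])  # e.g. [0, 0]

# v3.x-style slow tokenizer: a single encoding, with the cut-off ids
# returned under an "overflowing_tokens" key instead.
slow = AutoTokenizer.from_pretrained("bert-base-uncased", use_fast=False)
enc_slow = slow(
    "a sentence that is too long to fit into eight tokens",
    max_length=8,
    truncation=True,
    return_overflowing_tokens=True,
)
print("overflowing_tokens" in enc_slow)   # True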

2. SentencePiece is removed from the required dependencies

The requirement on the SentencePiece dependency has been lifted from the setup.py. This is done so that we may have a channel on anaconda cloud without relying on conda-forge. This means that the tokenizers that depend on the SentencePiece library will not be available with a standard transformers installation.

This includes the slow versions of:

  • XLNetTokenizer
  • AlbertTokenizer
  • CamembertTokenizer
  • MBartTokenizer
  • PegasusTokenizer
  • T5Tokenizer
  • ReformerTokenizer
  • XLMRobertaTokenizer

How to obtain the same behavior as v3.x in v4.x

In order to obtain the same behavior as version v3.x, you should install sentencepiece additionally:

In version v3.x:

pip install transformers

... (truncated)
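
Not part of the quoted notes (which are truncated just above), but for completeness: installing the extra dependency alongside the range this PR pins would look roughly like

pip install "transformers>=3.4,<4.1" sentencepiece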

Commits

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot dependabot bot added the "dependencies" label (Pull requests that update a dependency file) on Dec 1, 2020
@AkshitaB AkshitaB self-assigned this Dec 4, 2020
@@ -64,7 +64,8 @@
     "scikit-learn",
     "scipy",
     "pytest",
-    "transformers>=3.4,<3.6",
+    "transformers>=3.4,<4.1",
+    "sentencepiece",
Contributor

As per the release notes, sentencepiece is not required as a dependency in transformers by default. But we use certain tokenizers from HF that require it.
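
As a rough illustration of the point above (not part of the PR, model name arbitrary): with transformers v4.x and no sentencepiece installed, using one of the slow SentencePiece-backed tokenizers fails, which is why the explicit dependency is kept.

from transformers import T5Tokenizer  # slow, SentencePiece-backed tokenizer

# Without sentencepiece installed, transformers v4.x substitutes a dummy class
# here, and this call raises an ImportError asking you to install sentencepiece.
tokenizer = T5Tokenizer.from_pretrained("t5-small")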

@AkshitaB AkshitaB requested a review from epwalsh December 11, 2020 00:09
@@ -99,7 +99,7 @@ def test_transformers_vocab_sizes(self, model_name):

     def test_transformers_vocabs_added_correctly(self):
         namespace, model_name = "tags", "roberta-base"
-        tokenizer = cached_transformers.get_tokenizer(model_name)
+        tokenizer = cached_transformers.get_tokenizer(model_name, use_fast=False)
Member

Why use_fast=False?

Contributor

RobertaTokenizerFast does not have the encoder attribute, which we use in this test case.
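
A minimal sketch of the difference being described (not from the PR; assumes transformers v4.x and the roberta-base checkpoint):

from transformers import RobertaTokenizer, RobertaTokenizerFast

slow = RobertaTokenizer.from_pretrained("roberta-base")
fast = RobertaTokenizerFast.from_pretrained("roberta-base")

print(hasattr(slow, "encoder"))  # True: the slow BPE tokenizer keeps a token -> id dict
print(hasattr(fast, "encoder"))  # False: the Rust-backed tokenizer exposes get_vocab() instead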

Co-authored-by: Evan Pete Walsh <[email protected]>
Member

@epwalsh epwalsh left a comment

LGTM!

Contributor Author

dependabot bot commented on behalf of github Dec 11, 2020

A newer version of transformers exists, but since this PR has been edited by someone other than Dependabot I haven't updated it. You'll get a PR for the updated version as normal once this PR is merged.

@AkshitaB AkshitaB merged commit 84a36a0 into master Dec 11, 2020
@AkshitaB AkshitaB deleted the dependabot/pip/transformers-gte-3.4-and-lt-4.1 branch December 11, 2020 23:24