TypeError: can't pickle Tokenizer objects when num_workers > 0 and lazy = true #4399
Comments
Hi @JohnGiorgi, can you share your config? Are you using the […]
Hi @epwalsh, yes, it looks like […]
so maybe my issue is unnecessary and I should leave […] In any case, I have updated my original issue with a minimal example that triggers the error.
Gotcha. Yeah, like the warning says, there is probably no benefit to using […] But even then, you'll probably still see this exception, which arises because each […] Now when the main process loading data needs to gather the […]
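The explanation above is partly elided, so the following is only a generic sketch of the failure mode, not AllenNLP's actual code. It assumes that whatever a worker process hands back to the main process has to be pickled, and that the returned object holds a reference to something unpicklable. `FakeTokenizer`, `FakeInstance`, and `can_cross_process_boundary` are hypothetical names, and `threading.Lock` stands in for the tokenizer's unpicklable internals.

```python
import pickle
import threading


class FakeTokenizer:
    """Hypothetical stand-in for a tokenizer that wraps an unpicklable resource."""

    def __init__(self):
        # Assumption: a lock plays the role of the unpicklable native object.
        self._handle = threading.Lock()

    def tokenize(self, text):
        return text.split()


class FakeInstance:
    """Hypothetical stand-in for a data instance that keeps its tokenizer around."""

    def __init__(self, tokens, tokenizer):
        self.tokens = tokens
        self.tokenizer = tokenizer


def can_cross_process_boundary(obj):
    """Return True if `obj` can be pickled, as sending it between processes requires."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


tokenizer = FakeTokenizer()
plain_tokens = tokenizer.tokenize("a small example")
instance = FakeInstance(plain_tokens, tokenizer)

print(can_cross_process_boundary(plain_tokens))  # plain list of strings
print(can_cross_process_boundary(instance))      # object that references the tokenizer
```

With a single process nothing is pickled, which would be consistent with the error only appearing once `num_workers` is greater than 0.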
That said, we are planning on making some changes to our data loading story soon. One of the proposed changes is to make […]
@epwalsh Gotcha, thanks for the detailed response. For now, I will leave […] I will look out for the proposed changes to the […]
Checklist
[…] `master` branch of AllenNLP.
[…] `pip freeze`.
Description
I get a `TypeError: can't pickle Tokenizer objects` when trying to train a model that uses a `PretrainedTransformerTokenizer` tokenizer when `"dataset_reader.lazy": true` and `"data_loader.num_workers" > 0`. This appears to happen for every version of AllenNLP after 1.0.0rc3 (specifically this commit), including the current master branch. The 1.0.0rc3 release and earlier releases do not have this issue.
The notes in #4344 seem to suggest it has been solved, but I can still trigger it with a minimal example (see below).
Python traceback:
Related issues or possible duplicates
Environment
OS:
Python version: 3.7.4
Output of `pip freeze`:
Steps to reproduce
Train a model that uses a `PretrainedTransformerTokenizer` with `"dataset_reader.lazy": true` and `"data_loader.num_workers" > 0`. E.g. I used this config with some overrides (see below).
Example source:
allennlp train mnli_roberta.jsonnet \
    --serialization-dir ./debug \
    --overrides "{'dataset_reader.lazy': true, 'data_loader.batch_sampler': null, 'data_loader.num_workers': 1}" \
    -f
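To make the effect of the `--overrides` argument easier to see, here is a rough sketch of how dotted keys could map onto a nested config. It is an illustration only and not AllenNLP's own override logic; `apply_overrides` is a hypothetical helper, and the `base` dict is an invented placeholder rather than the contents of `mnli_roberta.jsonnet`.

```python
import copy


def apply_overrides(config, overrides):
    """Return a copy of `config` with dotted-key `overrides` applied (hypothetical helper)."""
    result = copy.deepcopy(config)
    for dotted_key, value in overrides.items():
        node = result
        *parents, leaf = dotted_key.split(".")
        # Walk down to the parent dict, creating levels that do not exist yet.
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return result


# Placeholder base config (assumption: the real file has many more fields).
base = {
    "dataset_reader": {"lazy": False},
    "data_loader": {"batch_sampler": {"type": "placeholder"}, "num_workers": 0},
}

# The three overrides from the command above, written as Python values.
overrides = {
    "dataset_reader.lazy": True,
    "data_loader.batch_sampler": None,
    "data_loader.num_workers": 1,
}

merged = apply_overrides(base, overrides)
print(merged)
```

Under this reading, the command switches on lazy loading, removes the batch sampler, and asks for one worker process, which is the combination reported to trigger the error.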