wandb breaks tests - importlib.util.find_spec-related under forked process #9623
@sgugger, I think the culprit for the 2nd error, when I uninstalled wandb is:
as it returns
You can see it with any DDP test, so you don't need to install deepspeed or fairscale to see it. For example, this fails too:
But a single unforked process test works just fine:
and then there is another problem which occurs with

But with
I'm not sure I understand your first error. Could you give us more details? Are you saying that

For the last error, pinging @borisdayma.
I had a similar issue recently with Python 3.8, but it worked with 3.7. It was due to a function from `importlib` that changed its name. Is it the same?
@borisdayma, I have just installed Python 3.7.9 and have the same issue there. Perhaps you had it working with Python < 3.7.9?

@sgugger yes, the problem occurs only when there is DDP. If I drop

To reproduce:
which results in:
If you then remove wandb:
The 2nd error happens:
The full traces are in the OP. Please let me know if you need any other info.
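For reference, one documented way `importlib.util.find_spec` can fail outright instead of returning `None`: if the name is already in `sys.modules` but that module's `__spec__` is `None` (or unset), it raises `ValueError`. The sketch below reproduces that with a hand-made module object and wraps the call in a defensive check. `spec_is_findable` and `hypothetical_fake_module` are illustrative names, not part of transformers, and this is not claimed to be the exact failure reported in this issue.

```python
import importlib.util
import sys
import types


def spec_is_findable(name: str) -> bool:
    """Hypothetical helper: report whether `name` can be located, treating
    the ValueError that find_spec raises for a module whose __spec__ is None
    as "not available" instead of letting it propagate."""
    try:
        return importlib.util.find_spec(name) is not None
    except ValueError:
        # Raised when sys.modules[name] exists but its __spec__ is None
        # or not set, as documented for importlib.util.find_spec.
        return False


if __name__ == "__main__":
    # A module object created by hand has __spec__ = None.
    sys.modules["hypothetical_fake_module"] = types.ModuleType("hypothetical_fake_module")
    try:
        importlib.util.find_spec("hypothetical_fake_module")
    except ValueError as err:
        print("find_spec raised:", err)
    print(spec_is_findable("hypothetical_fake_module"))  # False
    print(spec_is_findable("json"))  # True
```

Swallowing the `ValueError` is only one possible design choice; a stricter helper might re-raise it so that a half-initialized module is reported loudly rather than silently treated as missing.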
I am running into the same issue with DDP as @stas00 in #9623 (comment). See transformers/src/transformers/integrations.py, line 586 in 897a24c.
Interesting, can you check whether it solves the issue on your side, @tristandeleu?
It does work for me when I replace it with `if state.is_world_process_zero: self._wandb.log({})`. There is also another thing I ran into at the same time:

EDIT: This solves the issue with DDP, though; I don't know if it also solves the original issue in #9623 (comment).
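The fix discussed here is a rank-zero guard: only the main process calls `self._wandb.log({})`, and the other DDP workers skip it. A minimal, self-contained sketch of that pattern, using hypothetical stand-ins (`HypotheticalState`, `RecordingLogger`, `on_train_end`) rather than the real transformers `TrainerState` or the wandb run object:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class HypotheticalState:
    """Stand-in for the trainer state; only the flag the guard needs."""
    is_world_process_zero: bool


@dataclass
class RecordingLogger:
    """Stand-in for the wandb run object: records each log() call."""
    calls: List[Dict[str, Any]] = field(default_factory=list)

    def log(self, data: Dict[str, Any]) -> None:
        self.calls.append(data)


def on_train_end(state: HypotheticalState, logger: RecordingLogger) -> None:
    # Only the rank-0 process talks to the logger; every other rank
    # skips the call, as in the guard suggested in this thread.
    if state.is_world_process_zero:
        logger.log({})


if __name__ == "__main__":
    world_size = 4
    logger = RecordingLogger()
    # Simulate every rank reaching the end of training.
    for rank in range(world_size):
        on_train_end(HypotheticalState(is_world_process_zero=(rank == 0)), logger)
    print(len(logger.calls))  # 1
```

In a real DDP run each rank is a separate process with its own objects; the loop shares one logger only so the number of log calls across ranks can be counted.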
Don't hesitate to suggest a PR with your fix, @tristandeleu.
I had the same problem, and I just used `if state.is_world_process_zero: self._wandb.log({})`, skipping the `self._log_model = False` part. Thanks!
Even with these code changes, the program (running on TPU) doesn't seem to stop at the end.
This PR solves part of #9623. It tries to actually do what #9699 requested/discussed, namely that any value of `WANDB_DISABLED` should disable wandb. The current behavior is that it has to be one of `ENV_VARS_TRUE_VALUES = {"1", "ON", "YES"}`. I have been using `WANDB_DISABLED=true` everywhere in scripts, as it was originally advertised. I have no idea why this was changed to a subset of possible values, and it's not documented anywhere. @sgugger
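The PR contrasts two readings of `WANDB_DISABLED`. A minimal sketch of both, assuming the old check upper-cases the value before testing membership in `ENV_VARS_TRUE_VALUES` (the set quoted in the PR); `disabled_strict` and `disabled_any_value` are hypothetical names, not transformers functions:

```python
import os
from typing import Mapping, Optional

# The accepted "true" strings quoted in the PR description.
ENV_VARS_TRUE_VALUES = {"1", "ON", "YES"}


def disabled_strict(env: Optional[Mapping[str, str]] = None) -> bool:
    """Current behaviour as described in the PR (assumption: the value is
    upper-cased first): only 1/ON/YES disable wandb."""
    env = os.environ if env is None else env
    return env.get("WANDB_DISABLED", "").upper() in ENV_VARS_TRUE_VALUES


def disabled_any_value(env: Optional[Mapping[str, str]] = None) -> bool:
    """Behaviour the PR proposes: any non-empty value disables wandb."""
    env = os.environ if env is None else env
    return bool(env.get("WANDB_DISABLED", ""))


if __name__ == "__main__":
    for value in ["true", "1", "yes", ""]:
        env = {"WANDB_DISABLED": value}
        print(repr(value), disabled_strict(env), disabled_any_value(env))
```

With `WANDB_DISABLED=true`, the strict check returns `False` (wandb stays enabled) while the any-value check returns `True`, which is the mismatch the PR describes. The trade-off of the any-value rule is that even `WANDB_DISABLED=false` would disable wandb.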
* [t5 doc] typos: a few runaway backticks @sgugger
* style
* [trainer] put fp16 args together: this PR proposes a purely cosmetic change that puts all the fp16 args together, so they are easier to manage/read @sgugger
* style
* [wandb] make WANDB_DISABLED disable wandb with any value (same description as the PR above)
* WANDB_DISABLED=true to disable; make tf trainer consistent
* style
@lkk12014402 can you confirm it still happens with the latest HF master branch?
This issue has been automatically marked as stale and been closed because it has not had recent activity. Thank you for your contributions. If you think this still needs to be addressed please comment on this thread.
This has to do with a forked process environment:
I was running:
and was getting:
I tried to remove `wandb`, and while `pip uninstall wandb` worked, wandb left code behind and I had to remove it manually:

But the problem continued without having any wandb installed:
The strange `stderr` prefix is from our multiprocess testing setup, which requires special handling as pytest can't handle DDP and the like on its own.

The only way I was able to overcome this is with:
I'm on `transformers` master.
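Above, `pip uninstall wandb` left code behind. A `find_spec`-based availability check only asks whether the import system can *locate* a name; it does not run the module, so leftover files can make a package look installed. A minimal sketch contrasting that with actually importing, using a hypothetical package name (`hypothetical_leftover_pkg`) and made-up helpers; it illustrates the general hazard, not the exact wandb/transformers code path:

```python
import importlib
import importlib.util
import os
import sys
import tempfile

PKG = "hypothetical_leftover_pkg"  # illustrative name, not a real package


def found_by_find_spec(name: str) -> bool:
    """Availability as a find_spec-based check sees it (no code is run)."""
    return importlib.util.find_spec(name) is not None


def actually_importable(name: str) -> bool:
    """Stricter check: execute the import and report whether it succeeds."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


def make_leftover(root: str) -> None:
    """Simulate an uninstall that left a package directory behind whose
    __init__.py refers to a submodule that no longer exists."""
    pkg_dir = os.path.join(root, PKG)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as fh:
        fh.write("from . import removed_submodule\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        make_leftover(root)
        sys.path.insert(0, root)
        importlib.invalidate_caches()  # let the path finders see the new dir
        try:
            print(found_by_find_spec(PKG))   # True: the files are still on disk
            print(actually_importable(PKG))  # False: running __init__ fails
        finally:
            sys.path.remove(root)
```

Actually importing is more reliable but has side effects (it executes the package's code), which is presumably why availability helpers often stop at `find_spec`.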