module 'torch.distributed' has no attribute 'ProcessGroup' #1056
-
Hi @aprogrotess, thanks for posting this in the MONAI Label repo. Looking at the logs, there is a `torch.distributed` package issue within ignite, which may be a version problem. Could you provide details about your environment (e.g., OS and the versions of monai, monailabel, torch, ignite, etc.)? Let's see if anyone has seen this environment issue before. Thanks!
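For reference, the environment details requested above can be gathered with a short script. This is a minimal sketch using only the Python standard library; the helper name `collect_env` and the package list are illustrative assumptions (`pytorch-ignite` is the PyPI distribution name for ignite). MONAI also provides `monai.config.print_config()` for a fuller report.

```python
# Hypothetical helper: collect the package versions requested in the reply above.
# Uses only the standard library, so it still runs if some packages are missing.
import platform
import sys
from importlib import metadata

# PyPI distribution names (assumed list; adjust as needed).
PACKAGES = ["monai", "monailabel", "torch", "pytorch-ignite"]


def collect_env(packages=PACKAGES):
    """Return a dict mapping each item to a version string or 'not installed'."""
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),  # e.g. 'aarch64' on Jetson boards
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```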
-
Thanks for the advice. From the error stack, I first thought it was a problem with the DeepEdit algorithm, but now it seems to be a problem with the `torch.distributed` module. After installing the latest PyTorch from conda (CPU build), `torch.distributed.is_available()` returns `True`. However, I need CUDA, so I tried the PyTorch for Jetson build (1.12.0) from https://forums.developer.nvidia.com/t/pytorch-for-jetson/72048/1184, and with it `torch.distributed.is_available()` returns `False`. In my view, the problem may be related to the build options; I have reported it to the community. :)
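A quick way to compare the conda CPU build with the Jetson 1.12.0 wheel is to query the installed PyTorch directly. The sketch below relies only on the public `torch.cuda.is_available()` and `torch.distributed.is_available()` calls; the `has_ProcessGroup` check reflects the observation in this thread that `ProcessGroup` is missing when distributed support is unavailable. The function name `check_torch_distributed` is hypothetical.

```python
# Sketch: report whether the installed PyTorch build includes distributed support.
# On a build where torch.distributed.is_available() is False, names such as
# ProcessGroup are not defined, which is what breaks the ignite import here.
def check_torch_distributed():
    try:
        import torch
    except ImportError:
        return {"torch": None}  # PyTorch not installed in this environment
    import torch.distributed as dist

    return {
        "torch": torch.__version__,
        "cuda_available": torch.cuda.is_available(),
        "distributed_available": dist.is_available(),
        "has_ProcessGroup": hasattr(dist, "ProcessGroup"),
    }


print(check_torch_distributed())
```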
-
Hello, I am trying to develop my label app on a Jetson AGX Orin, but after configuring the environment I get the error: `module 'torch.distributed' has no attribute 'ProcessGroup'`.
The error stack is listed below. Can anyone help?
```
[2022-10-09 10:22:16,941] [2553511] [MainThread] [ERROR] (uvicorn.error:119) - Traceback (most recent call last):
  File "/home/tacom/envs/archiconda3/envs/development/lib/python3.8/site-packages/starlette/routing.py", line 635, in lifespan
    async with self.lifespan_context(app):
  File "/home/tacom/envs/archiconda3/envs/development/lib/python3.8/site-packages/starlette/routing.py", line 530, in __aenter__
    await self._router.startup()
  File "/home/tacom/envs/archiconda3/envs/development/lib/python3.8/site-packages/starlette/routing.py", line 612, in startup
    await handler()
  File "/home/tacom/mount_980/code/3D/MONAILabel/monailabel/app.py", line 104, in startup_event
    instance = app_instance()
  File "/home/tacom/mount_980/code/3D/MONAILabel/monailabel/interfaces/utils/app.py", line 51, in app_instance
    app = c(app_dir=app_dir, studies=studies, conf=conf)
  File "/home/tacom/mount_980/code/3D/MONAILabel/sample-apps/radiology/main.py", line 44, in __init__
    for c in get_class_names(lib.configs, "TaskConfig"):
  File "/home/tacom/mount_980/code/3D/MONAILabel/monailabel/utils/others/class_utils.py", line 144, in get_class_names
    module = importlib.import_module("." + name, package=current_module_name)
  File "/home/tacom/envs/archiconda3/envs/development/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 843, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/tacom/mount_980/code/3D/MONAILabel/sample-apps/radiology/lib/configs/deepedit.py", line 17, in <module>
    import lib.trainers
  File "/home/tacom/mount_980/code/3D/MONAILabel/sample-apps/radiology/lib/trainers/__init__.py", line 12, in <module>
    from .deepedit import DeepEdit
  File "/home/tacom/mount_980/code/3D/MONAILabel/sample-apps/radiology/lib/trainers/deepedit.py", line 45, in <module>
    from monailabel.tasks.train.basic_train import BasicTrainTask, Context
  File "/home/tacom/mount_980/code/3D/MONAILabel/monailabel/tasks/train/basic_train.py", line 23, in <module>
    import ignite
  File "/home/tacom/envs/archiconda3/envs/development/lib/python3.8/site-packages/ignite/__init__.py", line 3, in <module>
    import ignite.engine
  File "/home/tacom/envs/archiconda3/envs/development/lib/python3.8/site-packages/ignite/engine/__init__.py", line 10, in <module>
    from ignite.metrics import Metric
  File "/home/tacom/envs/archiconda3/envs/development/lib/python3.8/site-packages/ignite/metrics/__init__.py", line 7, in <module>
    from ignite.metrics.frequency import Frequency
  File "/home/tacom/envs/archiconda3/envs/development/lib/python3.8/site-packages/ignite/metrics/frequency.py", line 7, in <module>
    from ignite.handlers.timing import Timer
  File "/home/tacom/envs/archiconda3/envs/development/lib/python3.8/site-packages/ignite/handlers/__init__.py", line 5, in <module>
    from ignite.handlers.checkpoint import Checkpoint, DiskSaver, ModelCheckpoint
  File "/home/tacom/envs/archiconda3/envs/development/lib/python3.8/site-packages/ignite/handlers/checkpoint.py", line 17, in <module>
    from torch.distributed.optim import ZeroRedundancyOptimizer
  File "/home/tacom/envs/archiconda3/envs/development/lib/python3.8/site-packages/torch/distributed/optim/__init__.py", line 27, in <module>
    from .post_localSGD_optimizer import PostLocalSGDOptimizer
  File "/home/tacom/envs/archiconda3/envs/development/lib/python3.8/site-packages/torch/distributed/optim/post_localSGD_optimizer.py", line 2, in <module>
    import torch.distributed.algorithms.model_averaging.averagers as averagers
  File "/home/tacom/envs/archiconda3/envs/development/lib/python3.8/site-packages/torch/distributed/algorithms/model_averaging/averagers.py", line 5, in <module>
    import torch.distributed.algorithms.model_averaging.utils as utils
  File "/home/tacom/envs/archiconda3/envs/development/lib/python3.8/site-packages/torch/distributed/algorithms/model_averaging/utils.py", line 10, in <module>
    params: Iterator[torch.nn.Parameter], process_group: dist.ProcessGroup
```
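The last frame suggests why the error appears at import time rather than during training: `utils.py` uses `dist.ProcessGroup` in a function signature, and Python evaluates such annotations when the `def` statement runs. The stand-in below needs no PyTorch; `fake_dist`, `define_helper`, and `average_params_stub` are hypothetical names used only to reproduce the mechanism, not PyTorch code.

```python
# Minimal reproduction (no torch needed) of the failure in the last frame:
# parameter annotations are evaluated when `def` executes, so a missing module
# attribute makes the enclosing import fail before anything is called.
import types

# Hypothetical stand-in for torch.distributed on a build without distributed support.
fake_dist = types.ModuleType("fake_dist")


def define_helper(dist):
    # Same shape as the signature in model_averaging/utils.py (hypothetical name).
    def average_params_stub(params, process_group: dist.ProcessGroup = None):
        return params

    return average_params_stub


try:
    define_helper(fake_dist)
except AttributeError as exc:
    print(exc)  # module 'fake_dist' has no attribute 'ProcessGroup'

# Once the attribute exists, the same definition succeeds.
fake_dist.ProcessGroup = type("ProcessGroup", (), {})
helper = define_helper(fake_dist)
```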