Update distributed.md
zhouzaida authored Feb 3, 2023
1 parent 1b85228 commit cdd1a36
Showing 1 changed file with 9 additions and 8 deletions.
17 changes: 9 additions & 8 deletions docs/en/advanced_tutorials/distributed.md
````diff
@@ -23,15 +23,16 @@ We will detail on these APIs in the following chapters.
 
 - [init_dist](mmengine.dist.init_dist): Launch function of distributed training. Currently it supports three launchers: pytorch, slurm, and MPI. It also sets up the given communication backend, which defaults to NCCL.
 
-If you need to change the runtime timeout (default=30 minutes) for distributed operations that take very long, you can specify a different timeout in your runtime configuration like this:
+If you need to change the runtime timeout (default=30 minutes) for long-running distributed operations, you can specify a different timeout in the `env_cfg` configuration passed to [Runner](mmengine.runner.Runner) like this:
 
-```python
-env_cfg = dict(
-    cudnn_benchmark=True,
-    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
-    dist_cfg=dict(backend='nccl', timeout=10800),  # Sets the timeout to 3h (10800 seconds)
-)
-```
+```python
+env_cfg = dict(
+    cudnn_benchmark=True,
+    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
+    dist_cfg=dict(backend='nccl', timeout=10800),  # Sets the timeout to 3h (10800 seconds)
+)
+runner = Runner(xxx, env_cfg=env_cfg)
+```
````
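For reference, the `timeout` value in `dist_cfg` above is a plain number of seconds. A minimal standard-library sketch (reusing the same keys as the `env_cfg` example above) that derives it from a human-readable duration instead of hand-computed arithmetic:

```python
from datetime import timedelta

# dist_cfg expects the timeout in seconds; building it from a timedelta
# avoids arithmetic mistakes for long durations (3 h -> 10800 s).
timeout_s = int(timedelta(hours=3).total_seconds())

env_cfg = dict(
    cudnn_benchmark=True,
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
    dist_cfg=dict(backend='nccl', timeout=timeout_s),
)
```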

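The three launchers named in the `init_dist` bullet above map to its `launcher` argument. A minimal sketch of how a training script might call it (the script body is hypothetical, and the call only works under an actual distributed launch, e.g. via `torchrun`, so it cannot run standalone):

```python
from mmengine.dist import init_dist

def main():
    # 'pytorch' matches scripts started with torchrun or
    # torch.distributed.launch; 'slurm' and 'mpi' are the other
    # supported launcher values. The backend defaults to NCCL.
    init_dist(launcher='pytorch', backend='nccl')
    # ... build and run the Runner here (hypothetical) ...

if __name__ == '__main__':
    main()
```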
## Query and control

