[Bugfix] Using None for mpu when PP > 1 #34

Merged: 2 commits into awslabs:main on Feb 1, 2023

Conversation

@zarzen (Contributor) commented on Feb 1, 2023

Description

  • Check the pipeline parallel size before creating the mpu grid, to avoid an initialization error.
  • Avoid creating duplicate communication groups when pipeline parallelism is used (see the sketch below).
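A minimal sketch of the idea, assuming the DeepSpeed classes referenced in the diff further down (PipeModelDataParallelTopology, PipelineParallelGrid) and that torch.distributed is already initialized; the helper name resolve_mpu is hypothetical and not part of this PR:

    from deepspeed.runtime.pipe.topology import (
        PipeModelDataParallelTopology,
        PipelineParallelGrid,
    )

    def resolve_mpu(topology):
        # Only build an explicit grid when the caller actually passed a topology.
        if not isinstance(topology, PipeModelDataParallelTopology):
            return None
        if topology.get_dim("pipe") <= 1:
            # No pipeline parallelism: the grid can be handed to
            # deepspeed.initialize(..., mpu=grid).
            return PipelineParallelGrid(topology=topology)
        # PP > 1: return None so the pipeline engine builds its own
        # communication groups instead of duplicating them.
        return None

Usage would then look like deepspeed.initialize(model=model, mpu=resolve_mpu(kwargs.get("topology", None)), ...), i.e. mpu ends up as None whenever PP > 1.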

Checklist

  • PR's title starts with a category (e.g. [Bugfix], [Model], [Tutorial], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

@comaniac (Contributor) left a comment:
LGTM. cc @szhengac

    @@ -22,7 +22,11 @@ def init_ds_engine(model, **kwargs):
            raise ValueError("DeepSpeed config not provided.")
        mpu = kwargs.get("topology", None)
        if mpu is not None and isinstance(mpu, PipeModelDataParallelTopology):
            mpu = PipelineParallelGrid(topology=mpu)
            if mpu.get_dim("pipe") <= 1:
Contributor (review comment on the line above):

it could be 0?

Contributor Author (@zarzen):
Not really, just for separating the conditions for pipeline and no pipeline in a binary form.

@szhengac szhengac merged commit dc795d6 into awslabs:main Feb 1, 2023