[Disco] Add loader for presharded params. #15957

Lunderberg · 2023-10-20T16:10:39Z

Prior to this commit, sharding of model weights was always performed when initializing the model. This could cause slow initialization, especially for larger numbers of GPUs, as all model weights are initially transferred to GPU-0, before being scattered to all workers.

This commit updates the tvm::runtime::ShardLoaderObj to also allow loading of pre-sharded model weights. With pre-sharded model weights, the tensors are sharded while the model is being built, and each worker independently loads the specific model weights that it requires.
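To make the difference between the two loading paths concrete, here is a minimal pure-Python sketch. It is not TVM's API: `shard_rows`, `load_then_scatter`, `preshard`, `load_presharded`, and the `"{name}_shard-{i}"` key format are hypothetical names chosen for illustration, and tensors are modelled as plain lists of rows split evenly along the first axis.

```python
# Illustrative sketch only. All function names and the key format below are
# hypothetical and do not reflect the actual ShardLoaderObj interface.

def shard_rows(weight, num_workers):
    """Split a 2-D weight (a list of rows) into equal contiguous row blocks."""
    rows_per_worker, remainder = divmod(len(weight), num_workers)
    if remainder:
        raise ValueError("row count must divide evenly among workers")
    return [
        weight[i * rows_per_worker:(i + 1) * rows_per_worker]
        for i in range(num_workers)
    ]

def load_then_scatter(storage, name, num_workers):
    """Shard at initialization: the full tensor is read in one place
    (worker 0 in the PR description) and then split for every worker."""
    full = storage[name]  # the whole tensor passes through a single worker
    return shard_rows(full, num_workers)

def preshard(storage, num_workers):
    """Build-time step: store one entry per (parameter, worker) pair."""
    presharded = {}
    for name, weight in storage.items():
        for worker_id, shard in enumerate(shard_rows(weight, num_workers)):
            presharded[f"{name}_shard-{worker_id}"] = shard
    return presharded

def load_presharded(presharded, name, worker_id):
    """Each worker reads only its own shard; no scatter step is needed."""
    return presharded[f"{name}_shard-{worker_id}"]

weights = {"w": [[0, 1], [2, 3], [4, 5], [6, 7]]}
scattered = load_then_scatter(weights, "w", num_workers=2)
presharded = preshard(weights, num_workers=2)
per_worker = [load_presharded(presharded, "w", i) for i in range(2)]
```

Both paths give each worker the same shard. In the pre-sharded path the split happens once, ahead of time, so no single worker has to hold the full tensor at initialization.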

Lunderberg · 2023-10-20T16:11:11Z

This PR was developed in collaboration with @csullivan, and is based on #15676.

Lunderberg · 2023-11-03T17:34:56Z

Rebased onto main to re-run CI, as 2-week-old CI results are a bit stale for my preferences.

@junrushao Could I get a review on this PR?

masahi

I confirmed that it works. @junrushao Any concerns in merging this?

src/runtime/disco/loader.cc

src/runtime/disco/nccl/nccl.cc

src/runtime/disco/worker.cc

This was referenced Oct 20, 2023

[MultiGPU] Support pre-sharded model weights mlc-ai/mlc-llm#1096

Merged

[Disco] Add loader for presharded params. #15676

Closed

Lunderberg force-pushed the disco_load_presharded_params branch from e68fb67 to 9f019ca Compare November 3, 2023 17:34

Lunderberg force-pushed the disco_load_presharded_params branch from 9f019ca to 8a62451 Compare November 6, 2023 20:00

masahi approved these changes Nov 7, 2023

junrushao reviewed Nov 8, 2023

src/runtime/disco/loader.cc (outdated, resolved)

junrushao reviewed Nov 8, 2023

src/runtime/disco/nccl/nccl.cc (outdated, resolved)

Update based on review comments.

7fdb447

junrushao reviewed Nov 8, 2023

src/runtime/disco/worker.cc (outdated, resolved)

Removed commented-out print statements

af8ae50

Lunderberg merged commit e359e7a into apache:unity Nov 9, 2023

Lunderberg deleted the disco_load_presharded_params branch November 9, 2023 14:05

Lunderberg mentioned this pull request Nov 9, 2023

[Contrib] Save/load ShapeTuple in tvm.contrib.tvmjs #15700

Closed
