
[core][distributed] support layer size undividable by pp size in pipeline parallel inference #6115

Merged

merged 6 commits into vllm-project:main from pp_odd_size on Jul 3, 2024

Conversation

youkaichao (Member)

fixes #6114

andoorve (Collaborator) left a comment

There's a subtle point here:

num_blocks = self._run_workers("determine_num_available_blocks")
# Since we use a shared centralized controller, we take the minimum
# number of blocks across all workers to make sure all the memory
# operators can be applied to all workers.
num_gpu_blocks = min(b[0] for b in num_blocks)
num_cpu_blocks = min(b[1] for b in num_blocks)
return num_gpu_blocks, num_cpu_blocks

We take the min of blocks across all workers; as a result, the GPU memory utilization of ranks 0 through n-2 will be slightly lower than expected.

I don't think it's a big deal, just something to be aware of.
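To make the point concrete, here is a minimal sketch, not the PR's actual code, of how a layer count that is not divisible by the pipeline-parallel size might be split across ranks, and how the min-reduction above then caps every rank at the smallest reported block count. The partition_layers helper and all block counts below are hypothetical:

def partition_layers(num_layers: int, pp_size: int) -> list[int]:
    # Hypothetical partitioning: each rank gets floor(num_layers / pp_size)
    # layers, and the remainder is handed out one extra layer per rank,
    # starting from rank 0.
    base, rem = divmod(num_layers, pp_size)
    return [base + (1 if rank < rem else 0) for rank in range(pp_size)]

# 26 layers on 4 pipeline stages -> [7, 7, 6, 6]
print(partition_layers(26, 4))

# Made-up per-worker (gpu_blocks, cpu_blocks) results: the ranks holding an
# extra layer have less free GPU memory, so they report fewer GPU blocks.
num_blocks = [(980, 512), (980, 512), (1120, 512), (1120, 512)]

# The shared scheduler takes the minimum, exactly as in the snippet above,
# so every rank is capped at 980 GPU blocks even though two ranks could
# hold 1120.
num_gpu_blocks = min(b[0] for b in num_blocks)  # 980
num_cpu_blocks = min(b[1] for b in num_blocks)  # 512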

youkaichao (Member, Author)

> We take the min of blocks across all workers; as a result, the GPU memory utilization of ranks 0 through n-2 will be slightly lower than expected.

Good point; I noticed this as well. There is some GPU memory waste, but that is better than the configuration not being supported at all.
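For a rough sense of scale, with the hypothetical numbers from the sketch above: capping every rank at min(980, 1120) = 980 GPU blocks leaves the lighter ranks with 1 - 980/1120 ≈ 12.5% of their KV-cache block capacity idle.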

andoorve (Collaborator) left a comment

LGTM

youkaichao merged commit 3de6e6a into vllm-project:main on Jul 3, 2024
58 of 65 checks passed
youkaichao deleted the pp_odd_size branch on July 3, 2024 at 23:41
xjpang pushed a commit to xjpang/vllm that referenced this pull request Jul 8, 2024
xjpang pushed a commit to xjpang/vllm that referenced this pull request Jul 24, 2024
Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024
Development

Successfully merging this pull request may close these issues.

[Feature]: support layer size undividable by pp size in pipeline parallel inference
2 participants