[Core] Add multiproc_worker_utils for multiprocessing-based workers #4357

Merged on May 1, 2024 (8 commits)

Conversation

njhill
Member

@njhill njhill commented Apr 25, 2024

This PR adds the mechanics for running multiprocessing-based worker processes and dispatching tasks to them via multiprocessing queues.

The class ProcessWorkerWrapper is equivalent to RayWorkerWrapper.

Each ProcessWorkerWrapper has a task queue for passing tasks to the corresponding worker process.

There is a shared result queue for receiving the results from all worker processes. ResultHandler is a thread that processes this result queue, completing the corresponding waiting futures as results come in.
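A minimal sketch of the result-handling idea described above, not the code from this PR: a thread drains one shared result queue and completes the future registered for each task. The `Result` message shape, the `new_task()` helper, and the `"TERMINATE"` sentinel are assumptions made for illustration. A plain `queue.Queue` stands in for the queue here; with real worker processes it would be a `multiprocessing.Queue`.

```python
import queue
import threading
import uuid
from concurrent.futures import Future
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple

_TERMINATE = "TERMINATE"  # hypothetical sentinel that stops the handler thread


@dataclass
class Result:
    """Hypothetical message a worker puts on the shared result queue."""
    task_id: uuid.UUID
    value: Any = None
    exception: Optional[BaseException] = None


class ResultHandler(threading.Thread):
    """Drains the shared result queue and completes the matching futures."""

    def __init__(self, result_queue=None) -> None:
        super().__init__(daemon=True)
        # Assumption: any object with blocking get()/put() works here.
        self.result_queue = result_queue if result_queue is not None else queue.Queue()
        self.tasks: Dict[uuid.UUID, Future] = {}

    def new_task(self) -> Tuple[uuid.UUID, Future]:
        """Register a future that will be completed when its result arrives."""
        task_id = uuid.uuid4()
        future: Future = Future()
        self.tasks[task_id] = future
        return task_id, future

    def run(self) -> None:
        # iter(get, sentinel) keeps calling get() until the sentinel shows up.
        for result in iter(self.result_queue.get, _TERMINATE):
            future = self.tasks.pop(result.task_id)
            if result.exception is not None:
                future.set_exception(result.exception)
            else:
                future.set_result(result.value)

    def close(self) -> None:
        self.result_queue.put(_TERMINATE)
```

The caller keeps only the future; the task id travels with the task to the worker and comes back on the result, which is what lets a single queue serve all workers.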

WorkerMonitor is a thread that uses the multiprocessing wait() function to monitor the state of all worker processes and tear things down cleanly if any of them exits unexpectedly. Teardown includes completing all waiting futures with an error and terminating the remaining processes.
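The monitoring behaviour can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: it assumes the wait in question is `multiprocessing.connection.wait()` applied to each `Process.sentinel`, and the constructor arguments (`sentinels`, `pending`, `terminate_all`) and the `close()` flag are hypothetical names.

```python
import threading
from concurrent.futures import Future
from multiprocessing.connection import wait
from typing import Callable, Dict, Hashable, Sequence


class WorkerMonitor(threading.Thread):
    """Wakes up when any watched process exits and tears everything down."""

    def __init__(
        self,
        sentinels: Sequence,                # e.g. [p.sentinel for p in processes]
        pending: Dict[Hashable, Future],    # futures still waiting for a result
        terminate_all: Callable[[], None],  # hypothetical hook: stop remaining workers
    ) -> None:
        super().__init__(daemon=True)
        self.sentinels = list(sentinels)
        self.pending = pending
        self.terminate_all = terminate_all
        self.failed = False
        self._closing = False

    def close(self) -> None:
        # Call before an intentional shutdown so exits are not treated as failures.
        self._closing = True

    def run(self) -> None:
        # Blocks until at least one sentinel becomes ready (a process has exited).
        wait(self.sentinels)
        if self._closing:
            return
        self.failed = True
        error = ChildProcessError("worker process died unexpectedly")
        # Fail every caller that is still waiting, so nobody blocks forever.
        for future in list(self.pending.values()):
            if not future.done():
                future.set_exception(error)
        self.pending.clear()
        self.terminate_all()
```

With real workers, `sentinels` would be built from `multiprocessing.Process` objects and `terminate_all` would call `terminate()` on each process that is still alive.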

_run_worker_process is the function that runs in each worker process, looping while it waits on that process's task queue.

This is in preparation for a MultiprocessingGPUExecutor.

@njhill
Member Author

njhill commented Apr 25, 2024

@zhuohan123 this is the file we already talked through, with some minor updates:

  • Renamed VllmLocalWorker to ProcessWorkerWrapper (in line with the recent renaming of VllmRayWorker to RayWorkerWrapper)
  • Changed this class so that it no longer inherits from multiprocessing.Process and instead holds the Process as a field (I agree this makes it clearer)
  • Fixed some mypy typing issues

@njhill njhill requested a review from zhuohan123 April 25, 2024 07:06
@vrdn-23 vrdn-23 mentioned this pull request Apr 30, 2024
Member

@zhuohan123 zhuohan123 left a comment

LGTM! Left a small comment.

@njhill njhill enabled auto-merge (squash) May 1, 2024 18:40
@njhill njhill merged commit a657bfc into vllm-project:main May 1, 2024
48 checks passed
@njhill njhill deleted the multiproc_utils branch May 5, 2024 15:08
z103cb pushed a commit to z103cb/opendatahub_vllm that referenced this pull request May 7, 2024
dtrifiro pushed a commit to opendatahub-io/vllm that referenced this pull request May 7, 2024
@cermeng
Contributor

cermeng commented Jul 29, 2024

@njhill I know the PR is already merged, but I left some comments which could be related to issue #6219. I would appreciate it if you could address these comments so that I can work further on this issue.
