
[core] Ensure loading mp first #252

Merged: 4 commits merged into main from ensure-loading-mp-first on Jan 29, 2025
Conversation

sayakpaul
Collaborator

Referencing: https://huggingface.co/docs/accelerate/main/en/concept_guides/deferring_execution

This PR ensures:

We download files on the main process first, and then the other processes load the cached files afterward.
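The pattern can be sketched roughly as follows. This is not the PR's actual code; `is_main_process`, `barrier`, `download`, and `load_cached` are hypothetical stand-ins for the real distributed primitives and loading functions (in practice, accelerate's `main_process_first()` context manager or a `torch.distributed` barrier would play this role):

```python
# Hedged sketch of the "main process downloads first" pattern.
# All names are illustrative stand-ins, not the PR's actual API.

def load_with_main_first(is_main_process, barrier, download, load_cached):
    """Download on the main process only, then let every rank read the cache."""
    if is_main_process:
        download()        # only the main process hits the network / fills the cache
    barrier()             # all ranks wait until the download has finished
    return load_cached()  # every rank now loads from the warm local cache
```

Without the barrier, non-main ranks could race ahead and try to read a cache that is still being written, which is exactly the class of bug the deferring-execution guide warns about.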

I have run the tests, too. Absolutely okay if you don't want this.

@sayakpaul sayakpaul requested a review from a-r-r-o-w January 29, 2025 09:45
Owner

@a-r-r-o-w left a comment

Thanks Sayak, makes sense. Just FYI, I'm in the process of removing accelerate in #245 so that we can use our own distributed implementation directly in pure PyTorch. With accelerate, it is becoming very hard for me to dive into the code and make the changes that would allow PP/TP/CP to work, leading to several NCCL hangs. There is a lot of code that accelerate uses to glue together DeepSpeed, FSDP, DDP, etc., so you can imagine how much of a problem debugging is (not saying that's a bad thing, just that we're limited until they provide some APIs for PP/TP/CP). Rather, having our own backend, built with all parallelism in mind and following the fastest inference backends like xDiT and ParaAttention, is easier to understand and more easily tailorable to our use case. This is absolutely necessary if we want to be doing multi-node training for distillation, etc. in the near future imo.

We can do a stable release before the major codebase change comes in. If you've tested this well for the common cases, this is good to merge for the stable release.

@a-r-r-o-w
Owner

On second thought, maybe I could write a dispatch layer that allows using the accelerate backend as well as the one I'm working on. It hopefully shouldn't be too hard, and this way we will maintain backwards compatibility with whatever existing support we have.
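A dispatch layer like that could look roughly like the sketch below. Every class and function name here is an assumption made for illustration, not the actual finetrainers API:

```python
# Hypothetical sketch of a backend dispatch layer; all names are illustrative.

class DistributedBackend:
    """Minimal interface both backends would implement."""
    def prepare_model(self, model):
        raise NotImplementedError

class AccelerateBackend(DistributedBackend):
    def prepare_model(self, model):
        # would delegate to accelerate's Accelerator.prepare(...) in practice
        return ("accelerate", model)

class PurePytorchBackend(DistributedBackend):
    def prepare_model(self, model):
        # would apply torch.distributed parallelism directly in practice
        return ("ptd", model)

_BACKENDS = {"accelerate": AccelerateBackend, "ptd": PurePytorchBackend}

def get_backend(name: str) -> DistributedBackend:
    """Select a backend by name, so existing accelerate configs keep working."""
    try:
        return _BACKENDS[name]()
    except KeyError:
        raise ValueError(f"unknown backend: {name!r}")
```

The rest of the trainer would only ever call the `DistributedBackend` interface, so swapping accelerate for the pure-PyTorch implementation becomes a config change rather than a code change.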

@sayakpaul
Collaborator Author

> Thanks Sayak, makes sense. Just FYI, I'm in the process of removing accelerate in #245 so that we can use our own distributed implementation directly in pure PyTorch. With accelerate, it is becoming very hard for me to dive into the code and make the changes that would allow PP/TP/CP to work, leading to several NCCL hangs. There is a lot of code that accelerate uses to glue together DeepSpeed, FSDP, DDP, etc., so you can imagine how much of a problem debugging is (not saying that's a bad thing, just that we're limited until they provide some APIs for PP/TP/CP). Rather, having our own backend, built with all parallelism in mind and following the fastest inference backends like xDiT and ParaAttention, is easier to understand and more easily tailorable to our use case. This is absolutely necessary if we want to be doing multi-node training for distillation, etc. in the near future imo.

I echo that and will support you!

> We can do a stable release before the major codebase change comes in. If you've tested this well for the common cases, this is good to merge for the stable release.

Yeah, I can confirm this works for the currently supported use cases.

> On second thought, maybe I could write a dispatch layer that allows using the accelerate backend as well as the one I'm working on. It hopefully shouldn't be too hard, and this way we will maintain backwards compatibility with whatever existing support we have.

Well, we know that we need a framework for doing parallelism and other shenanigans easily. So, #245 is a no-brainer here. Maybe it's easier to have the dispatcher in a separate PR so that your PR focuses mainly on non-accelerate support? Just offering a perspective.

@a-r-r-o-w a-r-r-o-w merged commit 836ac78 into main Jan 29, 2025
1 check passed
@a-r-r-o-w a-r-r-o-w deleted the ensure-loading-mp-first branch January 29, 2025 19:18