Implement generate_vbe_metadata cpu #3715
base: main
Conversation
This pull request was exported from Phabricator. Differential Revision: D69162870
Force-pushed from 7529dfe to b2d0bcd
Differential Revision: D68055168 [fbgemm_gpu] Update torchrec to use learning_rate_tensor D69799449
Force-pushed from b2d0bcd to aac9690
Force-pushed from aac9690 to ae43025
Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/796
This diff implements `generate_vbe_metadata` for CPU, such that the function returns the same output for CPU, CUDA, and MTIA.
To support VBE on CPU with the existing fixed-batch-size CPU kernel, we need to recompute offsets, which was previously done in Python. This diff implements the offsets recomputation in C++ so that all manipulations are done in C++.
Note that reshaping offsets and grad_input to work with the existing fixed-batch-size CPU kernels is done in Autograd instead of the wrapper, to avoid multiple computations.
VBE CPU tests are in the next diff.
Reviewed By: sryap
Differential Revision: D69162870
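For context, below is a minimal, hypothetical sketch of the kind of offsets recomputation the summary describes: expanding variable-batch-size (VBE) offsets into a layout a fixed-batch-size kernel can consume, here by padding each feature up to the maximum batch size with empty bags. The function name `recompute_fixed_offsets`, the flat `std::vector` layout, and the padding strategy are illustrative assumptions only, not the actual FBGEMM implementation (which operates on ATen tensors, with the reshaping done in Autograd as noted above).

```cpp
// Hypothetical sketch only: expand VBE offsets (variable batch size per
// feature) into a fixed-batch-size layout by padding each feature to the
// max batch size with empty bags. Not the actual FBGEMM implementation.
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <vector>

// vbe_offsets has length sum(batch_sizes) + 1; features are laid out
// consecutively: feature 0's bags, then feature 1's bags, and so on.
std::vector<int64_t> recompute_fixed_offsets(
    const std::vector<int64_t>& vbe_offsets,
    const std::vector<int64_t>& batch_sizes) {
  const int64_t max_B =
      *std::max_element(batch_sizes.begin(), batch_sizes.end());
  const size_t T = batch_sizes.size();

  std::vector<int64_t> fixed_offsets;
  fixed_offsets.reserve(T * max_B + 1);
  fixed_offsets.push_back(0);

  size_t vbe_pos = 0;  // start index of the current feature in vbe_offsets
  for (size_t t = 0; t < T; ++t) {
    const int64_t B_t = batch_sizes[t];
    // Copy the real bag boundaries for this feature.
    for (int64_t b = 1; b <= B_t; ++b) {
      fixed_offsets.push_back(vbe_offsets[vbe_pos + b]);
    }
    // Pad with empty bags (repeated end offset) up to max_B.
    for (int64_t b = B_t; b < max_B; ++b) {
      fixed_offsets.push_back(vbe_offsets[vbe_pos + B_t]);
    }
    vbe_pos += B_t;
  }
  return fixed_offsets;
}

int main() {
  // Two features: feature 0 has batch size 2, feature 1 has batch size 3.
  // Bag lengths: f0 = [2, 1], f1 = [3, 0, 2] -> flattened offsets below.
  std::vector<int64_t> vbe_offsets = {0, 2, 3, 6, 6, 8};
  std::vector<int64_t> batch_sizes = {2, 3};
  for (int64_t o : recompute_fixed_offsets(vbe_offsets, batch_sizes)) {
    std::cout << o << " ";
  }
  std::cout << "\n";  // prints: 0 2 3 3 6 6 8
  return 0;
}
```

In this example, feature 0 is padded with one empty bag so that both features present the same batch size (3) to a fixed-batch-size kernel; the padded bags contribute no indices.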