Accelerate multi-qubit gates #490
…ata` to work with devices. M pennylane_lightning/core/src/simulators/lightning_kokkos/StateVectorKokkos.hpp; `applyMatrix` bugfix: use intermediate hostview to copy matrix data; same bugfix for `getDataVector`. M pennylane_lightning/core/src/simulators/lightning_kokkos/algorithms/AdjointJacobianKokkos.hpp; use copy constructor. M pennylane_lightning/core/src/simulators/lightning_kokkos/measurements/MeasurementsKokkos.hpp; use copy constructor. M pennylane_lightning/core/src/simulators/lightning_kokkos/observables/ObservablesKokkos.hpp; use copy constructor. M requirements-dev.txt; add clang-format-14.
… vector data in adjoint-diff.
…calls into two templated methods. Call specialized expval methods when possible. Remove obsolete 'Apply directly' tests.
…alueMultiQubitOpFunctor.
I'm unsure when we'll have a runner with CUDA-12 (and not sure we'll have any runner with HIP-capable devices any time soon), so could we move forward with this PR nevertheless?
Yeah, sounds good to me!
…/tests/Test_StateVectorKokkos_Param.cpp Co-authored-by: Lee James O'Riordan <[email protected]>
Hi Vincent, Nice work!
Thank you for that!
Once again, thank you for the nice job!
Thanks @vincentmr
Before submitting
Please complete the following checklist when submitting a PR:
- All new features must include a unit test. If you've fixed a bug or added code that should be tested, add a test to the `tests` directory!
- All new functions and code must be clearly commented and documented. If you do make documentation changes, make sure that the docs build and render correctly by running `make docs`.
- Ensure that the test suite passes, by running `make test`.
- Add a new entry to the `.github/CHANGELOG.md` file, summarizing the change, and including a link back to the PR.
- Ensure that code is properly formatted by running `make format`.
When all the above are checked, delete everything above the dashed line and fill in the pull request template.
Context:
This PR is a follow-up to #489. The general scheme for multi-qubit gates uses three layers of parallelism with team policies. This introduces several parameters which should be tuned for optimal performance, but which are currently left to Kokkos' heuristics. By contrast, the straightforward range-policy-based scheme of the 1- and 2-qubit kernels significantly outperforms the general scheme.
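To make the contrast concrete, here is a minimal sketch of the flat, range-style iteration that a 1-qubit kernel can use. It is plain serial C++ rather than Kokkos code: the `for` loop stands in for the body of a parallel loop over a one-dimensional range. The function name `applySingleQubitOp` and the convention that wire 0 is the most significant bit of the basis-state index are assumptions made for illustration, not taken from this PR.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;

// Hypothetical helper: apply a 2x2 matrix (row-major) to one wire of a state
// vector. Assumption: wire 0 maps to the most significant index bit.
void applySingleQubitOp(std::vector<Complex> &state, std::size_t num_qubits,
                        const std::array<Complex, 4> &matrix,
                        std::size_t wire) {
    assert(wire < num_qubits);
    assert(state.size() == (std::size_t{1} << num_qubits));

    const std::size_t rev_wire = num_qubits - wire - 1;
    const std::size_t target_bit = std::size_t{1} << rev_wire;
    const std::size_t low_mask = target_bit - 1; // bits below the target bit

    // One flat loop over 2^(n-1) independent amplitude pairs. Each iteration
    // touches a disjoint pair, so the iterations could run in parallel.
    for (std::size_t k = 0; k < state.size() / 2; ++k) {
        // Insert a 0 at the target bit position of k to get the first index.
        const std::size_t i0 = ((k & ~low_mask) << 1) | (k & low_mask);
        const std::size_t i1 = i0 | target_bit;
        const Complex v0 = state[i0];
        const Complex v1 = state[i1];
        state[i0] = matrix[0] * v0 + matrix[1] * v1;
        state[i1] = matrix[2] * v0 + matrix[3] * v1;
    }
}
```

There is nothing to tune in this form beyond the loop range itself, which is one plausible reason the simple scheme is easier to run efficiently than a nested, team-based one.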
I introduce specialized 3- to 5-qubit kernels. I draw the following conclusion:
The following figures show timings to apply a `QubitUnitary` for OPENMP, CUDA and HIP respectively.
Description of the Change:
Introduce specialized 3- to 5-qubit unitary gate kernels. Refactor the `applyMultiQubitOp` wrapper in `StateVectorKokkos.hpp`. Functors are no longer templated on `inverse`: the conjugate transpose is taken once upon entering `applyMultiQubitOp` instead of on the fly for each element of the for loop. Add a few tests.
Benefits:
Faster `QubitUnitary`, especially for 3+-qubit observables on GPU devices.
Possible Drawbacks:
None
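The `inverse` handling described under "Description of the Change" can be sketched as follows. This is plain serial C++, not the Kokkos implementation, and it simplifies the problem to a dense matrix acting on every wire of the state. The names `conjugateTranspose` and `applyDenseOp` are hypothetical. The point it illustrates is that the conjugate transpose is built once before the loop, so the loop body needs no branch or template parameter for `inverse`.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;

// Hypothetical helper: conjugate transpose of a row-major dim x dim matrix.
std::vector<Complex> conjugateTranspose(const std::vector<Complex> &matrix,
                                        std::size_t dim) {
    assert(matrix.size() == dim * dim);
    std::vector<Complex> out(dim * dim);
    for (std::size_t r = 0; r < dim; ++r) {
        for (std::size_t c = 0; c < dim; ++c) {
            out[c * dim + r] = std::conj(matrix[r * dim + c]);
        }
    }
    return out;
}

// Hypothetical wrapper: apply a dense operator to the whole state vector.
// Simplification: the matrix acts on all wires, so dim == state.size().
void applyDenseOp(std::vector<Complex> &state,
                  const std::vector<Complex> &matrix, bool inverse) {
    const std::size_t dim = state.size();
    assert(matrix.size() == dim * dim);

    // Resolve `inverse` once, up front, instead of per loop element.
    const std::vector<Complex> op =
        inverse ? conjugateTranspose(matrix, dim) : matrix;

    std::vector<Complex> result(dim, Complex(0.0, 0.0));
    for (std::size_t r = 0; r < dim; ++r) {
        for (std::size_t c = 0; c < dim; ++c) {
            result[r] += op[r * dim + c] * state[c];
        }
    }
    state = result;
}
```

The trade-off is one extra copy of the matrix per call, which is small next to the state vector for gates on a handful of qubits.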
Related GitHub Issues: