Accelerate multi-qubit gates #490

Merged
vincentmr merged 81 commits into master from accel/mqgate_tmp
Sep 8, 2023

Conversation

Contributor

@vincentmr vincentmr commented Aug 29, 2023

Context:
This PR is a follow-up to #489. The general scheme for multi-qubit gates uses three layers of parallelism with team policies. This introduces several parameters that should be tuned for optimal performance but are currently left to Kokkos's heuristics. By contrast, the straightforward range-policy scheme of the 1- and 2-qubit kernels significantly outperforms the general scheme.
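To make the range-policy idea concrete, here is a minimal sketch in plain C++ (no Kokkos) of a flat loop that applies a dense n-qubit matrix to a state vector. It is not the kernel from this PR: the names `applyUnitaryFlat` and `insertZeroBit` are hypothetical, and the bit convention (target `0` is the least significant bit of the amplitude index, `targets[0]` is the most significant local bit) is an assumption chosen for the example. In Kokkos, the outer `for` loop would be the body of a `Kokkos::parallel_for` over a `Kokkos::RangePolicy`.

```cpp
#include <algorithm>
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;

// Insert a zero bit into `k` at bit position `pos` (0 = least significant).
inline std::size_t insertZeroBit(std::size_t k, std::size_t pos) {
    const std::size_t low = k & ((std::size_t{1} << pos) - 1);
    const std::size_t high = (k >> pos) << (pos + 1);
    return high | low;
}

// Apply a row-major 2^n x 2^n `matrix` to the qubits at bit positions
// `targets` (assumed distinct). Hypothetical helper for illustration only.
void applyUnitaryFlat(std::vector<Complex> &state,
                      const std::vector<Complex> &matrix,
                      const std::vector<std::size_t> &targets) {
    const std::size_t n = targets.size();
    const std::size_t dim = std::size_t{1} << n;
    assert(matrix.size() == dim * dim);
    assert(state.size() % dim == 0);

    // Zero bits must be inserted from the lowest position upwards.
    std::vector<std::size_t> sorted(targets);
    std::sort(sorted.begin(), sorted.end());

    // offsets[j] is the state-vector offset of local basis state j.
    std::vector<std::size_t> offsets(dim, 0);
    for (std::size_t j = 0; j < dim; ++j) {
        for (std::size_t b = 0; b < n; ++b) {
            if ((j >> (n - 1 - b)) & 1U) {
                offsets[j] |= std::size_t{1} << targets[b];
            }
        }
    }

    // One flat loop: each iteration owns a disjoint group of 2^n amplitudes,
    // so iterations are independent and could run in parallel.
    const std::size_t groups = state.size() / dim;
    for (std::size_t k = 0; k < groups; ++k) {
        std::size_t base = k;
        for (std::size_t pos : sorted) {
            base = insertZeroBit(base, pos);
        }
        // Per-iteration scratch keeps the iterations independent.
        std::vector<Complex> in(dim);
        for (std::size_t j = 0; j < dim; ++j) {
            in[j] = state[base | offsets[j]];
        }
        for (std::size_t i = 0; i < dim; ++i) {
            Complex acc{0.0, 0.0};
            for (std::size_t j = 0; j < dim; ++j) {
                acc += matrix[i * dim + j] * in[j];
            }
            state[base | offsets[i]] = acc;
        }
    }
}
```

The only parameter of such a loop is its trip count, which is why there is nothing to tune, in contrast to the team size and vector length of a team policy.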

I introduce specialized 3- to 5-qubit kernels and draw the following conclusions:

  • Range-policy kernels perform the same as the team-policy one up to 4 qubits on the OPENMP backend and are slower beyond that.
  • Range-policy kernels are faster than the team-policy one up to at least 5 qubits on the CUDA and HIP backends.
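One way to act on these two observations is a small dispatch on backend and gate width. The sketch below is an illustration only, not necessarily how this PR selects kernels: `Backend`, `KernelKind` and `chooseKernel` are hypothetical names, and the thresholds (4 qubits on OpenMP, 5 on CUDA and HIP) are assumptions read directly off the bullets above.

```cpp
#include <cstddef>

// Hypothetical labels for the three backends benchmarked in this PR.
enum class Backend { OpenMP, Cuda, Hip };

// Hypothetical labels for the two kernel families being compared.
enum class KernelKind { RangePolicy, TeamPolicy };

// Pick a kernel family from the backend and the number of target qubits.
// Thresholds are assumptions taken from the benchmark conclusions:
// range-policy kernels keep up to 4 qubits on OpenMP and win up to
// (at least) 5 qubits on the GPU backends.
KernelKind chooseKernel(Backend backend, std::size_t numQubits) {
    const std::size_t maxRangePolicy =
        (backend == Backend::OpenMP) ? 4 : 5;
    return (numQubits <= maxRangePolicy) ? KernelKind::RangePolicy
                                         : KernelKind::TeamPolicy;
}
```

For example, `chooseKernel(Backend::OpenMP, 5)` falls back to the team-policy kernel, while `chooseKernel(Backend::Cuda, 5)` stays on the range-policy one.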

The following figures show timings to apply a QubitUnitary for OPENMP, CUDA and HIP respectively.

[Figure: benchmarks_CPU]
[Figure: benchmarks_CUDA]
[Figure: benchmarks_HIP]

Description of the Change:
Introduce specialized 3- to 5-qubit unitary gate kernels. Refactor the applyMultiQubitOp wrapper in StateVectorKokkos.hpp. Functors are no longer templated on inverse; the conjugate transpose is taken once upon entering applyMultiQubitOp instead of on the fly for each element of the for loop. Add a few tests.
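The "conjugate transpose once" idea can be sketched in plain C++ as follows. This is a simplified stand-in, not the code in StateVectorKokkos.hpp: `conjugateTranspose` and `prepareMatrix` are hypothetical helpers, and the matrix is assumed to be a row-major `dim x dim` array held in a `std::vector`.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;

// Return the conjugate transpose of a row-major dim x dim matrix.
std::vector<Complex> conjugateTranspose(const std::vector<Complex> &m,
                                        std::size_t dim) {
    assert(m.size() == dim * dim);
    std::vector<Complex> out(dim * dim);
    for (std::size_t i = 0; i < dim; ++i) {
        for (std::size_t j = 0; j < dim; ++j) {
            out[j * dim + i] = std::conj(m[i * dim + j]);
        }
    }
    return out;
}

// Hypothetical wrapper step: resolve `inverse` once, before any kernel runs,
// so the kernel itself only ever sees a plain matrix and needs no
// `inverse` template parameter or per-element branch.
std::vector<Complex> prepareMatrix(const std::vector<Complex> &m,
                                   std::size_t dim, bool inverse) {
    return inverse ? conjugateTranspose(m, dim) : m;
}
```

The cost of the transpose is O(dim^2) and is paid once per gate, whereas the on-the-fly variant repeats the index swap and conjugation for every matrix element read inside the loop over the state vector.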

Benefits:
Faster QubitUnitary, especially for 3+-qubit observables on GPU devices.

Possible Drawbacks:
None

Related GitHub Issues:

vincentmr and others added 30 commits August 21, 2023 10:52
…ata` to work with devices.

M  pennylane_lightning/core/src/simulators/lightning_kokkos/StateVectorKokkos.hpp; `applyMatrix` bugfix: use intermediate hostview to copy matrix data; same bugfix for `getDataVector`.
M  pennylane_lightning/core/src/simulators/lightning_kokkos/algorithms/AdjointJacobianKokkos.hpp; use copy constructor.
M  pennylane_lightning/core/src/simulators/lightning_kokkos/measurements/MeasurementsKokkos.hpp; use copy constructor.
M  pennylane_lightning/core/src/simulators/lightning_kokkos/observables/ObservablesKokkos.hpp; use copy constructor.
M  requirements-dev.txt; add clang-format-14.
…calls into two templated methods. Call specialized expval methods when possible. Remove obsolete 'Apply directly' tests.
@vincentmr
Contributor Author

> Will come back to have a look once the GPU CI is added.

I'm unsure when we'll have a runner with CUDA-12 (and not sure we'll have any runner with HIP-capable devices any time soon), so could we move forward with this PR nevertheless?

@multiphaseCFD
Member

> Will come back to have a look once the GPU CI is added.
>
> I'm unsure when we'll have a runner with CUDA-12 (and not sure we'll have any runner with HIP-capable devices any time soon), so could we move forward with this PR nevertheless?

yea, sounds good to me!

Base automatically changed from template/expval to master September 7, 2023 14:35
@vincentmr vincentmr requested a review from mlxd September 7, 2023 19:32
AmintorDusko previously approved these changes Sep 8, 2023
Contributor

@AmintorDusko AmintorDusko left a comment

Hi Vincent, Nice work!
Thank you for that!

Contributor

@AmintorDusko AmintorDusko left a comment

Once again, thank you for the nice job!

Member

@mlxd mlxd left a comment

Thanks @vincentmr

@vincentmr vincentmr merged commit 3fbd4ce into master Sep 8, 2023
@vincentmr vincentmr deleted the accel/mqgate_tmp branch September 8, 2023 20:55
4 participants