
Optimize the input state-vector copy into the LGPU #1071

Merged
merged 28 commits into master on Mar 5, 2025

Conversation

LuisAlfredoNu
Contributor

@LuisAlfredoNu LuisAlfredoNu commented Feb 27, 2025

Context:
After running different algorithms with LGPU and performing a memory profile, a memory bottleneck showed up in the LGPU Python layer: the peak memory usage is 3 times what the computation actually needs.


Description of the Change:
Remove temporary allocations and skip the index computation for the common cases.

  • Remove the temporary GPU allocations for input values and indices.
  • The input state vector is copied directly from the host when the target wires are contiguous and start at the most or least significant wire (which are the most common cases).
  • For custom target wires, LGPU follows the previous algorithm, but with a speedup in the index computation through OpenMP parallelization.
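The fast-path condition described in the bullets above can be sketched as follows. This is an illustrative check, not the PR's actual C++ code; the function name and the `target_wires`/`num_wires` parameters are assumptions for the sketch:

```python
def is_contiguous_boundary_wires(target_wires, num_wires):
    """Check whether the target wires are contiguous and anchored at the
    most- or least-significant end of the register, which enables a direct
    host-to-device copy of the input state vector."""
    wires = sorted(target_wires)
    # Contiguous: each wire follows the previous one with no gaps
    contiguous = all(b - a == 1 for a, b in zip(wires, wires[1:]))
    # Anchored at either end of the register
    starts_low = wires[0] == 0
    ends_high = wires[-1] == num_wires - 1
    return contiguous and (starts_low or ends_high)
```

For example, `[0, 1, 2]` or `[3, 4]` on a 5-wire register would take the fast path, while `[1, 2]` would fall back to the custom-wires algorithm.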

Benefits:
Using a test algorithm with 31 qubits produces the following memory profile:

Reduction of the memory peak from 100GB to 66GB

Note: memray measures all memory allocations, including the GPU allocations made through the cudaMalloc family of calls.
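For the custom-target-wires path mentioned in the description, each input amplitude has to be scattered to its index in the full register; the PR parallelizes that index computation with OpenMP in C++. The mapping itself can be sketched in NumPy, assuming PennyLane's convention that wire 0 is the most significant bit and that untargeted wires are fixed in |0⟩ (function name and structure are illustrative, not the PR's code):

```python
import numpy as np

def global_indices(target_wires, num_wires):
    """For each amplitude of a state defined on `target_wires`, compute its
    index in the full `num_wires` register, with all other wires in |0>."""
    k = len(target_wires)
    indices = np.zeros(2**k, dtype=np.int64)
    for i in range(2**k):
        idx = 0
        for bit_pos, wire in enumerate(target_wires):
            # Extract bit `bit_pos` of i (MSB first) and place it on `wire`
            bit = (i >> (k - 1 - bit_pos)) & 1
            idx |= bit << (num_wires - 1 - wire)
        indices[i] = idx
    return indices
```

For instance, `global_indices([0, 1], 3)` gives `[0, 2, 4, 6]` (amplitudes land on the two most significant bits), while `global_indices([1, 2], 3)` gives `[0, 1, 2, 3]`, the contiguous least-significant case that the fast path copies directly.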

Using the following toy circuit, where random_normalize_sv is a helper producing a random normalized state vector:

    input_state = random_normalize_sv(num_wires - 1)
    target_wires = range(num_wires - 1)
    dev = qml.device("lightning.gpu", wires=num_wires)

    @qml.qnode(dev)
    def circuit():
        qml.StatePrep(input_state, wires=target_wires)
        return qml.expval(qml.PauliZ(0))
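The snippet above calls random_normalize_sv, which is not defined in the PR description; a minimal sketch of such a helper (an assumption, not code from the PR) might be:

```python
import numpy as np

def random_normalize_sv(num_wires, seed=0):
    """Generate a random normalized state vector on `num_wires` qubits."""
    rng = np.random.default_rng(seed)
    dim = 2**num_wires
    # Complex amplitudes with Gaussian real and imaginary parts
    vec = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return vec / np.linalg.norm(vec)  # scale to unit norm
```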

This produces the timings shown in the benchmark plots attached to the PR.

Possible Drawbacks:

Related GitHub Issues:
[sc-58833]


Hello. You may have forgotten to update the changelog!
Please edit .github/CHANGELOG.md with:

  • A one-to-two sentence description of the change. You may include a small working example for new features.
  • A link back to this PR.
  • Your name (or GitHub username) in the contributors section.


codecov bot commented Feb 27, 2025

Codecov Report

Attention: Patch coverage is 96.59091% with 3 lines in your changes missing coverage. Please review.

Project coverage is 98.11%. Comparing base (109db9f) to head (901be60).
Report is 1 commit behind head on master.

Files with missing lines Patch % Lines
...lightning/core/src/utils/cuda_utils/DataBuffer.hpp 83.33% 2 Missing ⚠️
pennylane_lightning/lightning_gpu/_state_vector.py 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1071      +/-   ##
==========================================
+ Coverage   97.99%   98.11%   +0.12%     
==========================================
  Files         233      232       -1     
  Lines       40019    39268     -751     
==========================================
- Hits        39215    38527     -688     
+ Misses        804      741      -63     

@LuisAlfredoNu LuisAlfredoNu marked this pull request as ready for review February 27, 2025 22:24
@LuisAlfredoNu LuisAlfredoNu added the ci:use-gpu-runner Enable usage of GPU runner for this Pull Request label Feb 27, 2025
@AmintorDusko (Contributor) left a comment

Good job! A great improvement in memory usage. I assume you checked the codecov warnings and they are all false positives, right?

@multiphaseCFD (Member) left a comment

Nice one. Thanks @LuisAlfredoNu

@AmintorDusko (Contributor) left a comment

Thank you for your nice work!

@maliasadi (Member) left a comment

Awesome work! Thanks @LuisAlfredoNu 🥇

Don't forget to add your PR to the changelog :)

@maliasadi (Member) left a comment

Happy to approve 🥳

@LuisAlfredoNu LuisAlfredoNu merged commit 9e26a91 into master Mar 5, 2025
91 of 92 checks passed
@LuisAlfredoNu LuisAlfredoNu deleted the optimize_memory_lgpu branch March 5, 2025 18:21
Labels
ci:use-gpu-runner Enable usage of GPU runner for this Pull Request
6 participants