Serialize obs with pauli reps into hamiltonians #424

Merged
merged 26 commits from obs-with-pauli-rep into master on Apr 20, 2023
Conversation

timmysilv
Contributor

@timmysilv commented Mar 1, 2023

Context:
Arithmetic operators are the latest craze in PennyLane, and lightning should support them natively as well! If an observable (one that isn't a Tensor or Hamiltonian) has a representation in the Pauli basis, that representation is saved on the Operator as a PauliSentence. These representations can easily be converted into the data already passed to the existing C++ bindings for PennyLane Hamiltonians.

Description of the Change:

  • Add serializing helpers that convert PauliSentence objects into HamiltonianCxx lightning classes (see the sketch after this list)
  • Use this serialization for any observable that meets 2 criteria:
    1. it is not a Tensor, a Hamiltonian, or an observable that already has a dedicated kernel (a Hadamard or Pauli op)
    2. it has a non-None _pauli_rep assigned to it
  • Clean up the serialization decision tree: _obs_has_kernel contained handling for Hamiltonians and Tensors that was never exercised, because it was only called on observables we already knew were neither of those. The new flow should be simpler and more readable, and it lends itself better to op-arithmetic support
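
A minimal sketch of the conversion idea, for illustration only (pauli_sentence_to_ham_data is a hypothetical name; the actual helpers in this PR build the lightning HamiltonianC64/C128 objects through the existing bindings):

import pennylane as qml

def pauli_sentence_to_ham_data(pauli_sentence, wire_order):
    # hypothetical helper: unpack a PauliSentence ({PauliWord: coefficient}
    # mapping) into the coefficient/term lists a Hamiltonian serializer needs
    coeffs, terms = [], []
    for pauli_word, coeff in pauli_sentence.items():
        coeffs.append(coeff)
        # a PauliWord maps each wire it acts on to "X", "Y" or "Z";
        # wires it doesn't mention are implicitly the identity
        terms.append([pauli_word.get(w, "I") for w in wire_order])
    return coeffs, terms

obs = qml.prod(qml.PauliX(0), qml.s_prod(0.5, qml.PauliZ(1)))
print(pauli_sentence_to_ham_data(obs._pauli_rep, wire_order=[0, 1]))
# e.g. ([0.5], [['X', 'Z']])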

Benefits:

  • qml.prod and other arithmetic operators previously defaulted to qml.matrix(op), which is not efficient. With this PR, they are serialized (via Hamiltonian for now) and computed with that C++ kernel. This change supports all operators with a _pauli_rep attribute.
  • I did some basic benchmarking measuring the expectation value of large Sum ops (also in a QNode with adjoint-diff), and it takes the same time as Hamiltonians - see the latest graph below

Possible Drawbacks:
This won't cover all cases. For example, consider the following:

>>> import numpy as np
>>> import pennylane as qml
>>> pauli_hadamard = qml.s_prod(1 / np.sqrt(2), qml.sum(qml.PauliX(1), qml.PauliZ(1)))
>>> all_pauli = qml.prod(qml.PauliX(0), pauli_hadamard)
>>> uses_hadamard = qml.prod(qml.PauliX(0), qml.Hadamard(1))
>>> print(all_pauli._pauli_rep is None, uses_hadamard._pauli_rep is None)
False True

all_pauli and uses_hadamard are the same observable, and both are valid, but only the former will be serialized via its Pauli rep. This is most unfortunate because kernels exist for both Pauli ops and Hadamard, but I don't cover that case with this PR. Note that as a sideways fix to this, I propose a Pauli representation for Hadamard in another PR (linked below). EDIT: this is a larger problem with op-arithmetic (it should also consider qml.Hermitian) and needs more work on the PL side.
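
For reference, the Pauli-basis form that other PR would give Hadamard is H = (X + Z)/sqrt(2). A hand-built sketch of that representation (illustrative only, not the proposed implementation):

import numpy as np
from pennylane.pauli import PauliWord, PauliSentence

# Hadamard on wire 1, written in the Pauli basis: (X + Z) / sqrt(2)
hadamard_pauli_rep = PauliSentence({
    PauliWord({1: "X"}): 1 / np.sqrt(2),
    PauliWord({1: "Z"}): 1 / np.sqrt(2),
})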

@codecov

codecov bot commented Mar 1, 2023

Codecov Report

Merging #424 (be2e166) into master (158b0dc) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #424   +/-   ##
=======================================
  Coverage   99.82%   99.82%           
=======================================
  Files          50       50           
  Lines        4635     4654   +19     
=======================================
+ Hits         4627     4646   +19     
  Misses          8        8           
Impacted Files Coverage Δ
pennylane_lightning/_serialize.py 100.00% <100.00%> (ø)
pennylane_lightning/_version.py 100.00% <100.00%> (ø)
pennylane_lightning/lightning_qubit.py 99.14% <100.00%> (+<0.01%) ⬆️


@timmysilv marked this pull request as draft March 6, 2023 14:58
@timmysilv requested review from albi3ro and a team March 6, 2023 20:08
@timmysilv marked this pull request as ready for review March 6, 2023 20:08
Contributor

@AmintorDusko left a comment

Thank you, @timmysilv. Nice job here!
Could you please add some benchmarks to this PR showing how these changes impact performance?
Also, you need to update the changelog.

@timmysilv
Contributor Author

[sc-22425]

@timmysilv
Contributor Author

timmysilv commented Mar 10, 2023

I generated random Pauli sentences for 1-8 wires with a reasonable number of terms in each, then converted each into an old-style Hamiltonian, as well as a new-style arithmetic op (with a pauli rep). I also downloaded Hamiltonians for H6, H2O and BH3 (12, 14, 16 qubits, respectively) and stuck those into my benchmark data. I ran dev.expval(obs) 100 times for both types, and I ran a basic QNode with diff_method=adjoint that does RX(np.array(1.1), 0) then returns dev.expval(obs) 20 times for both types.

The two observable types are at near-parity, with some solid boosts using the new operator type! I'll note that occasionally, I saw the old-school observable winning by microseconds in very small systems (1-2 qubits, a handful of terms) but I think that isn't very significant.

Please note that the data shown below is plotted with a log scale, meaning that the new observable type does significantly better in larger systems.

Also, the performance boost is likely due to pre-processing. Hamiltonians will use hamiltonian_expand(group=False) to split tapes, whereas Sum observables will use sum_expand. Given that the observables should be identical other than type/python representation, this makes the most sense. Example with my code (16-qubit Hamiltonian):

>>> ham_circuit = qml.tape.QuantumScript(measurements=[qml.expval(hams[-1])])
>>> op_circuit = qml.tape.QuantumScript(measurements=[qml.expval(ops[-1])])
>>> ham_tapes, _ = qml.transforms.hamiltonian_expand(ham_circuit, group=False)
>>> op_tapes, _ = qml.transforms.sum_expand(op_circuit)
>>> len(ham_tapes), len(op_tapes)
(771, 433)

[image: benchmark results (log scale)]

PS: I converted my notebook to a python file for your viewing (if you're curious about what I did). I can't upload .py or .ipynb files to a PR, so I made a gist!
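
For reference, here is a rough sketch of the shape of the timing loop (a hypothetical reconstruction with an illustrative benchmark function; the real notebook is the gist above):

import numpy as np
import pennylane as qml
from timeit import default_timer as timer

def benchmark(obs, n_wires, n_expval=100, n_qnode=20):
    dev = qml.device("lightning.qubit", wires=n_wires)

    @qml.qnode(dev, diff_method="adjoint")
    def circuit():
        qml.RX(np.array(1.1), 0)
        return qml.expval(obs)

    start = timer()
    for _ in range(n_expval):
        dev.expval(obs)  # bare device expectation value
    t_expval = (timer() - start) / n_expval

    start = timer()
    for _ in range(n_qnode):
        circuit()  # QNode forward pass with adjoint-diff configured
    t_qnode = (timer() - start) / n_qnode

    return t_expval, t_qnode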

@AmintorDusko
Contributor

Hi @timmysilv. Thank you for the benchmarks! I'm happy to see the performance improvements.

@timmysilv
Contributor Author

timmysilv commented Mar 10, 2023

This needs more work. I just ran some validation, and the Hamiltonian and the op actually give different results. I'm not sure why, but this big slow molecule demonstrates it:

import pennylane as qml

ham = qml.data.load("qchem", molname="BH3", basis="STO-3G", bondlength=0.86)[0].hamiltonian
op = qml.pauli.pauli_sentence(ham).operation()

dev = qml.device("lightning.qubit", wires=op.wires)

@qml.qnode(dev)
def c_ham():
    return qml.expval(ham)

@qml.qnode(dev)
def c_op():
    return qml.expval(op)

print(c_ham(), c_op())  # the two expectation values quoted below

The expectation with the Hamiltonian is 10.295583513349388, but with the op it's 9.790384688804833. This shouldn't be related to the Identity change because that isn't used in non-adjoint expectations. How to convince myself that the observables are indeed equivalent... I will try on Monday.

EDIT: it's related to how they are split:

>>> dev.expval(ham), dev.expval(op)
(10.295583513349388, 10.295583513349388)

@timmysilv marked this pull request as draft March 10, 2023 22:46
@timmysilv changed the title from "Serialize obs with pauli reps into hamiltonians" to "[WIP] Serialize obs with pauli reps into hamiltonians" Mar 10, 2023
@AmintorDusko
Contributor

Hmm, you are right. There is something weird here.
I tested some smaller Hamiltonians with your script.
I observed that we have some matches and some mismatches:
molname="H2", basis="STO-3G", bondlength=1.1 - match
molname="HeH+", basis="STO-3G", bondlength=1.1 - mismatch
molname="H3+", basis="STO-3G", bondlength=1.1 - mismatch
molname="H4", basis="STO-3G", bondlength=1.1 - match
molname="LiH", basis="STO-3G", bondlength=1.11 - mismatch

I hope this can help you map out the problem.

@timmysilv
Contributor Author

OK, I've done a bunch of little changes locally, and this is now looking a lot better. I will re-open this as officially ready for review when the relevant fixes are in PennyLane itself, but the associated problems have been identified and are being addressed. New results:

[image: updated benchmark results]

@mlxd
Member

mlxd commented Mar 15, 2023

Thanks @timmysilv
One question --- does the performance order invert beyond the 16 qubit count, or do they follow each other along the same curve? If this slows down beyond the number of samples taken, we should run a larger sim and push things into the 20 qubit regime to be sure.

@timmysilv
Contributor Author

timmysilv commented Mar 15, 2023

I was wondering the same thing, so fortunately I have the answer for you! I just got Utkarsh to give me a 20-qubit, 2951-term Hamiltonian for Li2 and they remain neck and neck even up there (ran a few times locally, they seem pretty much equal. Sums win by up to 5%). I also tried with a 24-qubit, 30k-term Hamiltonian but my poor old MacBook Air just couldn't do it, so you'd need to ask a more powerful computer to confirm what happens beyond 20.

@timmysilv
Contributor Author

timmysilv commented Mar 15, 2023

PRs (and follow-up work) needed for this to be ready:

(My benchmarking was done with all these changes present in my local environment)

@mlxd
Member

mlxd commented Mar 16, 2023

Thanks @timmysilv . I think given the agreement at 20, the bottleneck is likely no longer the routines modified here, and may instead be other parts of the pipeline. I'd say that's fine for now.

@timmysilv
Contributor Author

re: the above conversation, I've changed QubitDevice in PennyLane to provide a hook called _get_diagonalizing_gates, and I am now overriding it in this PR to filter out any Sum observables. This means that the above example, which took 4 minutes computing diagonalizing gates and 15 seconds getting/setting parameters, will now only take 15 seconds! Sure, the Hamiltonian equivalent still only takes 0.05 seconds, but we're getting there.
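
A minimal sketch of the idea (illustrative only; diagonalizing_gates_without_sums is a hypothetical free function, not the override in this PR, and it assumes the new QubitDevice._get_diagonalizing_gates hook):

from pennylane.ops.op_math import Sum
from pennylane.tape import QuantumScript

def diagonalizing_gates_without_sums(device, circuit):
    # Sum observables go through the Hamiltonian expectation kernel, so they
    # never need diagonalizing gates; drop them before asking the base
    # QubitDevice for the gates of the remaining measurements.
    meas_filtered = [
        m for m in circuit.measurements
        if m.obs is None or not isinstance(m.obs, Sum)
    ]
    return device._get_diagonalizing_gates(QuantumScript(measurements=meas_filtered))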

@timmysilv
Contributor Author

This is ready for review again! My latest benchmarks show the old and new operators taking the same amount of time.

@chaeyeunpark
Contributor

Not directly relevant to this PR, but @albi3ro @mlxd, do we have specifications for supporting measurements in the new device API?

Contributor

@AmintorDusko left a comment

Nothing more to add from my side.
Thank you for that!

Co-authored-by: Amintor Dusko <[email protected]>
@mlxd self-requested a review April 19, 2023 18:43
Member

@mlxd left a comment

Thanks @timmysilv
I have some questions and some suggested updates.

Also, since this work will likely need to be ported to L-GPU and L-Kokkos, it may be worth planning that out too so that the packages stay in line.

Contributor Author

@timmysilv left a comment

I'll prepare the changes for Kokkos and GPU.

Member

@mlxd left a comment

Thanks for all the work @timmysilv
Happy to approve.

@timmysilv merged commit 769997c into master Apr 20, 2023
@timmysilv deleted the obs-with-pauli-rep branch April 20, 2023 18:08