Skip to content

Commit de7beb7

Browse files
vincentmrringo-but-quantummaliasadi
authored
Add sample(wires) support in LightningQubit (#809)
### Before submitting Please complete the following checklist when submitting a PR: - [x] All new features must include a unit test. If you've fixed a bug or added code that should be tested, add a test to the [`tests`](../tests) directory! - [x] All new functions and code must be clearly commented and documented. If you do make documentation changes, make sure that the docs build and render correctly by running `make docs`. - [x] Ensure that the test suite passes, by running `make test`. - [x] Add a new entry to the `.github/CHANGELOG.md` file, summarizing the change, and including a link back to the PR. - [x] Ensure that code is properly formatted by running `make format`. When all the above are checked, delete everything above the dashed line and fill in the pull request template. ------------------------------------------------------------------------------------------------------------ **Context:** `sample` call `generate_samples` which computes the full probabilities and uses the alias method to generate samples for all wires. This is wasteful whenever samples are required only for a subset of all wires. **Description of the Change:** Move alias method logic to the `discrete_random_variable` class. **Benefits:** Compute minimal probs and samples. We benchmark the current changes against `master`, which already benefits from some good speed-ups introduce in #795 and #800 . We use ISAIC's AMD EPYC-Milan Processor using a single core/thread. The times are obtained using at least 5 experiments and running for at least 250 milliseconds. We begin comparing `master`'s `generate_samples(num_samples)` with ours `generate_samples({0}, num_samples)`. For 4-12 qubits, overheads dominate the calculation (the absolute times range from 6 microseconds to 18 milliseconds, which is not a lot. Already at 12 qubits however, a trend appears where our implementation is significantly faster. This is to be expected for two reason: `probs(wires)` itself is faster than `probs()` (for enough qubits) and `sample(wires)` starts also requiring significantly less work than `sample()`. ![speedup_vs_nthreads_1w](https://github.com/user-attachments/assets/472748e9-d812-489c-a00f-2b2b74c7e682) Next we turn to comparing `master`'s `generate_samples(num_samples)` with ours `generate_samples({0..num_qubits/2}, num_samples)`. The situation there is similar, with speed-ups close to 1 for the smaller qubit counts and (sometimes) beyond 20x for qubit counts above 20. ![speedup_vs_nthreads_hfw](https://github.com/user-attachments/assets/f39e3ccd-8051-4a57-a857-9cd13f547865) Finally we compare `master`'s `generate_samples(num_samples)` with ours `generate_samples({0..num_qubits-1}, num_samples)` (i.e. computing samples on all wires). We expect similar performance since the main difference comes from the caching mechanism in `master`'s discrete random variable generator. The data suggests caching samples is counter-productive compared with calculating the sample values on the fly. ![speedup_vs_nthreads_fullw](https://github.com/user-attachments/assets/2c70ed21-2236-479e-be3d-6017b42fdc5e) Turning OMP ON, using 16 threads and comparing `master`'s `generate_samples(num_samples)` with ours `generate_samples({0}, num_samples)` we get good speed-ups above 12 qubits. Below that the overhead of spawning threads isn't repaid, but absolute times remain low. ![speedup_vs_omp16_1w](https://github.com/user-attachments/assets/e3e90a55-399f-4a5b-b90e-7059a0486228) **Possible Drawbacks:** **Related GitHub Issues:** [sc-65127] --------- Co-authored-by: ringo-but-quantum <[email protected]> Co-authored-by: Ali Asadi <[email protected]>
1 parent 9a83ff4 commit de7beb7

File tree

9 files changed

+180
-148
lines changed

9 files changed

+180
-148
lines changed

.github/CHANGELOG.md

+3
Original file line numberDiff line numberDiff line change
@@ -15,6 +15,9 @@
1515

1616
### Improvements
1717

18+
* Add `generate_samples(wires)` support in LightningQubit.
19+
[(#809)](https://github.com/PennyLaneAI/pennylane-lightning/pull/809)
20+
1821
* Optimize the OpenMP parallelization of Lightning-Qubit's `probs` for all number of targets.
1922
[(#807)](https://github.com/PennyLaneAI/pennylane-lightning/pull/807)
2023

pennylane_lightning/core/_version.py

+1-1
Original file line numberDiff line numberDiff line change
@@ -16,4 +16,4 @@
1616
Version number (major.minor.patch[-label])
1717
"""
1818

19-
__version__ = "0.38.0-dev12"
19+
__version__ = "0.38.0-dev13"

pennylane_lightning/core/src/simulators/lightning_qubit/bindings/LQubitBindings.hpp

+21
Original file line numberDiff line numberDiff line change
@@ -292,6 +292,27 @@ void registerBackendSpecificMeasurements(PyClass &pyclass) {
292292
static_cast<sparse_index_type>(values.request().size));
293293
},
294294
"Variance of a sparse Hamiltonian.")
295+
.def("generate_samples",
296+
[](Measurements<StateVectorT> &M,
297+
const std::vector<std::size_t> &wires,
298+
const std::size_t num_shots) {
299+
const std::size_t num_wires = wires.size();
300+
auto &&result = M.generate_samples(wires, num_shots);
301+
const std::size_t ndim = 2;
302+
const std::vector<std::size_t> shape{num_shots, num_wires};
303+
constexpr auto sz = sizeof(std::size_t);
304+
const std::vector<std::size_t> strides{sz * num_wires, sz};
305+
// return 2-D NumPy array
306+
return py::array(py::buffer_info(
307+
result.data(), /* data as contiguous array */
308+
sz, /* size of one scalar */
309+
py::format_descriptor<std::size_t>::format(), /* data type
310+
*/
311+
ndim, /* number of dimensions */
312+
shape, /* shape of the matrix */
313+
strides /* strides for each axis */
314+
));
315+
})
295316
.def("generate_mcmc_samples", [](Measurements<StateVectorT> &M,
296317
std::size_t num_wires,
297318
const std::string &kernelname,

pennylane_lightning/core/src/simulators/lightning_qubit/measurements/MeasurementKernels.hpp

+83
Original file line numberDiff line numberDiff line change
@@ -19,6 +19,9 @@
1919
#pragma once
2020

2121
#include <complex>
22+
#include <random>
23+
#include <stack>
24+
#include <utility>
2225
#include <vector>
2326

2427
#include "BitUtil.hpp"
@@ -296,6 +299,86 @@ namespace PUtil = Pennylane::Util;
296299

297300
namespace Pennylane::LightningQubit::Measures {
298301

302+
/**
303+
* @brief Generate samples using the alias method.
304+
*
305+
* @tparam PrecisionT Precision data type
306+
*/
307+
template <typename PrecisionT> class DiscreteRandomVariable {
308+
private:
309+
static constexpr std::size_t default_index =
310+
std::numeric_limits<std::size_t>::max();
311+
std::mt19937 &gen_;
312+
const std::size_t n_probs_;
313+
const std::vector<std::pair<double, std::size_t>> bucket_partners_;
314+
mutable std::uniform_real_distribution<PrecisionT> distribution_{0.0, 1.0};
315+
316+
public:
317+
/**
318+
* @brief Create a DiscreteRandomVariable object.
319+
*
320+
* @param gen Random number generator reference.
321+
* @param probs Probabilities for values 0 up to N - 1, where N =
322+
* probs.size().
323+
*/
324+
DiscreteRandomVariable(std::mt19937 &gen,
325+
const std::vector<PrecisionT> &probs)
326+
: gen_{gen}, n_probs_{probs.size()},
327+
bucket_partners_(init_bucket_partners_(probs)) {}
328+
329+
/**
330+
* @brief Return a discrete random value.
331+
*/
332+
std::size_t operator()() const {
333+
const auto idx =
334+
static_cast<std::size_t>(distribution_(gen_) * n_probs_);
335+
if (distribution_(gen_) >= bucket_partners_[idx].first &&
336+
bucket_partners_[idx].second != default_index) {
337+
return bucket_partners_[idx].second;
338+
}
339+
return idx;
340+
}
341+
342+
private:
343+
/**
344+
* @brief Initialize the probability table of the alias method.
345+
*/
346+
std::vector<std::pair<double, std::size_t>>
347+
init_bucket_partners_(const std::vector<PrecisionT> &probs) {
348+
std::vector<std::pair<double, std::size_t>> bucket_partners(
349+
n_probs_, {0.0, default_index});
350+
std::stack<std::size_t> underfull_bucket_ids;
351+
std::stack<std::size_t> overfull_bucket_ids;
352+
353+
for (std::size_t i = 0; i < n_probs_; i++) {
354+
bucket_partners[i].first = n_probs_ * probs[i];
355+
if (bucket_partners[i].first < 1.0) {
356+
underfull_bucket_ids.push(i);
357+
} else {
358+
overfull_bucket_ids.push(i);
359+
}
360+
}
361+
362+
while (!underfull_bucket_ids.empty() && !overfull_bucket_ids.empty()) {
363+
auto i = overfull_bucket_ids.top();
364+
overfull_bucket_ids.pop();
365+
auto j = underfull_bucket_ids.top();
366+
underfull_bucket_ids.pop();
367+
368+
bucket_partners[j].second = i;
369+
bucket_partners[i].first += bucket_partners[j].first - 1.0;
370+
371+
if (bucket_partners[i].first < 1.0) {
372+
underfull_bucket_ids.push(i);
373+
} else {
374+
overfull_bucket_ids.push(i);
375+
}
376+
}
377+
378+
return bucket_partners;
379+
}
380+
};
381+
299382
/**
300383
* @brief Probabilities for a subset of the full system.
301384
*

pennylane_lightning/core/src/simulators/lightning_qubit/measurements/MeasurementsLQubit.hpp

+37-81
Original file line numberDiff line numberDiff line change
@@ -24,7 +24,6 @@
2424
#include <complex>
2525
#include <cstdio>
2626
#include <random>
27-
#include <stack>
2827
#include <type_traits>
2928
#include <unordered_map>
3029
#include <vector>
@@ -124,7 +123,7 @@ class Measurements final
124123

125124
const ComplexT *arr_data = this->_statevector.getData();
126125

127-
// Templated 1-4 wire cases; return probs
126+
// Templated 1-4 wire cases; return probs
128127
PROBS_SPECIAL_CASE(1);
129128
PROBS_SPECIAL_CASE(2);
130129
PROBS_SPECIAL_CASE(3);
@@ -175,7 +174,7 @@ class Measurements final
175174
*
176175
* @return Floating point std::vector with probabilities.
177176
*/
178-
auto probs(size_t num_shots) -> std::vector<PrecisionT> {
177+
auto probs(std::size_t num_shots) -> std::vector<PrecisionT> {
179178
return BaseType::probs(num_shots);
180179
}
181180

@@ -258,7 +257,7 @@ class Measurements final
258257
const index_type *entries_ptr, const ComplexT *values_ptr,
259258
const index_type numNNZ) -> PrecisionT {
260259
PL_ABORT_IF(
261-
(this->_statevector.getLength() != (size_t(row_map_size) - 1)),
260+
(this->_statevector.getLength() != (std::size_t(row_map_size) - 1)),
262261
"Statevector and Hamiltonian have incompatible sizes.");
263262
auto operator_vector = Util::apply_Sparse_Matrix(
264263
this->_statevector.getData(),
@@ -289,7 +288,7 @@ class Measurements final
289288
"The lengths of the list of operations and wires do not match.");
290289
std::vector<PrecisionT> expected_value_list;
291290

292-
for (size_t index = 0; index < operations_list.size(); index++) {
291+
for (std::size_t index = 0; index < operations_list.size(); index++) {
293292
expected_value_list.emplace_back(
294293
expval(operations_list[index], wires_list[index]));
295294
}
@@ -461,7 +460,7 @@ class Measurements final
461460

462461
std::vector<PrecisionT> expected_value_list;
463462

464-
for (size_t index = 0; index < operations_list.size(); index++) {
463+
for (std::size_t index = 0; index < operations_list.size(); index++) {
465464
expected_value_list.emplace_back(
466465
var(operations_list[index], wires_list[index]));
467466
}
@@ -488,7 +487,7 @@ class Measurements final
488487
std::size_t num_qubits = this->_statevector.getNumQubits();
489488
std::uniform_real_distribution<PrecisionT> distrib(0.0, 1.0);
490489
std::vector<std::size_t> samples(num_samples * num_qubits, 0);
491-
std::unordered_map<size_t, std::size_t> cache;
490+
std::unordered_map<std::size_t, std::size_t> cache;
492491
this->setRandomSeed();
493492

494493
TransitionKernelType transition_kernel = TransitionKernelType::Local;
@@ -502,13 +501,13 @@ class Measurements final
502501
std::size_t idx = 0;
503502

504503
// Burn In
505-
for (size_t i = 0; i < num_burnin; i++) {
504+
for (std::size_t i = 0; i < num_burnin; i++) {
506505
idx = metropolis_step(this->_statevector, tk, this->rng, distrib,
507506
idx); // Burn-in.
508507
}
509508

510509
// Sample
511-
for (size_t i = 0; i < num_samples; i++) {
510+
for (std::size_t i = 0; i < num_samples; i++) {
512511
idx = metropolis_step(this->_statevector, tk, this->rng, distrib,
513512
idx);
514513

@@ -521,7 +520,7 @@ class Measurements final
521520

522521
// If not cached, compute
523522
else {
524-
for (size_t j = 0; j < num_qubits; j++) {
523+
for (std::size_t j = 0; j < num_qubits; j++) {
525524
samples[i * num_qubits + (num_qubits - 1 - j)] =
526525
(idx >> j) & 1U;
527526
}
@@ -551,7 +550,7 @@ class Measurements final
551550
const index_type *entries_ptr, const ComplexT *values_ptr,
552551
const index_type numNNZ) {
553552
PL_ABORT_IF(
554-
(this->_statevector.getLength() != (size_t(row_map_size) - 1)),
553+
(this->_statevector.getLength() != (std::size_t(row_map_size) - 1)),
555554
"Statevector and Hamiltonian have incompatible sizes.");
556555
auto operator_vector = Util::apply_Sparse_Matrix(
557556
this->_statevector.getData(),
@@ -577,79 +576,36 @@ class Measurements final
577576
* @return 1-D vector of samples in binary, each sample is
578577
* separated by a stride equal to the number of qubits.
579578
*/
580-
std::vector<std::size_t> generate_samples(size_t num_samples) {
579+
std::vector<std::size_t> generate_samples(const std::size_t num_samples) {
581580
const std::size_t num_qubits = this->_statevector.getNumQubits();
582-
auto &&probabilities = probs();
581+
std::vector<std::size_t> wires(num_qubits);
582+
std::iota(wires.begin(), wires.end(), 0);
583+
return generate_samples(wires, num_samples);
584+
}
583585

584-
std::vector<std::size_t> samples(num_samples * num_qubits, 0);
585-
std::uniform_real_distribution<PrecisionT> distribution(0.0, 1.0);
586-
std::unordered_map<size_t, std::size_t> cache;
586+
/**
587+
* @brief Generate samples.
588+
*
589+
* @param wires Sample are generated for the specified wires.
590+
* @param num_samples The number of samples to generate.
591+
* @return 1-D vector of samples in binary, each sample is
592+
* separated by a stride equal to the number of qubits.
593+
*/
594+
std::vector<std::size_t>
595+
generate_samples(const std::vector<std::size_t> &wires,
596+
const std::size_t num_samples) {
597+
const std::size_t n_wires = wires.size();
598+
std::vector<std::size_t> samples(num_samples * n_wires);
587599
this->setRandomSeed();
588-
589-
const std::size_t N = probabilities.size();
590-
std::vector<double> bucket(N);
591-
std::vector<std::size_t> bucket_partner(N);
592-
std::stack<std::size_t> overfull_bucket_ids;
593-
std::stack<std::size_t> underfull_bucket_ids;
594-
595-
for (size_t i = 0; i < N; i++) {
596-
bucket[i] = N * probabilities[i];
597-
bucket_partner[i] = i;
598-
if (bucket[i] > 1.0) {
599-
overfull_bucket_ids.push(i);
600-
}
601-
if (bucket[i] < 1.0) {
602-
underfull_bucket_ids.push(i);
603-
}
604-
}
605-
606-
// Run alias algorithm
607-
while (!underfull_bucket_ids.empty() && !overfull_bucket_ids.empty()) {
608-
// get an overfull bucket
609-
std::size_t i = overfull_bucket_ids.top();
610-
611-
// get an underfull bucket
612-
std::size_t j = underfull_bucket_ids.top();
613-
underfull_bucket_ids.pop();
614-
615-
// underfull bucket is partned with an overfull bucket
616-
bucket_partner[j] = i;
617-
bucket[i] = bucket[i] + bucket[j] - 1;
618-
619-
// if overfull bucket is now underfull
620-
// put in underfull stack
621-
if (bucket[i] < 1) {
622-
overfull_bucket_ids.pop();
623-
underfull_bucket_ids.push(i);
624-
}
625-
626-
// if overfull bucket is full -> remove
627-
else if (bucket[i] == 1.0) {
628-
overfull_bucket_ids.pop();
629-
}
630-
}
631-
632-
// Pick samples
633-
for (size_t i = 0; i < num_samples; i++) {
634-
PrecisionT pct = distribution(this->rng) * N;
635-
auto idx = static_cast<std::size_t>(pct);
636-
if (pct - idx > bucket[idx]) {
637-
idx = bucket_partner[idx];
638-
}
639-
// If cached, retrieve sample from cache
640-
if (cache.contains(idx)) {
641-
std::size_t cache_id = cache[idx];
642-
auto it_temp = samples.begin() + cache_id * num_qubits;
643-
std::copy(it_temp, it_temp + num_qubits,
644-
samples.begin() + i * num_qubits);
645-
}
646-
// If not cached, compute
647-
else {
648-
for (size_t j = 0; j < num_qubits; j++) {
649-
samples[i * num_qubits + (num_qubits - 1 - j)] =
650-
(idx >> j) & 1U;
651-
}
652-
cache[idx] = i;
600+
DiscreteRandomVariable<PrecisionT> drv{this->rng, probs(wires)};
601+
// The Python layer expects a 2D array with dimensions (n_samples x
602+
// n_wires) and hence the linear index is `s * n_wires + (n_wires - 1 -
603+
// j)` `s` being the "slow" row index and `j` being the "fast" column
604+
// index
605+
for (std::size_t s = 0; s < num_samples; s++) {
606+
const std::size_t idx = drv();
607+
for (std::size_t j = 0; j < n_wires; j++) {
608+
samples[s * n_wires + (n_wires - 1 - j)] = (idx >> j) & 1U;
653609
}
654610
}
655611
return samples;

0 commit comments

Comments
 (0)