Add merging of `Interner`s #13752

jakelishman · 2025-01-28T16:42:48Z

Summary

This adds ways of merging two interners together, including ways that re-use temporary allocations. The intention here is to make it much cheaper to rebuild DAGs and circuits from many small parts (like synthesis passes, basis-translation, etc), and to avoid many small-scale allocations when doing qarg remapping. The small-scale allocations are comparatively costly of themselves, and can often have amplified effects in threaded contexts (though it depends on the system allocator in use).

There's quite a few places that this could be used, such as UnitarySynthesis and BasisTranslator, but both of those will be substantial amounts of work to convert, and more appropriate for separate patches. Given the performance improvements in this PR, I suspect that they should both see decent improvements too.

Details and comments

Commit messages:

Add merging of `Interner` instances

This makes it possible to merge one interner into another, using a map-and-filter function to explain how the values should be considered afterwards. The main use-case for this is in DAG and circuit substitution methods, where we want to be able to map PackedInstructions from a small circuit to new ones on a larger circuit, without having to unwrap and re-hash the interned slices once for instruction; instead, we do that once per interner key, and then have a no-hash direct lookup table to remap the interner keys directly.

The methods here are designed to directly allow the reuse of the heap allocations used to track the interner key mappings, to help reduce the number of small-scale allocations done in substitution methods.

The filtering part of the mapping is because one cannot typically be certain that an Interner contains the minimally necessary keys, especially for DAGs; it can well be that substitutions and bit removals cause old data to be present.

Use new `Interner` merging in `DAGCircuit::from_circuit`

This converts one obvious candidate for the new interner-merging functionality. There are several others in the codebase (all uses of DAGCircuit::extend, for example, and several other full rebuilds), but the structure of their code wasn't designed with this in mind, so the modifications would typically be far more involved and more suitable for separate patches.

Using a timing script:

import itertools
from qiskit.circuit import QuantumCircuit
from qiskit.converters import circuit_to_dag

qc = QuantumCircuit(100, 100)
for pair in itertools.permutations(qc.qubits, 2):
    qc.rz(0, pair[0])
    qc.sx(pair[0])
    qc.rz(0, pair[0])
    qc.rz(0, pair[1])
    qc.sx(pair[1])
    qc.rz(0, pair[1])
    qc.cx(*pair)
    
%timeit circuit_to_dag(qc, copy_operations=False)

this patch showed an improvement from 18.5(6)ms to 14.4(5)ms on my machine, which is a 22% speedup, though admittedly this particular function was doing larger-scale allocation work that could be removed than other candidate replacements are.

This makes it possible to merge one interner into another, using a map-and-filter function to explain how the values should be considered afterwards. The main use-case for this is in DAG and circuit substitution methods, where we want to be able to map `PackedInstruction`s from a small circuit to new ones on a larger circuit, without having to unwrap and re-hash the interned slices once for instruction; instead, we do that once _per interner key_, and then have a no-hash direct lookup table to remap the interner keys directly. The methods here are designed to directly allow the reuse of the heap allocations used to track the interner key mappings, to help reduce the number of small-scale allocations done in substitution methods. The filtering part of the mapping is because one cannot typically be certain that an `Interner` contains the minimally necessary keys, especially for DAGs; it can well be that substitutions and bit removals cause old data to be present.

This converts one obvious candidate for the new interner-merging functionality. There are several others in the codebase (all uses of `DAGCircuit::extend`, for example, and several other full rebuilds), but the structure of their code wasn't designed with this in mind, so the modifications would typically be far more involved and more suitable for separate patches. Using a timing script: ```python import itertools from qiskit.circuit import QuantumCircuit from qiskit.converters import circuit_to_dag qc = QuantumCircuit(100, 100) for pair in itertools.permutations(qc.qubits, 2): qc.rz(0, pair[0]) qc.sx(pair[0]) qc.rz(0, pair[0]) qc.rz(0, pair[1]) qc.sx(pair[1]) qc.rz(0, pair[1]) qc.cx(*pair) %timeit circuit_to_dag(qc, copy_operations=False) ``` this patch showed an improvement from 18.5(6)ms to 14.4(5)ms on my machine, which is a 22% speedup, though admittedly this particular function was doing larger-scale allocation work that could be removed than other candidate replacements are.

qiskit-bot · 2025-01-28T16:42:52Z

One or more of the following people are relevant to this code:

@Qiskit/terra-core

coveralls · 2025-01-28T17:09:31Z

Pull Request Test Coverage Report for Build 13041462412

Details

123 of 253 (48.62%) changed or added relevant lines in 2 files are covered.
5 unchanged lines in 3 files lost coverage.
Overall coverage decreased (-0.1%) to 88.815%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
crates/circuit/src/interner.rs	54	184	29.35%

Files with Coverage Reduction	New Missed Lines	%
crates/qasm2/src/lex.rs	1	92.98%
crates/accelerate/src/two_qubit_decompose.rs	1	92.08%
crates/circuit/src/circuit_data.rs	3	94.94%

Totals
Change from base Build 13010967771:	-0.1%
Covered Lines:	79759
Relevant Lines:	89804

💛 - Coveralls

kevinhartman

This one definitely seems like a slam dunk to me 😄.

Most of my comments are just for clarification.

The FromIterator implementation for Interner<T> is interesting. If it were implemented for T instead, would there really be any downside to that? It seems like implementing for ::Owned would mean equivalent values would still need to be duplicated by the caller.

crates/circuit/src/interner.rs

kevinhartman · 2025-01-28T17:29:57Z

crates/circuit/src/interner.rs

+                work.reserve(slice.len());
+                for value in slice {
+                    let Some(scalar) = scalar_map_fn(value) else {
+                        break 'slice None;


It's a feature to be careful of (feels like it could easily get spaghetti-ish), but when it works, it can really tidy up code. I'd originally tried to write this as a closure in order to use return in the same way (via ?), but it ended up really gross and I swapped to this.

kevinhartman · 2025-01-28T17:47:12Z

crates/circuit/src/interner.rs

+        // to allocate.  This method is lower leveel, so in this case we ask them to tell us; if
+        // they're optimising to the point of re-using the return allocations, they probably have a
+        // good idea about the maximum slice size of the interner they'll be merging in.
+        let mut work = SmallVec::<[T; N]>::with_capacity(N);


This is very neat. From what I can tell, we only do a heap allocation if every scalar maps and the slice doesn't already exist? And the latter is possible because of Equivalent working for Vec<T> and &[T].

We do heap allocations in two situations:

we have to insert into the interner (the situation you described) - this work thing here is a tacit assumption that that happens relatively rarely, though the cost of being wrong about that is very low in practice - it's a small memcpy.

we need more space for a slice than we allocated on the stack (so the SmallVec spills to the heap)

The Vec and the slice are Equivalent, but in this case, they also support PartialEq between borrows of each other, so it'd work in the default hashmaps without Equivalent too, I think.

kevinhartman · 2025-01-28T17:50:00Z

crates/circuit/src/interner.rs

+        // We're specifying the stack space here.  This is just a guess, but it's not hugely
+        // important; we'll safely spill from the stack to the heap if needed, and this function is
+        // an API convenience at the cost of optimal allocation performance anyway.
+        self.merge_map_slice_using::<4>(other, scalar_map_fn, &mut out);


Since this is stack space, are you choosing a low count like 4 for cache locality concerns?

Yeah, sort of, though I didn't really put much thought into the exact value - it's unlikely to matter in this method, because of the cost of allocating the map anyway.

kevinhartman · 2025-01-28T18:19:39Z

crates/circuit/src/dag_circuit.rs

-            Some(ordered_vec)
+            // The `Vec::get` use is because an arbitrary interner might contain old references to
+            // bit instances beyond `num_qubits`, such as if it's from a DAG that had wires removed.
+            new_dag.merge_qargs(qc_data.qargs_interner(), |bit| {


At this point, the new_dag does not yet contain any entries (other than the default), unless I'm mistaken. So it's less of a "merge" here and more of an efficient clone that skips over now-dead keys in the source interner, correct?

Separately, I just noticed that DAGCircuit::with_capacity preallocates its interners to have num_qubits and num_clbits entries. I'm not sure if that was an intentional choice, or due to a misread of the docstring when DAGCircuit::with_capacity was added.

Yeah exactly, this particular merge isn't so much a merge as an efficient mapping. In other situations we might want it (like substitute_node_with_dag), it's much more of an actual merge-with-map.

Co-authored-by: Kevin Hartman <[email protected]>

jakelishman · 2025-01-28T19:43:36Z

The FromIterator implementation for Interner<T> is interesting. If it were implemented for T instead, would there really be any downside to that? It seems like implementing for ::Owned would mean equivalent values would still need to be duplicated by the caller.

In general (and in almost all the cases we care about), T isn't Sized (it's usually [T]), so you can't hold direct values of it - you've got to indirect somehow. So then we have to choose either the borrowed type (e.g. &[T]) or the owned (e.g. Vec<T>), or something that can Into<<T as ToOwned>::Owned>. The FromIterator impl takes ownership of the values coming in, so we don't need to clone the values in the way I wrote it (we just move them into the IndexSet), but the caller of FromIterator might in order to produce the iterator.

In practice, I mostly wrote that impl to make the tests shorter to write, and mostly I thought that the chances were that if you wanted to use that method, you'd have the owned types anyway.

raynelfss

This is a very neat idea that should hopefully make conversion between circuits much easier. I will try to do a more in depth review of this tomorrow. But what I've seen so far looks great!

raynelfss

After a quick passthrough of the code, this looks like a very useful addition that I can appreciate. Other than that, the additions to the Interner methods are fairly welcome as these were issues I was running into myself, and had to, in come cases, add them too. I like everything being done here. LGTM!

I left a couple of questions and comments, would suggest that we also wait for @kevinhartman's in depth review.

crates/circuit/src/interner.rs

crates/circuit/src/dag_circuit.rs

kevinhartman · 2025-01-29T21:23:54Z

would suggest that we also wait for @kevinhartman's in depth review

@raynelfss I've already reviewed to a suitable depth. Feel free to sign off when your comments are addressed 🙂.

Co-authored-by: Raynel Sanchez <[email protected]>

jakelishman · 2025-01-29T22:17:39Z

Hate to be doing this but...

you'll be here forever if you check the spelling in my code comments haha - I try and write public documentation with American spelling, but I write comments straight from my head.

(I don't mind changing it to American, I just won't spot it myself easily.)

raynelfss

LGTM! Thank you for adding these changes and making the Interner much more flexible. It will be a helpful addition to #13686 and others.

- Remove stale methods `contains` and `get_interned` from `Interner` in favor of the changes introduced by Qiskit#13752.

* Add merging of `Interner` instances This makes it possible to merge one interner into another, using a map-and-filter function to explain how the values should be considered afterwards. The main use-case for this is in DAG and circuit substitution methods, where we want to be able to map `PackedInstruction`s from a small circuit to new ones on a larger circuit, without having to unwrap and re-hash the interned slices once for instruction; instead, we do that once _per interner key_, and then have a no-hash direct lookup table to remap the interner keys directly. The methods here are designed to directly allow the reuse of the heap allocations used to track the interner key mappings, to help reduce the number of small-scale allocations done in substitution methods. The filtering part of the mapping is because one cannot typically be certain that an `Interner` contains the minimally necessary keys, especially for DAGs; it can well be that substitutions and bit removals cause old data to be present. * Use new `Interner` merging in `DAGCircuit::from_circuit` This converts one obvious candidate for the new interner-merging functionality. There are several others in the codebase (all uses of `DAGCircuit::extend`, for example, and several other full rebuilds), but the structure of their code wasn't designed with this in mind, so the modifications would typically be far more involved and more suitable for separate patches. Using a timing script: ```python import itertools from qiskit.circuit import QuantumCircuit from qiskit.converters import circuit_to_dag qc = QuantumCircuit(100, 100) for pair in itertools.permutations(qc.qubits, 2): qc.rz(0, pair[0]) qc.sx(pair[0]) qc.rz(0, pair[0]) qc.rz(0, pair[1]) qc.sx(pair[1]) qc.rz(0, pair[1]) qc.cx(*pair) %timeit circuit_to_dag(qc, copy_operations=False) ``` this patch showed an improvement from 18.5(6)ms to 14.4(5)ms on my machine, which is a 22% speedup, though admittedly this particular function was doing larger-scale allocation work that could be removed than other candidate replacements are. * Fix typo Co-authored-by: Kevin Hartman <[email protected]> * Fix typos Co-authored-by: Raynel Sanchez <[email protected]> --------- Co-authored-by: Kevin Hartman <[email protected]> Co-authored-by: Raynel Sanchez <[email protected]>

jakelishman added 2 commits January 28, 2025 16:23

jakelishman added performance Changelog: None Do not include in changelog Rust This PR or issue is related to Rust code in the repository labels Jan 28, 2025

jakelishman requested a review from a team as a code owner January 28, 2025 16:42

jakelishman changed the title ~~Interner merge~~ Add merging of Interners Jan 28, 2025

kevinhartman reviewed Jan 28, 2025

View reviewed changes

Fix typo

9308ce7

Co-authored-by: Kevin Hartman <[email protected]>

raynelfss reviewed Jan 28, 2025

View reviewed changes

raynelfss reviewed Jan 29, 2025

View reviewed changes

Fix typos

64ae502

Co-authored-by: Raynel Sanchez <[email protected]>

raynelfss approved these changes Jan 30, 2025

View reviewed changes

raynelfss added this pull request to the merge queue Jan 30, 2025

Merged via the queue into Qiskit:main with commit 315d6ad Jan 30, 2025
17 checks passed

jakelishman deleted the interner-merge branch January 30, 2025 15:44

raynelfss added a commit to raynelfss/qiskit that referenced this pull request Jan 30, 2025

Fix: Remove changes to the Interner

f1ffc16

- Remove stale methods `contains` and `get_interned` from `Interner` in favor of the changes introduced by Qiskit#13752.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add merging of `Interner`s #13752

Add merging of `Interner`s #13752

jakelishman commented Jan 28, 2025 •

edited

Loading

qiskit-bot commented Jan 28, 2025

coveralls commented Jan 28, 2025 •

edited

Loading

kevinhartman left a comment

kevinhartman Jan 28, 2025

jakelishman Jan 28, 2025

kevinhartman Jan 28, 2025

jakelishman Jan 28, 2025

kevinhartman Jan 28, 2025

jakelishman Jan 28, 2025

kevinhartman Jan 28, 2025

jakelishman Jan 28, 2025

jakelishman commented Jan 28, 2025

raynelfss left a comment

raynelfss left a comment

kevinhartman commented Jan 29, 2025 •

edited

Loading

jakelishman commented Jan 29, 2025

raynelfss left a comment

Add merging of Interners #13752

Add merging of Interners #13752

Conversation

jakelishman commented Jan 28, 2025 • edited Loading

Summary

Details and comments

Add merging of Interner instances

Use new Interner merging in DAGCircuit::from_circuit

qiskit-bot commented Jan 28, 2025

coveralls commented Jan 28, 2025 • edited Loading

Pull Request Test Coverage Report for Build 13041462412

Details

💛 - Coveralls

kevinhartman left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jakelishman commented Jan 28, 2025

raynelfss left a comment

Choose a reason for hiding this comment

raynelfss left a comment

Choose a reason for hiding this comment

kevinhartman commented Jan 29, 2025 • edited Loading

jakelishman commented Jan 29, 2025

raynelfss left a comment

Choose a reason for hiding this comment

Add merging of `Interner`s #13752

Add merging of `Interner`s #13752

jakelishman commented Jan 28, 2025 •

edited

Loading

Add merging of `Interner` instances

Use new `Interner` merging in `DAGCircuit::from_circuit`

coveralls commented Jan 28, 2025 •

edited

Loading

kevinhartman commented Jan 29, 2025 •

edited

Loading