-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add merging of Interner
s
#13752
Add merging of Interner
s
#13752
Conversation
This makes it possible to merge one interner into another, using a map-and-filter function to explain how the values should be considered afterwards. The main use-case for this is in DAG and circuit substitution methods, where we want to be able to map `PackedInstruction`s from a small circuit to new ones on a larger circuit, without having to unwrap and re-hash the interned slices once for instruction; instead, we do that once _per interner key_, and then have a no-hash direct lookup table to remap the interner keys directly. The methods here are designed to directly allow the reuse of the heap allocations used to track the interner key mappings, to help reduce the number of small-scale allocations done in substitution methods. The filtering part of the mapping is because one cannot typically be certain that an `Interner` contains the minimally necessary keys, especially for DAGs; it can well be that substitutions and bit removals cause old data to be present.
This converts one obvious candidate for the new interner-merging functionality. There are several others in the codebase (all uses of `DAGCircuit::extend`, for example, and several other full rebuilds), but the structure of their code wasn't designed with this in mind, so the modifications would typically be far more involved and more suitable for separate patches. Using a timing script: ```python import itertools from qiskit.circuit import QuantumCircuit from qiskit.converters import circuit_to_dag qc = QuantumCircuit(100, 100) for pair in itertools.permutations(qc.qubits, 2): qc.rz(0, pair[0]) qc.sx(pair[0]) qc.rz(0, pair[0]) qc.rz(0, pair[1]) qc.sx(pair[1]) qc.rz(0, pair[1]) qc.cx(*pair) %timeit circuit_to_dag(qc, copy_operations=False) ``` this patch showed an improvement from 18.5(6)ms to 14.4(5)ms on my machine, which is a 22% speedup, though admittedly this particular function was doing larger-scale allocation work that could be removed than other candidate replacements are.
One or more of the following people are relevant to this code:
|
Pull Request Test Coverage Report for Build 13041462412Details
💛 - Coveralls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one definitely seems like a slam dunk to me 😄.
Most of my comments are just for clarification.
The FromIterator
implementation for Interner<T>
is interesting. If it were implemented for T
instead, would there really be any downside to that? It seems like implementing for ::Owned
would mean equivalent values would still need to be duplicated by the caller.
work.reserve(slice.len()); | ||
for value in slice { | ||
let Some(scalar) = scalar_map_fn(value) else { | ||
break 'slice None; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a feature to be careful of (feels like it could easily get spaghetti-ish), but when it works, it can really tidy up code. I'd originally tried to write this as a closure in order to use return
in the same way (via ?
), but it ended up really gross and I swapped to this.
// to allocate. This method is lower leveel, so in this case we ask them to tell us; if | ||
// they're optimising to the point of re-using the return allocations, they probably have a | ||
// good idea about the maximum slice size of the interner they'll be merging in. | ||
let mut work = SmallVec::<[T; N]>::with_capacity(N); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very neat. From what I can tell, we only do a heap allocation if every scalar maps and the slice doesn't already exist? And the latter is possible because of Equivalent
working for Vec<T>
and &[T]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do heap allocations in two situations:
- we have to insert into the interner (the situation you described) - this
work
thing here is a tacit assumption that that happens relatively rarely, though the cost of being wrong about that is very low in practice - it's a small memcpy. - we need more space for a slice than we allocated on the stack (so the
SmallVec
spills to the heap)
The Vec
and the slice are Equivalent
, but in this case, they also support PartialEq
between borrows of each other, so it'd work in the default hashmaps without Equivalent
too, I think.
// We're specifying the stack space here. This is just a guess, but it's not hugely | ||
// important; we'll safely spill from the stack to the heap if needed, and this function is | ||
// an API convenience at the cost of optimal allocation performance anyway. | ||
self.merge_map_slice_using::<4>(other, scalar_map_fn, &mut out); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is stack space, are you choosing a low count like 4
for cache locality concerns?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, sort of, though I didn't really put much thought into the exact value - it's unlikely to matter in this method, because of the cost of allocating the map anyway.
Some(ordered_vec) | ||
// The `Vec::get` use is because an arbitrary interner might contain old references to | ||
// bit instances beyond `num_qubits`, such as if it's from a DAG that had wires removed. | ||
new_dag.merge_qargs(qc_data.qargs_interner(), |bit| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point, the new_dag
does not yet contain any entries (other than the default), unless I'm mistaken. So it's less of a "merge" here and more of an efficient clone that skips over now-dead keys in the source interner, correct?
Separately, I just noticed that DAGCircuit::with_capacity
preallocates its interners to have num_qubits
and num_clbits
entries. I'm not sure if that was an intentional choice, or due to a misread of the docstring when DAGCircuit::with_capacity
was added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah exactly, this particular merge isn't so much a merge as an efficient mapping. In other situations we might want it (like substitute_node_with_dag
), it's much more of an actual merge-with-map.
Co-authored-by: Kevin Hartman <[email protected]>
In general (and in almost all the cases we care about), In practice, I mostly wrote that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very neat idea that should hopefully make conversion between circuits much easier. I will try to do a more in depth review of this tomorrow. But what I've seen so far looks great!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After a quick passthrough of the code, this looks like a very useful addition that I can appreciate. Other than that, the additions to the Interner methods are fairly welcome as these were issues I was running into myself, and had to, in come cases, add them too. I like everything being done here. LGTM!
I left a couple of questions and comments, would suggest that we also wait for @kevinhartman's in depth review.
@raynelfss I've already reviewed to a suitable depth. Feel free to sign off when your comments are addressed 🙂. |
Co-authored-by: Raynel Sanchez <[email protected]>
you'll be here forever if you check the spelling in my code comments haha - I try and write public documentation with American spelling, but I write comments straight from my head. (I don't mind changing it to American, I just won't spot it myself easily.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thank you for adding these changes and making the Interner much more flexible. It will be a helpful addition to #13686 and others.
- Remove stale methods `contains` and `get_interned` from `Interner` in favor of the changes introduced by Qiskit#13752.
* Add merging of `Interner` instances This makes it possible to merge one interner into another, using a map-and-filter function to explain how the values should be considered afterwards. The main use-case for this is in DAG and circuit substitution methods, where we want to be able to map `PackedInstruction`s from a small circuit to new ones on a larger circuit, without having to unwrap and re-hash the interned slices once for instruction; instead, we do that once _per interner key_, and then have a no-hash direct lookup table to remap the interner keys directly. The methods here are designed to directly allow the reuse of the heap allocations used to track the interner key mappings, to help reduce the number of small-scale allocations done in substitution methods. The filtering part of the mapping is because one cannot typically be certain that an `Interner` contains the minimally necessary keys, especially for DAGs; it can well be that substitutions and bit removals cause old data to be present. * Use new `Interner` merging in `DAGCircuit::from_circuit` This converts one obvious candidate for the new interner-merging functionality. There are several others in the codebase (all uses of `DAGCircuit::extend`, for example, and several other full rebuilds), but the structure of their code wasn't designed with this in mind, so the modifications would typically be far more involved and more suitable for separate patches. Using a timing script: ```python import itertools from qiskit.circuit import QuantumCircuit from qiskit.converters import circuit_to_dag qc = QuantumCircuit(100, 100) for pair in itertools.permutations(qc.qubits, 2): qc.rz(0, pair[0]) qc.sx(pair[0]) qc.rz(0, pair[0]) qc.rz(0, pair[1]) qc.sx(pair[1]) qc.rz(0, pair[1]) qc.cx(*pair) %timeit circuit_to_dag(qc, copy_operations=False) ``` this patch showed an improvement from 18.5(6)ms to 14.4(5)ms on my machine, which is a 22% speedup, though admittedly this particular function was doing larger-scale allocation work that could be removed than other candidate replacements are. * Fix typo Co-authored-by: Kevin Hartman <[email protected]> * Fix typos Co-authored-by: Raynel Sanchez <[email protected]> --------- Co-authored-by: Kevin Hartman <[email protected]> Co-authored-by: Raynel Sanchez <[email protected]>
Summary
This adds ways of merging two interners together, including ways that re-use temporary allocations. The intention here is to make it much cheaper to rebuild DAGs and circuits from many small parts (like synthesis passes, basis-translation, etc), and to avoid many small-scale allocations when doing qarg remapping. The small-scale allocations are comparatively costly of themselves, and can often have amplified effects in threaded contexts (though it depends on the system allocator in use).
There's quite a few places that this could be used, such as
UnitarySynthesis
andBasisTranslator
, but both of those will be substantial amounts of work to convert, and more appropriate for separate patches. Given the performance improvements in this PR, I suspect that they should both see decent improvements too.Details and comments
Commit messages:
Add merging of
Interner
instancesThis makes it possible to merge one interner into another, using a map-and-filter function to explain how the values should be considered afterwards. The main use-case for this is in DAG and circuit substitution methods, where we want to be able to map
PackedInstruction
s from a small circuit to new ones on a larger circuit, without having to unwrap and re-hash the interned slices once for instruction; instead, we do that once per interner key, and then have a no-hash direct lookup table to remap the interner keys directly.The methods here are designed to directly allow the reuse of the heap allocations used to track the interner key mappings, to help reduce the number of small-scale allocations done in substitution methods.
The filtering part of the mapping is because one cannot typically be certain that an
Interner
contains the minimally necessary keys, especially for DAGs; it can well be that substitutions and bit removals cause old data to be present.Use new
Interner
merging inDAGCircuit::from_circuit
This converts one obvious candidate for the new interner-merging functionality. There are several others in the codebase (all uses of
DAGCircuit::extend
, for example, and several other full rebuilds), but the structure of their code wasn't designed with this in mind, so the modifications would typically be far more involved and more suitable for separate patches.Using a timing script:
this patch showed an improvement from 18.5(6)ms to 14.4(5)ms on my machine, which is a 22% speedup, though admittedly this particular function was doing larger-scale allocation work that could be removed than other candidate replacements are.