-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Faster Circuit Primitives #445
Conversation
This commit speeds up the implementation of some core circuit primitives used in pyGSTi. The most notable are the circuit inequality check and hashing. 1. Equality checking is sped up by introducing a relatively inexpensive length comparison which allows fast short circuiting of the test. 2. For hashing we now leverage staticness more directly by caching both the hash, and the tuple representation used for hashing. To go along with these changes are (pre-emptive) bugfixes for some inplace circuit modification methods which previously did not check for whether a circuit was static (which would result in hash invalidation). Similarly we have added restrictions to the setter methods for some circuit parameters that feed into the hash to prevent modification of these for static circuits.
Remove duplicated docstring on class definition, leaving the attribute and class description alone.
This refactors the value stored in the private attribute _compilable_layer_indices_tup to remove the leading '__CMPLBL__' string value. This was only used in the construction of the python tuple representation of the circuit when calling the `tup` property. Elsewhere the inclusion required awkwardly having to check the length and then slicing into the tuple to skip this string when accessing the actual indices. As such, the inclusion of this string has been added to the code for `tup`, and the remaining code has been cleaned up to match. In addition to being cleaner, this should be more performant (as we need to do fewer checks when working with the value of this object now). Additionally, this commit also switches to directly inlining access to the _compilable_layer_indices_tup instead of going through the property in a few places to reduce method call overhead.
Inline access to _line_labels, _occurence_id where appropriate to reduce method call overhead. Additionally add a codepath for static circuit comparisons to __lt__ and __gt__.
This reimplements the circuit copy method using a faster code path inspired by the _fastinit code path used for quickly creating new circuits elsewhere in module. This adds a specialized _copy_init method, takes advantage of caching pre-computed values in static circuits, and generally inlining and reducing function call and assignment overheads across the board. Initial testing indicated 20-60x speed ups over original copy method.
This is out of place as a circuit method now, and should be removed.
The expected format for the labels when a circuit is editable is a nested list of lists of simple labels. When making an editable copy of a static circuit the new version of copy was not putting the labels in this format as needed. We now cache the previously editable version of the circuit layers at the point where the circuit is made static, and use this for speeding up the conversion back to an editable format.
Correctly handle empty compilable_layer_indices_tup case.
I am going to be almost immediately reverting this change, but I want this version saved in the git history for future reference.
This reverts commit 2b8e375.
This reverts commit 510e6f7.
Refactors the `is_simple` method for Labels into a class attribute.
This commit adds the following: A new helper function for speeding up copying by avoiding double circuit tuple construction. A new method for sandwiching a circuit with two labels (faster than is using the add operation twice). A bugfix for add which checks for the need to add to the line_labels of a circuit when the added label has a new one not presently in the circuit. Corrections and inlining of editable label to static label conversions. A slightly faster reworking of expand_subcircuits_inplace.
…cuits Bulk implementation of split_circuits and new option for complete_circuits that returns the split form of the circuit as an optional return value (which can speed things up in layout creation when both are needed).
Update the implementation of the expand_subcircuits_inplace method to recursively expand nested subcircuits.
This updates the implementation of the SeparatePOVMCircuit containter class. The most important change is adding an attribute for the full_effect_labels that avoids uneeded reconstruction. To add protection then, to ensure that this is kept in sync with everything else, the povm_label and effect_labels attributes (which feed into full_effect_labels) have been promoted to properties with setters that ensure the full_effect_labels are kept synced.
Fixes a bug in the updated subcircuit expansion function that didn't account for circuit repetitions in calculating depths.
Update the name of the circuit _fastinit kwarg used in stdinput.
Adds new unit tests for new circuit functionality, and fills in a couple gaps for previously untested methods/conditions. Removes skipped test for a long gone method.
Minor unit test updates to fix errors in test_packages. Also fixes some wonky formatting left over from profiling.
Not sure why this didn't get caught on the circuit update branch, but oh well...
This timer test occasionally fails due to sleep lasting slightly less time than the specified duration. I think this is fundamentally due simply to the relatively sloppy specs on sleep provided by OSes (most are apparently only accurate at the 1-10ms level, and while usually they run long, they can occasionally run short too).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the great work as usual @coreyostrove!
It seems like the major parts of this PR are in the new initialization via _copy_init
, the tup
property, the copy()
and expand_subcircuits_inplace()
functions, as well as some of the dunder functions (__eq__
most notably). These look pretty good to me reading through them, but I think it'll be good to get this into develop and pay attention to any issues as we get the rest of 0.9.13 stuff in.
I've also mentioned some other smaller things that will hopefully make it more readable or changes I wasn't sure were necessary.
This incorporates some feedback from Stefan's review. Refactors the split_circuit and complete_circuit methods to use the multi-circuit variants under the hood. Fixes some bugs that were discovered in testing as part of making that change. Reduce duplication in the tup method and make some efficiency improvements.
Thanks for the feedback, @sserita! I have responded to a number of your questions/comments above. I've also just pushed a commit incorporating some of your suggested changes. |
Fully remove the old methods and switch to using the class attribute.
Fixes minor error in split_circuits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a million comments (and by a million, I mean 31).
Practically all of my comments fell into one of two categories.
- I'm uneasy about having
is_simple
be an exposed static variable for the various Label classes. I'd like to know what alternatives there might be. - There's lots of code that uses list comprehensions when generators would suffice and be faster. I've opened an issue about this so we know to investigate more broadly (Audit code for use of list comprehensions where generators would be faster #450). I suspect that an automated fix of this with find-and-replace might not be possible, so I think we should address these inefficiencies when we see them.
pygsti/baseobjs/label.py
Outdated
@@ -201,6 +198,10 @@ class LabelTup(Label, tuple): | |||
acted upon by the object this label refers to. | |||
""" | |||
|
|||
#flag used in certain Circuit subroutines | |||
#Whether this is a "simple" (opaque w/a true name, from a circuit perspective) label or not. | |||
is_simple= True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing white space around equals sign.
@coreyostrove I've gone through all my PR comments and (tried to) close the ones that were just about generators vs list comprehensions. There are still some meaningful comments left, so here's direct links to them:
|
This removes unneeded list construction from a few sorted calls. Also refactors filter calls into list comprehensions and refactors a few instances of chain (which can be kind of slow).
Minor update to assign to _line_labels directly.
Thanks for the feedback, @sserita and @rileyjmurray. I have addressed a number of your comments, so I have gone through and marked those as such to make it easier to see which conversation threads still need following up on. Right now it looks like the main open discussions remaining are:
Let me know if there is anything I have missed, or if there is something I have marked as resolved which you think still needs working on. |
@coreyostrove My current stance on your short list:
|
Update the convention used for the Label class attribute, and add a property for accessing this attribute externally.
Awesome, I just pushed the changes addressing the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the great work as usual @coreyostrove!
I'm approving, but this is again going to wait on merge until we clear beta tests up...
These updates all look great to me. |
Beta tests clear, merging! |
I have always been slightly perturbed by the amount of time that things such as COPA layout creation take when performing GST analysis. There are a number of reasons for this, but one family of reasons is that layout generation relies heavily on a number of Circuit primitives that are not as performant as they can be. This PR is a step toward rectifying this through re-implementation of a number of methods identified as bottlenecks through profiling in the context of actual end-to-end GST analysis.
In terms of the top-level numbers, for 1-qubit GST fitting using the 'full TP' parameterization with an experiment design containing 1120 circuits (max depth 128), the changes on this branch collectively reduce the time required for layout creation by ~50%, and reduce the overall runtime of the end-to-end protocol by ~15%. Your mileage will of course vary, however, and I would anticipate higher relative savings for larger/deeper experiment designs. I have not done profiling for the two-qubit case yet, so I won't posit any guesses on the impact there (I would guess it to be less relatively speaking, but can't easily ballpark this without actually testing).
Summary of Changes:
Here is a high-level summary of the changes. I will also be leaving some comments on particular bits of code to explain any more detailed reasoning for some changes (when in doubt the answer is likely that I tried a few different things and picked the fastest way to do something).
Note on synthetic benchmarks: For the quoted figures below these were mostly done using a randomly generated set of test circuits generated according to the following code snippet:
For equality testing this was run against
random_pairs
, for inclusion testing againstrandom_circuits
andrandom_circuits_2
, and for other tests againstrandom_circuits
unless stated otherwise.tup
property was called whenever this was needed and this property call regenenerated the tuple representation from scratch each time, which is expensive. In synthetic tests this resulted in a ~4.5X speedup, but in the context of the GST profiling described above this was closer to 10X (the computational savings should scale with circuit depth, so this isn't too shocking of a difference). As part of this change I have added static assertions to a few methods previously missing them for safety (we don't want to inadvertently invalidate the cached hash).__init__
pathway to create a copied circuit instance. With the new implementation we utilize a much faster specialized codepath inspired by the previously existing_fastinit
one. But with a few more microptimizations to take advantage of some of the additional promises we can rely on in the context of copying (which we can't as easily in_fastinit
). Speedups here range from 3-15X depending on the input and output circuit types. With the low end corresponding to switching static to editable (or vice versa), and the high end keeping the input type unchanged.This is the end of the changes where I profiled these using synthetic tests. For the rest I did this in the context of GST optimization using cProfile/snakeviz, but didn't carefully record before/after relative changes.
Microoptimization galore: There are a bunch of microoptimizations of the sort that usually don't matter except for when code is particularly hot, and the circuit code is. I won't give an exhaustive list, but this includes things like:
tuple([i for i in iterable])
is measurably faster thantuple(i for i in iterable)
in practice.if len(obj)>0
idiom if all you're doing is checking if an iterable is empty.Rework SeparatePOVMCircuit internals: The
full_effect_labels
property was getting called a lot, and each time it was it was regenerating the result from scratch (with some non-trivial preprocessing). I've updated the internals to cache this, and also deal with updating the cached value when necessary. This gave a measurable speedup.New
sandwich
method for circuits. This takes as input two tuples of Labels which are prepended and appended to a circuit. This is around twice as fast as doing this with a pair of calls to__add__
(since we only need 1 call to_fastinit
). This is not an uncommon task (I added as part of some changes to the circuit completion code in OpModel), so if you spot any places where this is being done it'd be worth switching (maybe @rileyjmurray can write a fancy regex to find these instances).Additional changes (either outside of the Circuit class, or otherwise not directly performance focused):
complete_circuit
andsplit_circuit
: There are now versions of thecomplete_circuit
andsplit_circuit
methods for the OpModel class, which respectively add prep and povm labels to a circuit or split them off, which operate on a complete list of circuits all at once. This allows for avoiding some model specific lookups (for values which don't change circuit-to-circuit) many times and is faster overall.expand_subcircuits_inplace
: The implementation ofexpand_subcircuits_inplace
has been reworked so that the behavior is now to recursively expand CircuitLabel objects. Previously the expansion routine only took a single pass through the circuit, meaning that if the circuit contained any nested CircuitLabels (which are perfectly happy with being nested, an easy way to get something like this is to callto_label
on a circuit containing CircuitLabels) we would only end up expanding out the top level. I noticed there were a few places in the layout creation/forward simulation code that implicitly assumed only a single-level of CircuitLabels would be present, which would likely have eventually led to some really annoying bugs to track down. This change covers that edge case, and should only be slightly more expensive than the previous routine in the typical case where there is only a single-level of CircuitLabels.simulate
method. If there is strong opposition to removing this functionality we can discuss this, of course. My reasoning here is that given the rest of pygsti's structure this functionality really isn't something we should be asking a Circuit to do, as the preferred pattern nowadays is delegate this job to the Model/ForwardSimulator, rather than asking the circuit itself. I don't know how widespread the use of this is (my guess is that this is largely vestigial), but I'd recommend slating this for removal in 0.9.14. Related note, I don't know if we have a running list of things we've deprecated and their expiry date, but if not we should assemble one.Finally, there are a number of miscellaneous documentation fixes, bugfixes, and updates to unit tests.
P.S. There are some more layout creation performance enhancements inbound on a separate branch, but I've opted to keep these changes together in a separate PR to keep things thematic and make it easier to review without context switching.