Not independent edges in matching #104

Closed
gbtami opened this issue Oct 13, 2024 · 7 comments
gbtami commented Oct 13, 2024

First of all, I have to say I know nothing about decoding quantum error correcting (QEC) codes, so maybe I'm completely misusing this library.
I tried to compare how performant PyMatching's decode_to_edges_array() is compared to networkx's min_weight_matching(), and found that it sometimes creates invalid matching edge sets.

from itertools import product
import random

import networkx as nx
import numpy as np
import pymatching

NODE_MAX = 10

# Create a complete graph with random weighted edges

g = nx.Graph()
print(g)
print("---")

nodes = range(NODE_MAX)

g.add_nodes_from(nodes)
print(g)
# print(g.nodes.data())
print("---")

for pair in product(nodes, nodes):
    if pair[0] != pair[1]:
        g.add_edge(*pair, weight=random.randint(1, NODE_MAX))

print(g)
print(g.edges.data())
print("---")


pairings = nx.min_weight_matching(g)
print("--- NX ---")
sum_weight = sum((g.edges[pair]["weight"] for pair in pairings))
for pair in pairings:
    print(pair, g.edges[pair])
print("len:%s weight_sum:%s" % (len(pairings), sum_weight))


# m = pymatching.Matching.from_networkx(g)
m = pymatching.Matching()
for e in g.edges.data():
    m.add_edge(e[0], e[1], weight=e[2]["weight"])

pairings = m.decode_to_edges_array(np.ones(NODE_MAX))
sum_weight = sum((g.edges[pair]["weight"] for pair in pairings))
print("--- PYM ---")
for pair in pairings:
    print(pair, g.edges[pair])
print("len:%s weight_sum:%s" % (len(pairings), sum_weight))
tamas@tami:~/pychess-variants$ python3 server/arena_pairing.py 
Graph with 0 nodes and 0 edges
---
Graph with 10 nodes and 0 edges
---
Graph with 10 nodes and 45 edges
[(0, 1, {'weight': 8}), (0, 2, {'weight': 4}), (0, 3, {'weight': 1}), (0, 4, {'weight': 6}), (0, 5, {'weight': 4}), (0, 6, {'weight': 2}), (0, 7, {'weight': 7}), (0, 8, {'weight': 1}), (0, 9, {'weight': 1}), (1, 2, {'weight': 6}), (1, 3, {'weight': 5}), (1, 4, {'weight': 6}), (1, 5, {'weight': 7}), (1, 6, {'weight': 10}), (1, 7, {'weight': 4}), (1, 8, {'weight': 5}), (1, 9, {'weight': 8}), (2, 3, {'weight': 3}), (2, 4, {'weight': 6}), (2, 5, {'weight': 3}), (2, 6, {'weight': 4}), (2, 7, {'weight': 2}), (2, 8, {'weight': 3}), (2, 9, {'weight': 1}), (3, 4, {'weight': 6}), (3, 5, {'weight': 2}), (3, 6, {'weight': 10}), (3, 7, {'weight': 8}), (3, 8, {'weight': 7}), (3, 9, {'weight': 6}), (4, 5, {'weight': 2}), (4, 6, {'weight': 9}), (4, 7, {'weight': 8}), (4, 8, {'weight': 5}), (4, 9, {'weight': 3}), (5, 6, {'weight': 9}), (5, 7, {'weight': 9}), (5, 8, {'weight': 3}), (5, 9, {'weight': 2}), (6, 7, {'weight': 4}), (6, 8, {'weight': 4}), (6, 9, {'weight': 10}), (7, 8, {'weight': 9}), (7, 9, {'weight': 8}), (8, 9, {'weight': 1})]
---
--- NX ---
(7, 1) {'weight': 4}
(5, 4) {'weight': 2}
(0, 3) {'weight': 1}
(2, 6) {'weight': 4}
(9, 8) {'weight': 1}
len:5 weight_sum:12
--- PYM ---
[0 3] {'weight': 1}
[0 6] {'weight': 2}
[0 8] {'weight': 1}
[1 7] {'weight': 4}
[2 9] {'weight': 1}
[4 5] {'weight': 2}
len:6 weight_sum:11

I expected at most 5 edges, with no two edges sharing a node.
What am I misunderstanding here?

oscarhiggott commented Oct 14, 2024

Hi @gbtami, thanks for raising this. It highlights a lack of documentation that would be useful for users interested in related graph theory problems outside of quantum error correction. In fact, I've been meaning to add extra examples and documentation for this for a while (see issues #46 and #61), so I'll provide quite a long answer, which I'll modify and add to the documentation (without your specific example) soon if it is clear enough. I'll explain the problem pymatching solves in the language of graph theory instead of error correction, and then explain how you can use pymatching to solve your minimum-weight perfect matching problem. Please let me know if anything is unclear. If you just want a fix for your problem, you can copy the code at the bottom.

Minimum-weight T-joins

PyMatching doesn't actually solve exactly the minimum-weight perfect matching (MWPM) problem, but instead a closely related problem known as the minimum-weight T-join problem (MWTJ) in graph theory (which we call the minimum-weight embedded matching problem in our paper).

The MWTJ problem is defined as follows. Let $G=(V,E)$ be a graph and let $T\subseteq V$ be a set of nodes. We assign a weight $w(u,v)\in\mathbb{R}$ to each edge $(u,v)\in E$. The MWTJ problem finds a set of edges $M_{TJ}\subseteq E$ with minimum weight-sum $\sum_{(u,v)\in M_{TJ}}w(u,v)$ such that each node in $T$ is incident to an odd number of edges in $M_{TJ}$ and each node not in $T$ is incident to an even number of edges in $M_{TJ}$.
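As a small illustration of this definition, here is a brute-force T-join search on a toy triangle graph (the graph, weights and function name here are made up for illustration; this is of course not how pymatching solves the problem):

```python
from itertools import chain, combinations

def min_weight_t_join(edges, T):
    """Brute-force MWTJ: try every edge subset, keep those with the
    required degree parities, and return the minimum-weight one."""
    nodes = {v for e in edges for v in e}
    best = None
    all_subsets = chain.from_iterable(
        combinations(edges, r) for r in range(len(edges) + 1)
    )
    for subset in all_subsets:
        degree = {v: 0 for v in nodes}
        for (u, v) in subset:
            degree[u] += 1
            degree[v] += 1
        # Nodes in T need odd degree; all other nodes need even degree.
        if all(degree[v] % 2 == (1 if v in T else 0) for v in nodes):
            weight = sum(edges[e] for e in subset)
            if best is None or weight < best[0]:
                best = (weight, subset)
    return best

# Toy triangle: direct edge (0, 2) costs 4, the path 0-1-2 costs 7.
print(min_weight_t_join({(0, 1): 5, (1, 2): 2, (0, 2): 4}, {0, 2}))
# (4, ((0, 2),))
```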

So the following function finds a minimum-weight T-join using pymatching:

def min_weight_T_join(matching_graph: pymatching.Matching, T: np.ndarray) -> np.ndarray:
    syndrome = np.zeros(matching_graph.num_detectors, dtype=np.uint8)
    syndrome[T.astype(np.integer)] = 1
    return matching_graph.decode_to_edges_array(syndrome)

There are two small differences here from the normal MWTJ problem. Firstly, pymatching also allows boundary edges (or "half edges") to be added to the graph. In contrast to an edge $(u,v)$, a boundary edge $(u,)$ has only one endpoint instead of two. If boundary edges are added to the pymatching.Matching graph, then the MWTJ problem is generalised in the obvious way to include them. The second difference is that pymatching uses integer edge weights with a maximum weight of $2^{24}-1=16,777,215$. If the edge weights you add to the matching graph are all integers less than $2^{24}$, then the solution provided by pymatching to the above generalised MWTJ problem will be exact (and therefore integral). If any edge weights are not integers, then pymatching will (in the C++ backend when solving, not in the Python API) rescale and discretise all the edge weights so that each absolute edge weight is an integer less than $2^{24}$. Finally, any edge weights larger than $2^{24}-1$ are not added to the graph and a warning is raised.
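If your weights are floats and you want to control the discretisation yourself, one option is to pre-scale them to integers before adding any edges. A minimal sketch (the helper name and the `resolution` parameter are my own for illustration, not part of the pymatching API):

```python
# Pre-scale float edge weights to integers so that pymatching's exact
# integer path is used. Weights must stay below 2**24 - 1.
MAX_WEIGHT = 2**24 - 1

def scale_weights_to_int(weights, resolution=1000):
    """Round each weight to the nearest 1/resolution and return integers."""
    scaled = [round(w * resolution) for w in weights]
    if any(abs(s) > MAX_WEIGHT for s in scaled):
        raise ValueError("scaled weight exceeds pymatching's 2**24 - 1 maximum")
    return scaled

print(scale_weights_to_int([0.25, 1.5, 3.14159]))  # [250, 1500, 3142]
```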

Importantly, even if you set $T=V$ as you did in your example, this is not equivalent to the MWPM problem: the MWTJ only enforces that each node in $T$ is incident to an odd number of edges in the solution, whereas the MWPM enforces that exactly one edge is incident to each node. In your example there is a node that is incident to three edges in the solution, as this has lower weight than the MWPM. For a simpler visual example of this, see Figure 1(b) of our paper.
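You can see this directly in your output by counting node degrees (pure Python; the edges are copied from the pymatching output in the issue):

```python
from collections import Counter

# The six edges pymatching returned for the all-ones syndrome above.
t_join_edges = [(0, 3), (0, 6), (0, 8), (1, 7), (2, 9), (4, 5)]

# Count how many solution edges are incident to each node.
degree = Counter(node for edge in t_join_edges for node in edge)

print(degree[0])  # 3: node 0 is incident to three edges (odd, so a valid T-join)
print(all(d % 2 == 1 for d in degree.values()))  # True: every node has odd degree
print(max(degree.values()) == 1)  # False: not a perfect matching
```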

Reduction from min-weight T-joins to min-weight perfect matchings

It turns out that if you have an efficient algorithm for the MWPM problem (i.e. the blossom algorithm) you can use it to find a MWTJ (first shown in this paper). Furthermore, if you have an efficient algorithm for the MWTJ problem (i.e. pymatching) then it can be used to find a MWPM (which is what you need).

For this reduction we will assume that the matching graph contains only edges, and no boundary edges. If this is the case, then any perfect matching will contain exactly $N/2$ edges for a graph with $N$ nodes. Since the number of edges is fixed for any perfect matching, we can add a constant $c$ to the weight of every edge in the graph without changing the solution to the MWPM problem. This modification increases the weight-sum of the solution by a constant $cN/2$.

On the other hand, the MWTJ problem can have more than $N/2$ edges in its solution $M_{TJ}$, since each node can be incident to more than one edge in $M_{TJ}$. If a MWTJ has exactly $N/2$ edges, and if $T=V$, then it equals the MWPM. So adding a constant $c$ to the weight of every edge penalises the addition of more edges to the MWTJ, and setting a sufficiently large $c$ is sufficient to solve the MWPM problem with a MWTJ solver. Concretely, if you set $c>\sum_{(u,v)\in E}|w(u,v)|$, then a T-join with $k$ edges will always have lower weight than a T-join with $k+1$ edges, leading to the reduction.
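Plugging the two solutions from your example into that argument: the 6-edge T-join has weight 11 and the 5-edge perfect matching has weight 12, so adding a constant $c$ to every edge shifts their totals to $11+6c$ and $12+5c$. A quick arithmetic check (the helper function is illustrative only):

```python
# Totals after adding a constant c to every edge weight: the 6-edge T-join
# gains 6c, the 5-edge perfect matching gains 5c.
def offset_totals(c):
    """Return (T-join total, matching total) for this example's two solutions."""
    return 11 + 6 * c, 12 + 5 * c

t_join, matching = offset_totals(0)
print(t_join < matching)  # True: with no offset, the 6-edge T-join is cheaper

t_join, matching = offset_totals(2)
print(t_join < matching)  # False: c = 2 already makes the 5-edge matching win
```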

However, this approach adds a very large constant $c$ that scales with the size of the graph, which would either lead to a loss in precision for the edge weight discretisation mentioned above, or lead to an edge weight exceeding the maximum $2^{24}-1$. Instead, I suggest adding an increasingly large constant $c$ to the graph until the MWTJ contains the required $N/2$ edges. This is what I implement below, where I double $c$ after each retry, starting with $c$ set to one more than the largest absolute value of any edge weight. In practice at most one retry often seems to be sufficient (and at most a logarithmic number of retries are needed in the worst case).

Note that pymatching is optimised to be fast when solving the MWTJ problem for the types of problems that arise in quantum error correction: these are usually sparse graphs where the set $T$ is small relative to $V$, and where the set of edges $M_{TJ}$ in the solution is also small relative to $E$. We did not optimise for worst-case complexity, which is $O(|V|^4)$. Nevertheless, it may well be a very fast MWTJ solver (relative to simple implementations using a reduction to MWPM) for many more general problems in practice, although we have not benchmarked it on these more general cases. For the traditional MWPM problem, while it might be faster than NetworkX, it is unlikely to always be as fast as the Blossom V or LEMON libraries (it has a slower worst-case runtime, even if it often performs similarly for some graphs).

Code for solving a MWPM problem using pymatching

Below is some code for solving your MWPM problem which gives the correct solution, consistent with NetworkX.

import networkx as nx
import numpy as np
import pymatching


def add_constant_to_all_edge_weights(
    matching_graph: pymatching.Matching, constant: float
) -> None:
    """Add a constant to the weights of all edges in the matching graph

    Parameters
    ----------
    matching_graph : pymatching.Matching
        A pymatching.Matching matching graph
    constant : float
        The constant to add to the weight of every edge in the graph, which
        may be negative.
    """
    for i, j, d in matching_graph.edges():
        if j is not None:
            matching_graph.add_edge(
                i,
                j,
                fault_ids=d["fault_ids"],
                weight=d["weight"] + constant,
                error_probability=d["error_probability"],
                merge_strategy="replace",
            )
        else:
            matching_graph.add_boundary_edge(
                i,
                fault_ids=d["fault_ids"],
                weight=d["weight"] + constant,
                error_probability=d["error_probability"],
                merge_strategy="replace",
            )


def matched_edges_array_has_unique_non_boundary_nodes(
    matched_edges: np.ndarray,
) -> bool:
    """Check if the array of edges returned by
    `pymatching.Matching.decode_to_edges_array` has at most one 
    edge incident to each (non-boundary) node in the matching graph.

     The MWPM decoder solves the minimum weight embedded matching
     problem (or minimum-weight T-join problem), which is slightly
     different from the normal MWPM problem, even if a detection
     event is placed on every node (i.e. an all-ones syndrome is
     given). The decoder finds the minimum-weight set of edges J such that
     each detection event is incident to an odd number of edges in J, and
     each node that is not a detection event is incident to an even number
     of edges in J. In contrast a MWPM finds a minimum weight set of edges
     such that each node is incident to a single edge.

     If `pymatching.Matching.decode_to_edges_array` is run on an 
     all-ones syndrome and the graph contains only edges (no boundary 
     edges) then this method determines if the solution is a MWPM.

    Parameters
    ----------
    matched_edges : np.ndarray
        A 2D array corresponding to the output of the method
        `pymatching.Matching.decode_to_edges_array(syndrome)` where
        `syndrome=numpy.ones(pymatching.Matching.num_detectors, dtype=numpy.uint8)`

    Returns
    -------
    bool
        True if the solution has at most one edge incident to 
        each (non-boundary) node in the matching graph, else False
    """
    # Check if min-weight T-join is also a min-weight matching
    unique_nodes, node_counts = np.unique(matched_edges, return_counts=True)
    # Duplicated boundary nodes are fine
    node_counts[unique_nodes == -1] = 1
    # It's a min-weight perfect matching if no node appears more than once
    # in the solution
    return not np.any(node_counts > 1)


def min_weight_perfect_matching(matching_graph: pymatching.Matching) -> np.ndarray:
    """Find the minimum-weight perfect matching (MWPM) in the graph `matching_graph`.

    If the graph contains edges (and no boundary edges), this method will find a
    set of edges M such that each node in the graph is incident to exactly
    one edge in M, and such that the sum of the weights of the edges in M
    is minimised.

    This method is guaranteed to find a minimum-weight solution provided that the
    graph `matching_graph` contains only edges and no boundary edges, and provided
    that the graph contains at least one perfect matching.
    If the graph contains boundary edges, then the solution returned will still be
    a perfect matching (if one exists) but it is not guaranteed to be minimum-weight.

    A `ValueError` is raised if the graph does not contain a perfect matching.

    Parameters
    ----------
    matching_graph : pymatching.Matching
        A pymatching.Matching graph for which the MWPM should be found

    Returns
    -------
    np.ndarray
        A 2D array `edges` giving the edges in the matching solution as pairs of nodes (or as a
        node and the boundary, for a boundary edge). If there are `num_mwpm_edges` edges then the shape of
        `edges` is `edges.shape=(num_mwpm_edges, 2)`, and edge `i` is between node `edges[i, 0]`
        and node `edges[i, 1]`. For a boundary edge `i` between a node `k` and the boundary
        (either a boundary node or the virtual boundary node), `edges[i, 0]` is `k` and `edges[i, 1]` is `-1`
        (the boundary is always denoted by -1 and is always in the second column).
    """
    syndrome = np.ones(matching_graph.num_detectors, dtype=np.uint8)
    matched_edges = matching_graph.decode_to_edges_array(syndrome=syndrome)

    is_mwpm = matched_edges_array_has_unique_non_boundary_nodes(matched_edges)

    if is_mwpm:
        return matched_edges
    else:
        weight_increase = (
            max(abs(d["weight"]) for _, _, d in matching_graph.edges()) + 1
        )
        added_weight = 0
        while not is_mwpm:
            add_constant_to_all_edge_weights(
                matching_graph=matching_graph, constant=weight_increase
            )
            added_weight += weight_increase
            matched_edges = matching_graph.decode_to_edges_array(syndrome)
            is_mwpm = matched_edges_array_has_unique_non_boundary_nodes(matched_edges)
            weight_increase *= 2

        add_constant_to_all_edge_weights(
            matching_graph=matching_graph, constant=-added_weight
        )
        return matched_edges


def min_weight_T_join(matching_graph: pymatching.Matching, T: np.ndarray) -> np.ndarray:
    """Find the minimum-weight T join in the matching graph

    A minimum-weight T-join is a subset of the edges M in `matching_graph` with
    minimum weight-sum such that each node in the set T is incident to an
    odd number of edges in M and each node not in T is incident to an even
    number of edges in M.

    Parameters
    ----------
    matching_graph : pymatching.Matching
        A pymatching matching graph
    T : np.ndarray
        A set of nodes in `matching_graph`. Should be an integer numpy array
        where each element is an index of a node in `matching_graph`.

    Returns
    -------
    np.ndarray
        The minimum weight T-join in `matching_graph` for the provided set `T`.
    """
    syndrome = np.zeros(matching_graph.num_detectors, dtype=np.uint8)
    syndrome[T.astype(np.integer)] = 1
    return matching_graph.decode_to_edges_array(syndrome)


if __name__ == "__main__":
    # Use example from issue #104

    NODE_MAX = 10

    example_edge_data = [
        (0, 1, {"weight": 8}),
        (0, 2, {"weight": 4}),
        (0, 3, {"weight": 1}),
        (0, 4, {"weight": 6}),
        (0, 5, {"weight": 4}),
        (0, 6, {"weight": 2}),
        (0, 7, {"weight": 7}),
        (0, 8, {"weight": 1}),
        (0, 9, {"weight": 1}),
        (1, 2, {"weight": 6}),
        (1, 3, {"weight": 5}),
        (1, 4, {"weight": 6}),
        (1, 5, {"weight": 7}),
        (1, 6, {"weight": 10}),
        (1, 7, {"weight": 4}),
        (1, 8, {"weight": 5}),
        (1, 9, {"weight": 8}),
        (2, 3, {"weight": 3}),
        (2, 4, {"weight": 6}),
        (2, 5, {"weight": 3}),
        (2, 6, {"weight": 4}),
        (2, 7, {"weight": 2}),
        (2, 8, {"weight": 3}),
        (2, 9, {"weight": 1}),
        (3, 4, {"weight": 6}),
        (3, 5, {"weight": 2}),
        (3, 6, {"weight": 10}),
        (3, 7, {"weight": 8}),
        (3, 8, {"weight": 7}),
        (3, 9, {"weight": 6}),
        (4, 5, {"weight": 2}),
        (4, 6, {"weight": 9}),
        (4, 7, {"weight": 8}),
        (4, 8, {"weight": 5}),
        (4, 9, {"weight": 3}),
        (5, 6, {"weight": 9}),
        (5, 7, {"weight": 9}),
        (5, 8, {"weight": 3}),
        (5, 9, {"weight": 2}),
        (6, 7, {"weight": 4}),
        (6, 8, {"weight": 4}),
        (6, 9, {"weight": 10}),
        (7, 8, {"weight": 9}),
        (7, 9, {"weight": 8}),
        (8, 9, {"weight": 1}),
    ]

    example_edge_data_dict = {(i, j): d for (i, j, d) in example_edge_data}

    # Create a complete graph with random weighted edges from the example
    g = nx.Graph()
    for i, j, d in example_edge_data:
        g.add_edge(i, j, weight=d["weight"])
    print(g)
    print(g.edges.data())
    print("---")

    # Solve using NetworkX
    pairings = nx.min_weight_matching(g)
    print("--- NX ---")
    sum_weight = sum((g.edges[pair]["weight"] for pair in pairings))
    for pair in pairings:
        print(pair, g.edges[pair])
    print("len:%s weight_sum:%s" % (len(pairings), sum_weight))

    # Solve using PyMatching
    m = pymatching.Matching.from_networkx(g)

    pairings = min_weight_perfect_matching(m)

    assert pairings.shape[0] == NODE_MAX // 2

    sum_weight = sum((g.edges[pair]["weight"] for pair in pairings))
    print("--- PYM ---")
    for pair in pairings:
        print(pair, g.edges[pair])
    print("len:%s weight_sum:%s" % (len(pairings), sum_weight))

and running this provides the correct solution to your problem:

$ python arena_pairing.py 
Graph with 10 nodes and 45 edges
[(0, 1, {'weight': 8}), (0, 2, {'weight': 4}), (0, 3, {'weight': 1}), (0, 4, {'weight': 6}), (0, 5, {'weight': 4}), (0, 6, {'weight': 2}), (0, 7, {'weight': 7}), (0, 8, {'weight': 1}), (0, 9, {'weight': 1}), (1, 2, {'weight': 6}), (1, 3, {'weight': 5}), (1, 4, {'weight': 6}), (1, 5, {'weight': 7}), (1, 6, {'weight': 10}), (1, 7, {'weight': 4}), (1, 8, {'weight': 5}), (1, 9, {'weight': 8}), (2, 3, {'weight': 3}), (2, 4, {'weight': 6}), (2, 5, {'weight': 3}), (2, 6, {'weight': 4}), (2, 7, {'weight': 2}), (2, 8, {'weight': 3}), (2, 9, {'weight': 1}), (3, 4, {'weight': 6}), (3, 5, {'weight': 2}), (3, 6, {'weight': 10}), (3, 7, {'weight': 8}), (3, 8, {'weight': 7}), (3, 9, {'weight': 6}), (4, 5, {'weight': 2}), (4, 6, {'weight': 9}), (4, 7, {'weight': 8}), (4, 8, {'weight': 5}), (4, 9, {'weight': 3}), (5, 6, {'weight': 9}), (5, 7, {'weight': 9}), (5, 8, {'weight': 3}), (5, 9, {'weight': 2}), (6, 7, {'weight': 4}), (6, 8, {'weight': 4}), (6, 9, {'weight': 10}), (7, 8, {'weight': 9}), (7, 9, {'weight': 8}), (8, 9, {'weight': 1})]
---
--- NX ---
(7, 1) {'weight': 4}
(5, 4) {'weight': 2}
(0, 3) {'weight': 1}
(2, 6) {'weight': 4}
(9, 8) {'weight': 1}
len:5 weight_sum:12
--- PYM ---
[0 3] {'weight': 1}
[1 7] {'weight': 4}
[2 6] {'weight': 4}
[4 5] {'weight': 2}
[8 9] {'weight': 1}
len:5 weight_sum:12
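As a quick sanity check on that output (pure Python; the edges and weights are copied from the run above):

```python
from collections import Counter

# The five edges pymatching now returns, with weights from the example graph.
mwpm_edges = {(0, 3): 1, (1, 7): 4, (2, 6): 4, (4, 5): 2, (8, 9): 1}

degree = Counter(node for edge in mwpm_edges for node in edge)

# A perfect matching on 10 nodes: every node is incident to exactly one edge.
assert len(degree) == 10 and set(degree.values()) == {1}
print(sum(mwpm_edges.values()))  # 12, matching the NetworkX weight-sum
```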

gbtami commented Oct 14, 2024

Hi @oscarhiggott

I'm sorry I didn't find the related issues myself; the quantum error correction jargon made me careless :)
By the way, I almost forgot to mention my motivation for experimenting with this wonderful tool: I plan to use it for creating chess arena tournament pairings gbtami/pychess-variants#991
Thank you very much for your detailed explanation. This is exactly what I needed!

@gbtami gbtami closed this as completed Oct 14, 2024
@oscarhiggott
Excellent! And nice to see a chess application of pymatching :)

@Strilanc
Note that the T-join behaviour is what Figure 1(b) of the sparse blossom paper is trying to communicate, though I realize it's not really on the normal documentation path of a user.

[Image: Figure 1(b) of the sparse blossom paper]

gbtami commented Oct 14, 2024

Unfortunately this paper is not publicly available.
All scientific publications should be freely available to everyone.

Strilanc commented Oct 14, 2024

@gbtami What are you talking about? The sparse blossom paper is on the arXiv, free for anyone to read (including no signup step): https://arxiv.org/abs/2303.15933 . This is very standard in quantum computing.

gbtami commented Oct 14, 2024

You are right. I'm totally blind. Sorry :)
