Hopcroft-Karp 算法是如何工作的？答案

【问题标题】：How does the Hopcroft-Karp algorithm work?Hopcroft-Karp 算法是如何工作的？
【发布时间】：2011-09-16 00:40:39
【问题描述】：

我目前正在开展一个项目，以图形方式解释 Hopcroft-Karp 算法。

我正在使用来自Wikipedia article 的伪代码。

我还在 Stack Overflow in Python 上看到了这个算法实现

如果我不必完全理解算法就可以使用它，这将非常有用。

我的问题如下：伪代码中的Dist[]数组是什么意思，广度优先搜索中图的分层是怎么做的。我掌握了 DFS 的工作原理。

提前致谢。

【问题讨论】：

“如果我不必完全理解算法就可以使用它” - 我发现最好在使用它们之前理解它们......否则，你怎么知道它是不是合适的算法以及它可能有什么限制？
我是开玩笑的。我完全同意你所说的。为了清楚起见，我应该添加笑脸吗？
Smith：没有表现出幽默。主要是因为您的用户名是“John Smith”，rep = 1 并且您今天加入了。 SO上有很多这样的海报，而且很多都不是在开玩笑……
可以理解。我不是想变得困难，也不是在拖钓。

标签： algorithm matching

【解决方案1】：

标准BFS 创建的层使得连续层中的节点之间的距离正好为 1（即连续层的节点之间存在长度为 1 的路径）。

for v in G1
    if Pair[v] == NIL
       Dist[v] = 0
       Enqueue(Q,v)
   else
       Dist[v] = INF

因此代码初始化 BFS 树的第一层，将所有“free nodes”v（即Pair[v] == NIL）设置为距离 0。

while Empty(Q) == false
   v = Dequeue(Q)
   for each u in Adj[v]
        if Dist[ Pair[u] ] == INF
             Dist[ Pair[u] ] = Dist[v] + 1
             Enqueue(Q,Pair[u])

此代码继续逐层构建 BFS 树，对于与 v 相邻的节点 u（距离正好为 1）。

Dist[] 只是一种管理节点到 BFS 初始层的距离的方法

【讨论】：

谢谢，帮了大忙！对于任何未来的观众，我会留下一些数据并尝试从 kaveman 所说的内容中进行解释。在第一个 BFS 之后： Dist {'a': 0, 1: -1, 2: -1, 'b': 0, 4: -1, 'd': 0, 6: -1, 'f': 0 , 'c': 0, 'g': 0, 3: -1, 'e': 0, None: 1, 5: -1} 第二个（也是最后一个 BFS）之后 Dist {'a': 2, 1 : -1, 2: -1, 'b': 1, 4: -1, 'd': 0, 6: -1, 'f': 0, 'c': 2, 'g': 1, 3 : -1, 'e': 1, None: 3, 5: -1} 在数据集上：(1,"a"), (1,"b"), (2,"b"), (2 "c"), (3,"a"), (3,"c"), (4,"b"), (4,"d"), (5,"c"), (5," d"), (5, "e"), (5, "f"), (6,"f"), (6,"g")
（达到字符限制）；因此，从 kaveman 所说，似乎所有“字母”节点（集 U）都是第一层（由 dist 值 0 表示）。在第一阶段之后，a、b、c、e和g被匹配。我们现在看到 d 和 f 构成了下一阶段的第一层。
对于if Dist[ Pair[u] ] == INF 行，这是否等同于if Pair[u] != NIL？因为Pair[u]是它匹配到的左侧节点v，所以Dist[v]总是INF，因为我们知道左侧所有匹配的节点在第一次访问时都有Dist[v] = INF？

【解决方案2】：

这是我为 Hopcroft Karp 算法编写的完整注释代码。下面带注释的代码有望帮助解释这个漂亮算法的几乎每一个角落。它在 C# 中，但代码可以很容易地转换为 C++/Java。您还可以在my Github repo 上找到测试用例。

注意：我还在Wikipedia article 中添加了此解释的摘要版本和下图。

/// <summary>
/// Implements Hopcroft Karp algorithm for finding maximum matching for unweighted undirected bipartiate graphs
/// https://en.wikipedia.org/wiki/Hopcroft%E2%80%93Karp_algorithm
/// </summary>
public static class HopcroftKarp
{
    /// <summary>
    /// Finds maximum matching for unweighted bipartiate graph in O(E sqrt(v)).
    /// The Hopcroft Karp algorithm is as follows:
    /// 1.  Augmented path has end points as two unmatched vertices. 
    ///     There may be 0 or more other vertices in the path besides source and destination however they must all be matched vertices.
    ///     The edges in the path must alternate between unmatched edge and matched edge. As the first vertice in path is unmatched, first edge is
    ///     automatically unmatched. If next vertex is unmatched as well then path ends otherwise we are on matched vertex in which case we must follow
    ///     edge from previous matching. After we follow the matched edge, we would be on other side but now we can not have any other matched edge so 
    ///     automatically, next edge must be unmatched to go on opposite side. And so the path continues.
    /// 2. Shortest augmented path is just that: Shortest in length of all augmented paths in graph.
    /// 3. There may be multiple shortest augmented paths of the same length.
    /// 4. Hopcroft Karp algorithm requires that we construct the maximal set of shortest augmented paths that don't have any common vertex between them.
    /// 5. Then so symmetric difference of existing matching and all augmented paths to get the new matching.
    /// 6. Repeat until no more augmented paths are found.
    /// 
    /// How do we do #4?
    /// 
    /// Let our graph be two partitions U, V. The key idea is to add a dummy vertex vDummy and connect it to unmatched vertices in V. Similarly add uDummy and 
    /// connect it to all unmatched vertices in U. Now if we run BFS from uDummy to vDummy then we can get shortest path between an unmatched vertex in U to unmatched vertex in V. 
    /// Due to bi-partiate nature of the graph, this path would zig zag from U to V. However we need to make sure that when going from V to U, we always select matched edge. If there
    /// is no matched edge then we abandon further search on that path and continue exploring other paths. If we make sure of this one condition then it meets all criteria for path being an
    /// augmented path as well as the shortest path.
    /// 
    /// Once we have found the shortest path, we want to make sure we ignore any paths that are longer than this shortest paths. BFS algorithm marks nodes in path with
    /// distance with source being 0. Thus, after doing BFS, we can start at each unmatched vertex in U, follow the path by following nodes with distance that increments by 1. 
    /// By the the time we arrive at destination vDummy, if its distance is 1 more than last node in V then we know that the path we followed is the shortest path. In that case,
    /// we just update the matching for each vertex on path in U and V. Note that each vertex in V on path, except for the last one, is matched vertex. 
    /// So updating this matching starting from first unmatched vertex is equivalent to removing previously matched edge and adding new unmatched edge in matching. This is same
    /// is symmetric difference (i.e. remove edges common to previous matching and add non-common edges in augmented path in new matching).
    /// 
    /// How do we make sure augmented paths are vertex disjoint? After we consider each augmented path, reset distance of vertices in path (that was originally set by BFS)
    /// to infinity. This prevents any other augmenting path with same vertice(s) getting considered.
    /// 
    /// Finally, we actually don't need uDummy because it's there just to put all unmatched vertices in queue when BFS starts. That we can do as just initialization. The 
    /// vDummy can be appened in U for convinience however we don't really need to add edges from V to vDummy. Instead, we can just initialize default pairing for all V
    /// to point to vDummy. That way, if final vertex in V doesn't have any matching vertex in U then we end at vDummy which is end of our augmented path. In below code,
    /// vDummy is denoted as iNil.
    /// 
    /// </summary>
    /// <param name="gU">Left vertices, each with edges to right vertices</param>
    /// <param name="gVCount">Count of right side of vertices</param>
    /// <returns>For each left vertice, corresponding ID for the matched right vertice</returns>
    public static IList<int> GetMatching(IList<int[]> gU, int gVCount)
    {
        //Recreate gU with dummy NIL node
        gU = new List<int[]>(gU);
        gU.Add(new int[] {});

        var iNil = gU.Count - 1; //Index of dummy node
        var gUCount = iNil; //Count of nodes in U without dummy node

        //Create lists that would hold matching, by default they point to NIL
        var pairU = Enumerable.Repeat(iNil, gUCount).ToList();
        var pairV = Enumerable.Repeat(iNil, gVCount).ToList();

        var dist = new int[gU.Count];   //Array to store distances from BFS
        var q = new int[gU.Count];      //Array to use as queue during BFS

        //The modified version of BFS marks shortest augmented paths available using dist array
        //When no more paths are available it returns false
        //Note that we need to find augmented paths starting from either only U or V because 
        //that paths ends on other side anyway.
        while (Bfs(gU, pairU, pairV, dist, iNil, gUCount, q))
        {
            //For each unmatched vertex in U
            for (var u = 0; u < gUCount; ++u)
            {
                if (pairU[u] == iNil)
                {
                    //See if we have shotest augmnted path. If we do, then
                    //Dfs will also mark vertices along that path unusable for 
                    //any other paths in next calls so that paths remain vertex disjoint.
                    //Also if Dfs does find shortest augmnted path then it will do symmetric
                    //difference along that path, setting proper pairs in pairU and pairV
                    Dfs(gU, pairU, pairV, dist, iNil, u);
                }
            }
        }

        //Return pairs for U, converting iNil to -1 to indicate no pairing and also eliminate last dummy vertex 
        return pairU.Select(i => i == iNil ? -1 : i).Take(gUCount).ToList();
    }

    /// <summary>
    /// Finds all available shortest augmented paths in the graph
    /// </summary>
    /// <param name="gU">Adjecency list of U</param>
    /// <param name="pairU">Matching so far for U</param>
    /// <param name="pairV">Matching so far for V (should be consistant with pairU)</param>
    /// <param name="dist">dist array marks the next vertex in shortest path from previous vertex</param>
    /// <param name="iNil">Which vertice is our dummy node</param>
    /// <param name="gUCount">This is just alias for iNil for better readability</param>
    /// <param name="q">The array to be used as queue during BFS. It's just passed from outside to avoid recreating it during each call.</param>
    /// <returns>true if atleast one augmented path exists else false</returns>
    private static bool Bfs(IList<int[]> gU, IReadOnlyList<int> pairU, IReadOnlyList<int> pairV, IList<int> dist, int iNil, int gUCount, IList<int> q)
    {
        //Pointers in queue
        int qiEnqueue = 0, qiDequeue = 0;

        //For each vertex in U, initialize distance to 0 if it was unmatched fo far
        //We will find shortest augmented paths from each unmatched vertices
        for (var u = 0; u < gUCount; ++u)
        {
            if (pairU[u] == iNil)
            {
                dist[u] = 0;

                //enqueue all unmatched vertices in queue
                q[qiEnqueue++] = u;
            }
            else
                dist[u] = int.MaxValue;
        }

        //Set distance of dummy node to infinite. When we find the shortest path, we would end at dummy node and BFS
        //would set its distance to the length of shortest path. We can use this length to eliminate any path that are 
        //longer than. If more than one vertex has same length shortest path then for both of them we can follow the 
        //dist array all the way to dummy node and when we get there we can check that length of path so far is the 
        //same value as in dist[iNil].
        dist[iNil] = int.MaxValue;

        //While queue is not empty
        while (qiDequeue < qiEnqueue)
        {
            //Dequeue next node
            var u = q[qiDequeue++];

            //If length of path to this node has exceed the shortest path
            //we already found then ignore this path and move on to next node
            if (dist[u] < dist[iNil])
            {
                //Go through our adjecency list of this node and
                //see if any of our neighbours for this node in V has an existing match in U OR is unmatched.
                //If it has existing matching then we will travel to next node in U and repeat the process.
                //If there was no existing matching then nextU is dummy node. In that case, if we are getting there
                //for the first time then we have found the shortest path and we mark dummy nodes dist as length of 
                //shortest path we found. If it isn't our first time then either we have another shortest path of same
                //length or it's a longer path. In either case we can just ignore because if it was another shortest path
                //then dist of dummy node is already marked correctly. If it was not then dist of dummy node will be less than
                //dist[u] and it would ignored by Dfs.
                foreach (var v in gU[u])
                {
                    var nextU = pairV[v];
                    if (dist[nextU] == int.MaxValue)
                    {
                        dist[nextU] = dist[u] + 1;

                        //Note that queue will always contain vertices from U. We don't need dist array for V. This is 
                        //because from vertex in U we always go to next vertex in U (or dummy node).
                        q[qiEnqueue++] = nextU;
                    }
                }
            }
        }

        //If we every found a shortest path then dist[iNil] would contain length of the shortest path
        return dist[iNil] != int.MaxValue;
    }

    /// <summary>
    /// Uses values set in dist array to traverse the path from each unmatched node in U. If we arrive at iNil 
    /// and if our path length matches dist[iNil] then we have one of the shortest paths. In that case, we do
    /// symmetric difference on existing matching with shortest path.
    /// </summary>
    /// <param name="gU">Adjecency list of U</param>
    /// <param name="pairU">Matching so far for U</param>
    /// <param name="pairV">Matching so far for V (should be consistant with pairU)</param>
    /// <param name="dist">dist array marks the next vertex in shortest path from previous vertex</param>
    /// <param name="iNil">Which vertice is our dummy node</param>
    /// <param name="u">Node in U from which we need to find shortest augmented path</param>
    /// <returns>Returns true if shortest augmented path was found from u</returns>
    private static bool Dfs(IList<int[]> gU, IList<int> pairU, IList<int> pairV, IList<int> dist, int iNil, int u)
    {
        //Recursion termination: If we arrive at dummy node during traversal then we have shortest augmented path and so terminate.
        if (u != iNil)
        {
            //For each neighbours of u, see if it's next node in the possible path
            foreach (var v in gU[u])
            {
                var nextU = pairV[v];

                //The neigbour is next node in path if it's matching node is our distance + 1
                if (dist[nextU] == dist[u] + 1)
                {
                    //Recursively see if for this next node in path, we have shortest augmented path available
                    if (Dfs(gU, pairU, pairV, dist, iNil, nextU))
                    {
                        //If so then time to do symmetric difference! Note that pairV[u] either has previous matching or is unmatched.
                        //If it had previous matching then setting pairV[v] to new value removes that matching and then adds a new matching
                        //which in essence is symmetric difference.
                        pairU[u] = v;
                        pairV[v] = u;
                        return true;
                    }
                }
            }

            //Mark our node unusable for getting included in any other paths so that all shortest augmented paths are
            //vertex disjoint. For bipartiate case (but not for general graphs) it can be proved that doing this simple
            //vertex elimination results in maximal set of vertex disjoint paths. Why not for general graphs? Imagine
            //5 paths horizontally and one path vertical that cuts across the 5. If we choose vertices of vertical path then
            //we need to eliminate all horizontal paths resulting in non-maximal set.
            dist[u] = int.MaxValue;
            return false;
        }
        return true;
    }

}

【讨论】：