Getting the connected components in a graph: answers

【Question title】: Getting the connected components in a graph
【Posted】: 2016-11-16 11:47:12
【Question description】:

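I am trying to get all the connected components in a graph and print them out. I iterate over every node of the graph and run a depth-first search (DFS) starting from that node. Here is my code: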

graph = {
'a': ['b'],
'b': ['c'],
'c': ['d'],
'd': [],
'e': ['f'],
'f': []
}

def DFS(graph, start_node, stack = [], visited = []):
    stack.append(start_node)

    while stack:
        v = stack.pop()
        if v not in visited:
            visited.append(v)
            for neighbor in graph[v]:
                stack.append(neighbor)
    return visited



def dfs_get_connected_components_util(graph):
    visited = []

    for node in graph:
        if node not in visited:
            DFS_algo = DFS(graph, node)
            print(DFS_algo)
            visited = DFS_algo

print(dfs_get_connected_components_util(graph))

According to my graph, there are two connected components: a -> b -> c -> d and e -> f.

Instead, I get the following printout:

['c', 'd']
['c', 'd', 'a', 'b']
['c', 'd', 'a', 'b', 'f']
['c', 'd', 'a', 'b', 'f', 'e']

I can't seem to figure out what I am doing wrong in the connected-components function. I guess this may be more of a Python question.

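【Comments on the question】:

I think this is related to the mutable default arguments problem.
That is only one problem... but the question is also not well defined. If you start traversing the graph from 'c', you will never find 'a' as a connected node. So you should define exactly what you are asking.
Possible duplicate of "Least Astonishment" and the Mutable Default Argument
Undirected graphs have connected components. Directed graphs have strongly connected components. Both are equivalence relations. I believe your definition is wrong, but that is unrelated to the Python-specific coding problem that was answered.
Thanks @KennyOstrom. I have clarified this. I am assuming the graph is undirected. My solution does not find strongly connected components, and it is not valid if the graph is directed.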
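
The mutable-default behavior mentioned in these comments can be illustrated with a minimal sketch. The function names `collect` and `collect_fresh` are hypothetical and do not appear in the question; they only isolate the same pattern as `stack = []` and `visited = []` in the `DFS` signature.

```python
def collect(item, seen=[]):
    # The default list is created once, when `def` runs, and reused by every call.
    seen.append(item)
    return seen


def collect_fresh(item, seen=None):
    # Common idiom: use None as a sentinel and build a new list on each call.
    if seen is None:
        seen = []
    seen.append(item)
    return seen


first = collect('a')
second = collect('b')
print(second)           # ['a', 'b'] because both calls wrote into the same list
print(first is second)  # True

print(collect_fresh('a'), collect_fresh('b'))  # ['a'] ['b']
```

This is consistent with the printout in the question growing from one line to the next: each call to `DFS` appends to the same `visited` list.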

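Tags: python graph-theory depth-first-search connected-components

【Solution 1】:

This is what I came up with. I added some inline comments to explain what I did. Some things were moved to the global scope for clarity. I generally would not recommend using global variables.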

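The key is to understand the recursion, and also to remember that when you assign an object (not a literal), you only assign a reference to it and not a copy of it.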
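
A minimal sketch of that reference-versus-copy point, using hypothetical names (`component`, `alias`, `snapshot`) that are not part of the solution's code:

```python
component = {'d'}
alias = component          # binds a second name to the same set object
snapshot = set(component)  # builds an independent copy

alias.add('c')
print(alias is component)  # True
print(sorted(component))   # ['c', 'd'] since the change made through `alias` is visible
print(sorted(snapshot))    # ['d'] since the copy is unaffected
```

This is why several keys of a dict can end up pointing at one shared set: assigning the set to another key does not duplicate it.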

Note that this solution assumes the graph is undirected. See the notes section below for more details.

Feel free to ask for clarification.

from collections import defaultdict

graph = {
    'a': ['b'],
    'b': ['c'],
    'c': ['d'],
    'd': [],
    'e': ['f'],
    'f': []
}

connected_components = defaultdict(set)


def dfs(node):
    """
    The key is understanding the recursion
    The recursive assumption is:
        After calling `dfs(node)`, the `connected_components` dict contains all the connected nodes as keys,
        and the values are *the same set* that contains all the connected nodes.
    """
    global connected_components, graph
    if node not in connected_components:
        # this is important, so neighbors won't try to traverse current node
        connected_components[node] = set()
        for next_ in graph[node]:
            dfs(next_)
            # according to the recursive assumption, the connected component of `next_` is also that of `node`
            connected_components[node] = connected_components[next_]

        # all that's left is add the current node
        connected_components[node].add(node)

for node_ in graph:
    dfs(node_)


# get all connected components and convert to tuples, so they are hashable
connected_comp_as_tuples = map(tuple, connected_components.values())

# use ``set`` to make the list of connected components distinct (without repetition)
unique_components = set(connected_comp_as_tuples)
print(unique_components)

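Notes

This is of course not thoroughly tested... You should try it with different graphs (with cycles, single-node components, and so on).
The code could probably be improved (in both performance and clarity). For example, we create a set for every node even when we don't really need one (when the node has neighbors, that set is redundant and will be overwritten).
In the OP's original code, he used mutable default arguments. This is a big no-no (unless you really know what you are doing) and, as commented above, might have been the cause of the problem. But not this time...
Considering @kenny-ostrom's comment on the question, a word about definitions (unrelated to Python): connected components are relevant only to undirected graphs. For directed graphs, the term is strongly connected components. The concept is the same: for every 2 nodes in such a component (directed or undirected), there is a path between these 2 nodes. So even if node 'b' is reachable from 'a', if 'a' is not reachable from 'b' (which can only happen in directed graphs), 'a' and 'b' will not share a connected component. For directed graphs, my solution is not valid. The solution assumes the graph can be treated as undirected (in other words, if 'b' is a neighbor of 'a', we assume 'a' is a neighbor of 'b').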
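
The undirected assumption in the last note, together with the first note's suggestion to try other graphs, can be sketched as follows. This is a hedged alternative and not the solution's original code: the function name `find_components` and the symmetrized adjacency map `undirected` are assumptions introduced here. One case it is meant to cover is a node with several neighbors, where the recursive version keeps only the last neighbor's set each time it reassigns `connected_components[node]`.

```python
from collections import defaultdict


def find_components(graph):
    """Return a list of sets, one per component, treating every edge as undirected."""
    # Build a symmetric adjacency map: 'b' in graph['a'] also makes 'a' a neighbor of 'b'.
    undirected = defaultdict(set)
    for node, neighbors in graph.items():
        undirected[node]  # touch the key so isolated nodes are present too
        for neighbor in neighbors:
            undirected[node].add(neighbor)
            undirected[neighbor].add(node)

    visited = set()
    components = []
    for start in undirected:
        if start in visited:
            continue
        # Iterative DFS with a fresh stack and component set for each new start node.
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in visited:
                continue
            visited.add(node)
            component.add(node)
            stack.extend(undirected[node] - visited)
        components.append(component)
    return components


graph = {
    'a': ['b'],
    'b': ['c'],
    'c': ['d'],
    'd': [],
    'e': ['f'],
    'f': []
}
print(sorted(sorted(c) for c in find_components(graph)))
# [['a', 'b', 'c', 'd'], ['e', 'f']]
```

The stack and the component set are created inside the function on every run, so nothing is shared between calls.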

【Comments】: