给定表示对的 n 个元组，返回一个包含连接元组的列表答案

【问题标题】：Given n tuples representing pairs, return a list with connected tuples给定表示对的 n 个元组，返回一个包含连接元组的列表
【发布时间】：2015-03-11 07:26:59
【问题描述】：

给定 n 个元组，编写一个函数，该函数将返回一个包含连接值的列表。

例子：

pairs = [(1,62),
    (1,192),
    (1,168),
    (64,449),
    (263,449),      
    (192,289),
    (128,263),
    (128,345),
    (3,10),
    (10,11)
    ]

结果：

[[1,62,192,168,289],
 [64,449,263,128,345,449],
 [3,10,11]]

我相信它可以使用图或树作为数据结构来解决，为每个值创建节点，并为每对创建边，每个树或图表示连接的值，但我还没有找到解决方案。

在 python 中产生一个为这些对产生一个连接值列表的结果的最佳方法是什么？

【问题讨论】：

如果有循环怎么办？
每个元组总是有两个元素吗？
所以它们都包含在同一个集合中
@thefourtheye，如果有循环，结果将是 [1,62,192,168,289, 64,449,263,128,345,449,3,10,11]
@BBrown，是的。每个元组总是两个元素

标签： python data-structures graph

【解决方案1】：

您可以通过Disjoint Set (Union-Find) 实现来解决它。

用所有数字初始化结构djs。然后对于每个元组(x,y)，调用djs.merge(x,y)。现在对于每个数字x，为它创建一个新的集合，如果djs.sameSet(x,)==false 为每个现有集合中的任意y。

也许that 可以帮助你。

【讨论】：

第二步可以简单地将不相交的集合提取为列表并将它们转换为元组。

【解决方案2】：

我不知道这个问题已经有了名字（感谢avim！），所以我继续天真地解决了它。

这个解决方案有点类似于 Eli Rose 的解决方案。不过我决定发布它，因为它对于大型对列表更有效，因为lists_by_element 字典跟踪元素所在的列表，允许我们避免遍历所有列表和每次我们需要添加新项目时他们的项目。

代码如下：

def connected_tuples(pairs):
    # for every element, we keep a reference to the list it belongs to
    lists_by_element = {}

    def make_new_list_for(x, y):
        lists_by_element[x] = lists_by_element[y] = [x, y]

    def add_element_to_list(lst, el):
        lst.append(el)
        lists_by_element[el] = lst

    def merge_lists(lst1, lst2):
        merged_list = lst1 + lst2
        for el in merged_list:
            lists_by_element[el] = merged_list

    for x, y in pairs:
        xList = lists_by_element.get(x)
        yList = lists_by_element.get(y)

        if not xList and not yList:
            make_new_list_for(x, y)

        if xList and not yList:
            add_element_to_list(xList, y)

        if yList and not xList:
            add_element_to_list(yList, x)            

        if xList and yList and xList != yList:
            merge_lists(xList, yList)

    # return the unique lists present in the dictionary
    return set(tuple(l) for l in lists_by_element.values())

它是这样工作的：http://ideone.com/tz9t7m

【讨论】：

【解决方案3】：

您也可以使用 networkx 作为依赖项。

import networkx as nx

pairs = [(1,62),
        (1,192),
        (1,168),
        (64,449),
        (263,449),      
        (192,289),
        (128,263),
        (128,345),
        (3,10),
        (10,11)]


G = nx.Graph()
G.add_edges_from(pairs)
list(nx.connected_components(G))

【讨论】：

【解决方案4】：

另一个比 wOlf 更紧凑但处理与 Eli 相反的合并的解决方案：

def connected_components(pairs):
    components = []
    for a, b in pairs:
        for component in components:
            if a in component:
                for i, other_component in enumerate(components):
                    if b in other_component and other_component != component: # a, and b are already in different components: merge
                        component.extend(other_component)
                        components[i:i+1] = []
                        break # we don't have to look for other components for b
                else: # b wasn't found in any other component
                    if b not in component:
                        component.append(b)
                break # we don't have to look for other components for a
            if b in component: # a wasn't in in the component 
                component.append(a)
                break # we don't have to look further
        else: # neither a nor b were found
            components.append([a, b])
    return components

请注意，当我在组件中找到一个元素时，我依赖于打破循环，以便我可以使用循环的 else 子句来处理元素尚未在任何组件中的情况（@987654323如果循环结束时没有break，则执行@）。

【讨论】：

【解决方案5】：

我想出了两种不同的解决方案：

我更喜欢的第一个是将每条记录与父母联系起来。然后当然在层次结构中进一步导航，直到元素映射到自身。

所以代码是：

def build_mapping(input_pairs):
    mapping = {}

    for pair in input_pairs:
        left = pair[0]
        right = pair[1]

        parent_left = None if left not in mapping else mapping[left]
        parent_right = None if right not in mapping else mapping[right]

        if parent_left is None and parent_right is None:
            mapping[left] = left
            mapping[right] = left

            continue

        if parent_left is not None and parent_right is not None:
            if parent_left == parent_right:
                continue

            top_left_parent = mapping[parent_left]
            top_right_parent = mapping[parent_right]
            while top_left_parent != mapping[top_left_parent]:
                mapping[left] = top_left_parent
                top_left_parent = mapping[top_left_parent]

            mapping[top_left_parent] = top_right_parent
            mapping[left] = top_right_parent

            continue 

        if parent_left is None:
            mapping[left] = parent_right
        else:
            mapping[right] = parent_left

    return mapping


def get_groups(input_pairs):
    mapping = build_mapping(input_pairs)

    groups = {}
    for elt, parent in mapping.items():
        if parent not in groups:
            groups[parent] = set()

        groups[parent].add(elt)

    return list(groups.values())

因此，输入以下内容：

groups = get_groups([('A', 'B'), ('A', 'C'), ('D', 'A'), ('E', 'F'), 
                     ('F', 'C'), ('G', 'H'), ('I', 'J'), ('K', 'L'), 
                     ('L', 'M'), ('M', 'N')])

我们得到：

[{'A', 'B', 'C', 'D', 'E', 'F'}, {'G', 'H'}, {'I', 'J'}, {'K', 'L', 'M', 'N'}]

第二种可能效率较低的解决方案是：

def get_groups_second_method(input_pairs):
    groups = []

    for pair in input_pairs:
        left = pair[0]
        right = pair[1]

        left_group = None
        right_group = None
        for i in range(0, len(groups)):
            group = groups[i]

            if left in group:
                left_group = (group, i)

            if right in group:
                right_group = (group, i)

        if left_group is not None and right_group is not None:
            merged = right_group[0].union(left_group[0])
            groups[right_group[1]] = merged
            groups.pop(left_group[1])
            continue

        if left_group is None and right_group is None:
            new_group = {left, right}
            groups.append(new_group)
            continue

        if left_group is None:
            right_group[0].add(left)
        else:
            left_group[0].add(right)

    return groups

【讨论】：

【解决方案6】：

您似乎有一个图（以边列表的形式），它可能不是一个整体（“连接”），并且您想将它分成多个部分（“组件”）。

一旦我们用图表的语言来思考它，我们就大功告成了。我们可以保留到目前为止我们找到的所有组件的列表（这些将是节点集），如果它的伙伴已经存在，则将一个节点添加到该集合中。否则，为这对创建一个新组件。

def graph_components(edges):
    """
    Given a graph as a list of edges, divide the nodes into components.

    Takes a list of pairs of nodes, where the nodes are integers.
    Returns a list of sets of nodes (the components).
    """

    # A list of sets.
    components = []

    for v1, v2 in edges:
        # See if either end of the edge has been seen yet.
        for component in components:
            if v1 in component or v2 in component:
                # Add both vertices -- duplicates will vanish.
                component.add(v1)
                component.add(v2)
                break
        else:
            # If neither vertex is already in a component.
            components.append({v1, v2})

    return components

为了实现这一功能，我使用了奇怪的for ... else 构造——如果在for 期间未达到break 语句，则else 将被执行。内部循环也可以是返回True 或False 的函数。

编辑：正如 Francis Colas 指出的那样，这种方法太贪婪了。这是一种完全不同的方法（非常感谢 Edward Mann 为 this 漂亮的 DFS 实现）。

这种方法基于构建一个图，然后对其进行遍历，直到我们用完未访问的节点。它应该在线性时间内运行（O(n) 来构建图形，O(n) 来完成所有的遍历，我相信 O(n) 只是为了做集合的差异）。

from collections import defaultdict

def dfs(start, graph):
    """
    Does depth-first search, returning a set of all nodes seen.
    Takes: a graph in node --> [neighbors] form.
    """
    visited, worklist = set(), [start]

    while worklist:
        node = worklist.pop()
        if node not in visited:
            visited.add(node)
            # Add all the neighbors to the worklist.
            worklist.extend(graph[node])

    return visited

def graph_components(edges):
    """
    Given a graph as a list of edges, divide the nodes into components.
    Takes a list of pairs of nodes, where the nodes are integers.
    """

    # Construct a graph (mapping node --> [neighbors]) from the edges.
    graph = defaultdict(list)
    nodes = set()

    for v1, v2 in edges:
        nodes.add(v1)
        nodes.add(v2)

        graph[v1].append(v2)
        graph[v2].append(v1)

    # Traverse the graph to find the components.
    components = []

    # We don't care what order we see the nodes in.
    while nodes:
        component = dfs(nodes.pop(), graph)
        components.append(component)

        # Remove this component from the nodes under consideration.
        nodes -= component

    return components

【讨论】：

这正是我要发布的解决方案，除了我没有使用for..else 而是使用一个无聊的布尔标志来指示是否应将新集添加到集列表中。干得好！
这是一个非常不错且简单的解决方案！
嗯，它既好又简单，但它不会合并组件（例如，[(1, 2), (3, 4), (2, 4)] 将返回 [{1, 2, 4}, {3, 4, 2}] 而不是 [{1, 2, 3, 4}]）。
我在下面提出了一个使用列表而不是集合的解决方案（但两者都应该可以）；它更复杂，因为我们要处理这种情况。
确实如此。我做了一些额外的测试，可能会出现项目出现在不同的组件中。