【Question Title】: Algorithm to partition graph into complete subgraphs
【Posted】: 2020-03-25 18:00:11
【Question】:

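I need an algorithm that partitions the vertices of an undirected graph into one or more subgraphs such that each subgraph is a complete graph (every vertex is adjacent to every other vertex). Each vertex must be in exactly one of the subgraphs.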

Here is an example:

input = [
    (A, B),
    (B, C),
    (A, C),
    (B, D),
    (D, E),
]
output = myalgo(input)  # [(A, B, C), (D, E)]

Here is a picture that illustrates the problem more clearly: [image not preserved]

The input list is sorted by distance in descending order, which is why I connect A-B-C rather than B-D.
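The requirement stated above (disjoint groups, each one a complete graph, every vertex covered) can be checked mechanically. The following is a minimal sketch of such a check, not part of the original question; the name `is_clique_partition` is hypothetical, and it assumes the vertex set is exactly the set of endpoints appearing in the edge list.

```python
from itertools import combinations


def is_clique_partition(edges, groups):
    """Return True if `groups` splits the vertices of `edges` into disjoint cliques."""
    # Undirected edges: store each one as an unordered pair.
    edge_set = {frozenset(e) for e in edges}
    vertices = {v for e in edges for v in e}

    # Every vertex must appear in exactly one group.
    assigned = [v for g in groups for v in g]
    if len(assigned) != len(set(assigned)) or set(assigned) != vertices:
        return False

    # Every pair of vertices inside a group must be joined by an edge.
    return all(
        frozenset(pair) in edge_set
        for g in groups
        for pair in combinations(g, 2)
    )


edges = [("A", "B"), ("B", "C"), ("A", "C"), ("B", "D"), ("D", "E")]
print(is_clique_partition(edges, [("A", "B", "C"), ("D", "E")]))  # True
print(is_clique_partition(edges, [("A", "B"), ("C", "D", "E")]))  # False: C-D is not an edge
```

A check like this is handy for comparing the answers below, since several different partitions can be valid for the same graph.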

I thought this might be called "strongly connected components", and I have already tried the following solutions:

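【Comments】:

  • By "strongly connected", do you mean "all elements within a component must be directly linked to each other"?
  • @LukasThaler Yes
  • Sounds interesting. Let me see what I can come up with :D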
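The distinction raised in these comments matters. For an undirected graph the usual term is "connected components" ("strongly connected" applies to directed graphs), and a component only requires that vertices be reachable from one another, not directly adjacent. The sketch below is an illustration under that reading, with a hypothetical helper name `connected_components`; it shows that a plain component search puts all five example vertices into one group, which is not what the question asks for.

```python
from collections import defaultdict


def connected_components(edges):
    """Group vertices that are reachable from each other (not necessarily adjacent)."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        # Iterative depth-first search from an unvisited vertex.
        stack, component = [start], set()
        while stack:
            v = stack.pop()
            if v in component:
                continue
            component.add(v)
            stack.extend(adjacency[v] - component)
        seen |= component
        components.append(component)
    return components


edges = [("A", "B"), ("B", "C"), ("A", "C"), ("B", "D"), ("D", "E")]
print(connected_components(edges))
# A single component holding all five vertices, because B-D links the two groups.
```

What the question describes is instead a partition into cliques, which is a stricter condition than connectivity.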

Tags: python algorithm graph data-science


【Solution 1】:

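Here is a class that implements the partitioning into complete subgraphs. It is by no means optimized and could probably be improved significantly, but it is a starting point.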

class SCCManager:
    def __init__(self, edges):
        self.clusters = []
        self.edges = edges

    def clusters_in(self, conn):
        first, second = conn
        f_clust = None
        s_clust = None
        for i, clust in enumerate(self.clusters):
            if first in clust:
                f_clust = i
            if second in clust:
                s_clust = i
            # break early if both already found
            # (compare against None explicitly: cluster index 0 is falsy)
            if f_clust is not None and s_clust is not None:
                break
        return (f_clust, s_clust)

    def all_connected(self, cluster, vertex):
        for v in cluster:
            connected = (v, vertex) in self.edges or (vertex, v) in self.edges
            # break early if any element is not connected to the candidate
            if not connected:
                return False
        return True

    def get_scc(self):
        for edge in self.edges:
            c_first, c_second = self.clusters_in(edge)

            # case 1: none of the vertices are in an existing cluster
            # -> create new cluster containing the vertices
            if c_first is None and c_second is None:
                self.clusters.append([edge[0], edge[1]])
                continue

            # case 2: first is in a cluster, second isn't
            # -> add to cluster if eligible
            if c_first is not None and c_second is None:
                # check if the second is connected to all cluster components
                okay = self.all_connected(self.clusters[c_first], edge[1])
                # add to cluster if eligible
                if okay:
                    self.clusters[c_first].append(edge[1])
                continue

            # case 3: other way round
            if c_first is None and c_second is not None:
                okay = self.all_connected(self.clusters[c_second], edge[0])
                if okay:
                    self.clusters[c_second].append(edge[0])
                continue

            # case 4: both are in different clusters
            # -> merge clusters if allowed
            if c_first != c_second:
                # check if clusters can be merged
                for v in self.clusters[c_first]:
                    merge = self.all_connected(self.clusters[c_second], v)
                    # break if any elements are not connected
                    if not merge:
                        break
                # merge if allowed
                if merge:
                    self.clusters[c_first].extend(self.clusters[c_second])
                    self.clusters.remove(self.clusters[c_second])

            # case 5: both are in the same cluster
            # won't happen if input is sane, but doesn't require an action either way


        return self.clusters

...and here is a working example:

inp = [
    ('A', 'B'),
    ('B', 'C'),
    ('A', 'C'),
    ('B', 'D'),
    ('D', 'E'),
    ('C', 'E')
]

test = SCCManager(inp)
print(test.get_scc())

[['A', 'B', 'C'], ['D', 'E']]

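【Comments】:

  • Suppose the input is a large clique with one edge removed. The correct answers are all the vertex partitions that separate the two endpoints of the missing edge. Which one do you prefer?
  • That is determined by the user's input. The OP mentions in the question that the order of the edges is given by the problem. If the (B,D) edge in the picture above were entered first, we would end up with components (A,C) and (B,D) (and (E), if you want to count that as a component, which my implementation currently does not).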
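The order dependence described in the last comment can be reproduced with a compact variant of the same greedy idea. This is a sketch under assumptions, not the answerer's class: the name `greedy_clique_partition` is hypothetical, edges are processed strictly in input order, and, unlike the class above, leftover vertices are emitted as one-element groups.

```python
def greedy_clique_partition(edges):
    """Greedy, order-dependent sketch: earlier edges take priority."""
    edge_set = {frozenset(e) for e in edges}
    clusters = []  # list of vertex lists, each one a clique

    def cluster_of(v):
        return next((c for c in clusters if v in c), None)

    def fully_connected(group_a, group_b):
        return all(frozenset((a, b)) in edge_set for a in group_a for b in group_b)

    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        group_a, group_b = cluster_of(a), cluster_of(b)
        if group_a is not None and group_a is group_b:
            continue  # already in the same cluster
        # Treat an unassigned vertex as a temporary one-element group.
        members_a = group_a if group_a is not None else [a]
        members_b = group_b if group_b is not None else [b]
        if not fully_connected(members_a, members_b):
            continue
        # Replace the two groups by their union.
        clusters[:] = [c for c in clusters if c is not group_a and c is not group_b]
        clusters.append(members_a + members_b)

    # Vertices that never joined a cluster become singletons.
    assigned = {v for c in clusters for v in c}
    for edge in edges:
        for v in edge:
            if v not in assigned:
                assigned.add(v)
                clusters.append([v])
    return clusters


original = [("A", "B"), ("B", "C"), ("A", "C"), ("B", "D"), ("D", "E")]
bd_first = [("B", "D"), ("A", "B"), ("B", "C"), ("A", "C"), ("D", "E")]
print(greedy_clique_partition(original))  # [['A', 'B', 'C'], ['D', 'E']]
print(greedy_clique_partition(bd_first))  # [['B', 'D'], ['A', 'C'], ['E']]
```

Both results are valid clique partitions; which one is produced depends only on the order of the edges, which matches the question's statement that the input is sorted by priority.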
【Solution 2】:

You can find all the paths and then group them by connectivity:

from itertools import groupby as gb
d = [('A', 'B'), ('B', 'C'), ('A', 'C'), ('B', 'D'), ('D', 'E')]
def connect_num(node):
    return [sum(a == node for a, _ in d), sum(b == node for _, b in d)]

def group_paths(data):
   new_d = sorted([[i, connect_num(i)] for i in data], key=lambda x:max(x[1]))
   return [[k for k, _ in b] for _, b in gb(new_d, key=lambda x:max(x[1]))]

def get_paths(start, c = [], seen = []):
   new_vals = [a for a, _ in d if a not in seen+c]
   if (vals:=[b for a, b in d if a == start]):
      for i in vals:
         yield from get_paths(i, c=c+vals, seen=seen)
   else:
      yield c
      for i in new_vals:
         yield from get_paths(i, c = [i], seen=c+seen)

result = sorted(map(set, get_paths(d[0][0])), key=len, reverse=True)
new_result = [a for i, a in enumerate(result) if not any(all(k in c for k in a) for c in result[:i])]
final_result = [group_paths(i) for i in new_result]

Output:

#final_result[0]
[['E', 'D'], ['A', 'C', 'B']]

【Comments】:

【Solution 3】:

Another attempt:

    lst = [
        ('A', 'B'),
        ('B', 'C'),
        ('A', 'C'),
        ('B', 'D'),
        ('D', 'E'),
    ]
    
    d = {}
    for i, j in lst:
        d.setdefault(i, []).append(j)
        d.setdefault(j, []).append(i)
    
    from itertools import combinations
    
    rv, seen_segments, seen_vertices = [], set(), set()
    for k, v in d.items():
        if len(v) == 1:
            segment = set((k, v[0])).difference(seen_vertices)
            seen_vertices.update(segment)
            rv.append([tuple(segment), ])
        else:
            graph = []
            for i, j in combinations([k] + v, 2):
                if not j in d[i]:
                    break
                else:
                    graph.append(tuple(sorted((i, j))))
            else:
                if graph:
                    graph = [segment for segment in graph if segment not in seen_segments]
                    seen_segments.update(graph)
                    if graph:
                        rv.append(graph)
    
    from pprint import pprint
    pprint(rv)
    

Prints:

    [[('A', 'B'), ('A', 'C'), ('B', 'C')], [('D', 'E')]]
    

For the input:

    lst = [
        ('A', 'B'),
        ('B', 'C'),
    ]
    

Prints:

    [[('A', 'B')], [('C',)]]
    

For the input:

    lst = [
        ('A', 'B'),
        ('B', 'C'),
        ('C', 'D'),
    ]
    

Prints:

    [[('B', 'A')], [('D', 'C')]]
    

【Comments】:

【Solution 4】:
      from collections import defaultdict
      
      
      def create_adjacency_matrix(connections):
          matrix = defaultdict(dict)
          for a, b in connections:
              matrix[a][b] = 1
              matrix[b][a] = 1
          return matrix
      
      
      def is_connected_to_all(vertex, group, matrix):
          for v in group:
              if vertex != v and vertex not in matrix[v]:
                  return False
          return True
      
      
      def group_vertexes(input):
          matrix = create_adjacency_matrix(input)
          groups = []
          current_group = set()
          for vertex in matrix.keys():
              if is_connected_to_all(vertex, current_group, matrix):
                  current_group.add(vertex)
              else:
                  groups.append(current_group)
                  current_group = {vertex}
          groups.append(current_group)
          return groups
      
      input = [("A", "B"), ("B", "C"), ("A", "C"), ("B", "E"), ("E", "D")]
      print(group_vertexes(input))
      # [{'C', 'B', 'A'}, {'D', 'E'}]
      

Warning: this relies on the fact that dict preserves insertion order in Python 3.7+. On older versions you would have to use matrix = DefaultOrderedDict(dict) https://stackoverflow.com/a/35968897/9392216
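One way to avoid depending on dict ordering altogether is to record the vertex order explicitly. The following is a minimal sketch of that design choice, not the answerer's code: the name `group_vertices_in_order` is hypothetical, and it keeps the same single-pass logic (a vertex joins the current group only if it is adjacent to every member, otherwise it starts a new group).

```python
def group_vertices_in_order(edges):
    """Single-pass grouping with the vertex order kept in an explicit list."""
    neighbours = {}
    order = []  # vertices in order of first appearance; no reliance on dict ordering
    for a, b in edges:
        for v in (a, b):
            if v not in neighbours:
                neighbours[v] = set()
                order.append(v)
        neighbours[a].add(b)
        neighbours[b].add(a)

    groups, current = [], []
    for v in order:
        # Join the current group only if v is adjacent to all of its members.
        if all(v in neighbours[u] for u in current):
            current.append(v)
        else:
            groups.append(current)
            current = [v]
    if current:
        groups.append(current)
    return groups


edges = [("A", "B"), ("B", "C"), ("A", "C"), ("B", "E"), ("E", "D")]
print(group_vertices_in_order(edges))  # [['A', 'B', 'C'], ['E', 'D']]
```

Returning lists instead of sets also makes the printed result deterministic, at the cost of losing set semantics for the groups.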

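【Comments】:

  • The matrix is a good idea, but if I add another ('C', 'E') to the input, the algorithm breaks. I think the update logic needs to be a bit more robust to handle more complex inputs.
  • Could you provide the whole input that fails? I tested it and it works with ('C','E') appended to the end of the current input.
  • inp = [('A', 'B'),('B', 'C'),('A', 'C'),('B', 'D'),('D', 'E'),('C', 'E')] My result is [{'B', 'C', 'A'}, {'E', 'D'}, {'E', 'C'}]
  • Actually, I'm running on 3.7. What output do you get for this input?
  • Yes, you're right :P. I was running a modified version that uses for vertex in matrix.keys():, which is an alternative option in my answer. I have revised the first one to use the correct logic.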