【Question title】: Efficiently extract edges with biggest weight for each node igraph
【Posted】: 2022-01-21 17:57:29
【Question】:

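I have a weighted, undirected, simple graph.
The number of nodes is in the tens of thousands, and so is the number of edges (it is a sparse graph/matrix).

For each node, I want to find the maximum edge weight ("max score") and store it in a data frame together with the nodes that share that edge. The data frame will have three columns: node_name - str, max_score - float [0-1], max_score_nodes - List[str]

My current solution is given below, but it is inelegant: it has several list comprehensions inside a for loop (one of them nested), a check for nodes without edges, and so on. I feel there must be a smarter way to do this.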

import string
import igraph as ig
import numpy as np
import pandas as pd

nodes = list(string.ascii_letters[0:6].upper())
edges = [("A", "F"), ("A", "C"), ("F", "D"), ("D", "C"), ("D", "E")]
weights = [0.6, 0.4, 0.3, 0.9, 0.9]

w_graph = ig.Graph(directed=False)
w_graph.add_vertices(nodes)
w_graph.add_edges(edges, {"weight": weights})

records = {}
for node in nodes:
    local_edges = np.array(w_graph.vs.find(node).all_edges())
    if local_edges.size == 0:
        records.update({node: {"max_score": 0, "max_score_nodes": np.nan}})
        continue
    local_weights = [local_edge["weight"] for local_edge in local_edges]

    max_score = np.max(local_weights)
    max_score_ind = np.where(local_weights == max_score)[0]
    max_score_edges = local_edges[max_score_ind]

    vertex_tuples = [edge.vertex_tuple for edge in max_score_edges]
    max_score_nodes = [
        [vertex["name"] for vertex in vertex_tuple if vertex["name"] != node][0] for
        vertex_tuple in vertex_tuples]

    records.update({node: {"max_score": max_score, "max_score_nodes": max_score_nodes}})

output = pd.DataFrame.from_dict(records, orient="index")
output_with_node_name = output.rename_axis("node_name").reset_index()
print(output_with_node_name)

【Question comments】:

    Tags: python pandas scipy igraph


    【Solution 1】:

    You can build the graph like this:

    import igraph as ig
    from igraph import Graph
    
    g = Graph.Formula('A, B, C, D, E, F, A-F, A-C, F-D-C, D-E')
    g.es['weight'] = [0.6, 0.4, 0.3, 0.9, 0.9]
    

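    Get the index of the highest-weight edge for each vertex (note that max raises ValueError on an empty sequence, so a vertex with no incident edges, such as B here, needs separate handling):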

    strongest_edges = [max(edges, key=g.es['weight'].__getitem__) for edges in g.get_inclist()]
    

    Get the endpoints of each of those edges:

    [g.es[eid].tuple for eid in strongest_edges]
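The incidence-list idea can be extended to cover isolated vertices and tied maxima, both of which the question asks for. The following is only a sketch under stated assumptions: the helper name strongest_neighbours is hypothetical, and the three input lists are hand-written to have the shape of g.vs['name'], g.get_inclist() and g.get_edgelist() for the example graph, so that the code runs without igraph. Using an empty list for a vertex without edges is also a choice made here, not something taken from the answer.

```python
def strongest_neighbours(names, inclist, endpoints, weights):
    """For each vertex, return its maximum incident edge weight and
    the names of all neighbours reached through an edge of that weight."""
    records = []
    for v, incident in enumerate(inclist):
        if not incident:
            # Isolated vertex: no edge to take a maximum over.
            records.append({"node_name": names[v],
                            "max_score": 0.0,
                            "max_score_nodes": []})
            continue
        best = max(weights[e] for e in incident)
        partners = []
        for e in incident:
            if weights[e] == best:
                a, b = endpoints[e]
                # Keep the endpoint that is not the current vertex.
                partners.append(names[b] if a == v else names[a])
        records.append({"node_name": names[v],
                        "max_score": best,
                        "max_score_nodes": partners})
    return records


# Hand-written stand-ins (assumed shapes) for the example graph:
names = ["A", "B", "C", "D", "E", "F"]                 # like g.vs['name']
endpoints = [(0, 5), (0, 2), (5, 3), (3, 2), (3, 4)]   # like g.get_edgelist()
weights = [0.6, 0.4, 0.3, 0.9, 0.9]                    # like g.es['weight']
inclist = [[0, 1], [], [1, 3], [2, 3, 4], [4], [0, 2]] # like g.get_inclist()

records = strongest_neighbours(names, inclist, endpoints, weights)
for record in records:
    print(record)
```

The resulting list of dictionaries can be passed directly to pd.DataFrame to get the three columns described in the question.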
    

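    【Comments】:

    • Thanks for the answer; it taught me new igraph methods, so +1. However, it does not currently work for my example, because your graph is missing node B, which has no edges. Also, in the final output, instead of a list of tuples with two node IDs, I want a list of dictionaries with max_weight and the individual node names (as shown in output_with_node_name), so the last list comprehension would need to be modified.
    【Solution 2】:

    I discovered SciPy's csr matrix, which handles sparse matrices efficiently in compressed sparse row format.

    There is an igraph method that returns the graph's adjacency matrix as a CSR matrix.

    From there we can apply vectorised operations to the rows, so the whole nested for loop is reduced to 4 lines: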

    scores_matrix = w_graph.get_adjacency_sparse(attribute="weight")
    
    node_names = w_graph.vs["name"]
    max_scores = scores_matrix.max(axis=0).toarray()[0]
    max_score_nodes = w_graph.vs[scores_matrix.argmax(axis=0).tolist()[0]]["name"]
    max_score_nodes = np.where(max_scores == 0, np.nan, max_score_nodes)
    
    output = pd.DataFrame({"node_name": node_names,
                           "max_score": max_scores,
                           "max_score_nodes": max_score_nodes})
    print(output)
    

    The catch is that argmax returns only one index. If you want all the node names that share max_score, you can loop like this:

    from scipy.sparse import find
    
    max_score_nodes = []
    for node_index in w_graph.vs.indices:
        max_score = max_scores[node_index]
        if max_score == 0:
            max_score_nodes.append(np.nan)
            continue
    
        max_score_node_ind = find(scores_matrix.getrow(node_index) == max_score)[1]
        max_score_node_names = w_graph.vs[max_score_node_ind.tolist()]["name"]
        max_score_nodes.append(max_score_node_names)
    
    output["max_score_nodes"] = max_score_nodes
    print(output)
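The per-row getrow loop can also be replaced by a single boolean mask over the stored entries of the matrix. This is a sketch, not the answerer's method. It assumes strictly positive weights, builds the symmetric matrix by hand with scipy.sparse.csr_matrix instead of calling w_graph.get_adjacency_sparse (so that it runs without igraph), and uses an empty list rather than np.nan for nodes without edges.

```python
import numpy as np
from scipy.sparse import csr_matrix

names = ["A", "B", "C", "D", "E", "F"]
index = {name: i for i, name in enumerate(names)}
edges = [("A", "F"), ("A", "C"), ("F", "D"), ("D", "C"), ("D", "E")]
weights = [0.6, 0.4, 0.3, 0.9, 0.9]

# Symmetric weighted adjacency matrix: each undirected edge is stored twice.
src = [index[u] for u, v in edges]
dst = [index[v] for u, v in edges]
scores_matrix = csr_matrix(
    (weights + weights, (src + dst, dst + src)),
    shape=(len(names), len(names)),
)

# Row-wise maximum; rows without stored entries give 0.
max_scores = scores_matrix.max(axis=1).toarray().ravel()

# Compare every stored entry with the maximum of its own row in one step.
coo = scores_matrix.tocoo()
is_max = coo.data == max_scores[coo.row]

# Collect the column names of the entries that attain the row maximum.
max_score_nodes = [[] for _ in names]
for row, col in zip(coo.row[is_max], coo.col[is_max]):
    max_score_nodes[row].append(names[col])

for name, score, partners in zip(names, max_scores, max_score_nodes):
    print(name, score, partners)
```

The remaining Python loop only visits entries that attain a row maximum, so its length is bounded by the number of stored entries rather than requiring one sparse comparison per node.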
    
