为每个节点 igraph 有效地提取权重最大的边答案

【问题标题】：Efficiently extract edges with biggest weight for each node igraph为每个节点 igraph 有效地提取权重最大的边
【发布时间】：2022-01-21 17:57:29
【问题描述】：

我有一个加权无向简单图。
节点数在数万左右，边数也是如此（它是一个稀疏图/矩阵）。

对于每个节点，我想找到最大边权重（“最大分数”）并将其与共享该边的节点一起存储在数据框中。数据框将包含三列：node_name - str、max_score - float [0-1]、max_score_nodes - List[str]

下面提供了我当前的解决方案，但它并不优雅，在 for 循环中有多个列表解析（其中一个是嵌套的），检查没有边的节点等，我觉得有一种更聪明的方法来做到这一点。

import string
import igraph as ig
import numpy as np
import pandas as pd

nodes = list(string.ascii_letters[0:6].upper())
edges = [("A", "F"), ("A", "C"), ("F", "D"), ("D", "C"), ("D", "E")]
weights = [0.6, 0.4, 0.3, 0.9, 0.9]

w_graph = ig.Graph(directed=False)
w_graph.add_vertices(nodes)
w_graph.add_edges(edges, {"weight": weights})

records = {}
for node in nodes:
    local_edges = np.array(w_graph.vs.find(node).all_edges())
    if local_edges.size == 0:
        records.update({node: {"max_score": 0, "max_score_nodes": np.nan}})
        continue
    local_weights = [local_edge["weight"] for local_edge in local_edges]

    max_score = np.max(local_weights)
    max_score_ind = np.where(local_weights == max_score)[0]
    max_score_edges = local_edges[max_score_ind]

    vertex_tuples = [edge.vertex_tuple for edge in max_score_edges]
    max_score_nodes = [
        [vertex["name"] for vertex in vertex_tuple if vertex["name"] != node][0] for
        vertex_tuple in vertex_tuples]

    records.update({node: {"max_score": max_score, "max_score_nodes": max_score_nodes}})

output = pd.DataFrame.from_dict(records, orient="index")
output_with_node_name = output.rename_axis("node_name").reset_index()
print(output_with_node_name)

【问题讨论】：

标签： python pandas scipy igraph

【解决方案1】：

您可以像这样构建图表：

import igraph as ig
from igraph import Graph

g = Graph.Formula('A, B, C, D, E, F, A-F, A-C, F-D-C, D-E')
g.es['weight'] = [0.6, 0.4, 0.3, 0.9, 0.9]

获取每个顶点权重最高边的索引：

strongest_edges = [max(edges, key=g.es['weight'].__getitem__) for edges in g.get_inclist()]

获取每条边的端点：

[g.es[eid].tuple for eid in strongest_edges]

【讨论】：

感谢您的回答，它教会了我新的 igraph 方法，所以 + 1 但目前它不适用于我的示例，因为您的图缺少没有边的节点 B。同样在最终输出中，而不是具有两个节点 ID 的列表元组中，我想要一个具有 max_weight 和单个节点名称的字典列表（如 output_with_node_name 所示），因此最后一个列表理解需要修改

【解决方案2】：

我发现了 SciPy 的 csr matrix，它可以有效地处理压缩行格式的稀疏矩阵。

有一个 igraph method 将图形的邻接矩阵作为 CSR 矩阵返回。

从这里我们可以对行进行向量化操作，因此整个嵌套的 for 循环现在减少为 4 行：

scores_matrix = w_graph.get_adjacency_sparse(attribute="weight")

node_names = w_graph.vs["name"]
max_scores = scores_matrix.max(axis=0).toarray()[0]
max_score_nodes = w_graph.vs[scores_matrix.argmax(axis=0).tolist()[0]]["name"]
max_score_nodes = np.where(max_scores == 0, np.nan, max_score_nodes)

output = pd.DataFrame({"node_name": node_names,
                       "max_score": max_scores,
                       "max_score_nodes": max_score_nodes})
print(output)

问题是argmax 将只返回一个索引，如果你想让所有节点名称都带有max_score，你可以像这样循环：

from scipy.sparse import find

max_score_nodes = []
for node_index in w_graph.vs.indices:
    max_score = max_scores[node_index]
    if max_score == 0:
        max_score_nodes.append(np.nan)
        continue

    max_score_node_ind = find(scores_matrix.getrow(node_index) == max_score)[1]
    max_score_node_names = w_graph.vs[max_score_node_ind.tolist()]["name"]
    max_score_nodes.append(max_score_node_names)

output["max_score_nodes"] = max_score_nodes
print(output)

【讨论】：