Networkx graph has more nodes than in my nodelist file

【Question title】: Networkx graph has more nodes than in my nodelist file
【Posted】: 2020-12-06 13:07:13
【Question description】:

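I am currently working with a graph of Twitter users built from two CSV files: a node list with close to 147,000 nodes, and an edge list containing all the relationships between the users.

When I import the files into networkx and call info() on the graph, it reports more than 5,000,000 nodes (the result is similar whether I call info() on the directed or the undirected version of the graph).

I have tried this with a smaller dataset, and there the number of nodes matches the number in my node list file. Does anyone know why this happens?

Many thanks

Edit

The code I am using is shown below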

import csv
import networkx as nx
import pandas as pd

with open('node list.csv', 'r') as nodecsv: # Open the file
    nodereader = csv.reader(nodecsv) # Read the csv
    # Retrieve the data (using a Python list comprehension and list slicing to remove the header row)
    nodes = [n for n in nodereader][1:]

node_names = [n[0] for n in nodes] # Get a list of only the node names

with open('edge list.csv', 'r') as edgecsv: # Open the file
    edgereader = csv.reader(edgecsv) # Read the csv
    edges = [tuple(e) for e in edgereader][1:] # Retrieve the data
    
print(len(node_names))

print(len(edges))

G = nx.Graph()

# G.add_nodes_from(node_names)
G.add_edges_from(edges)


print(nx.info(G))

print(G.number_of_nodes()) # Number of nodes in the graph

follower_count_dict = {}
friend_count_dict = {}
staus_count_dict = {}
created_at_dict = {}






for node in nodes: # Loop through the list, one row at a time
    follower_count_dict[node[0]] = node[1]
    friend_count_dict[node[0]] = node[2]
    staus_count_dict[node[0]] = node[3]
    created_at_dict[node[0]] = node[4]

    

#print(  user_followers_count_dict)

nx.set_node_attributes(G, follower_count_dict, 'follower_count')
nx.set_node_attributes(G, friend_count_dict, 'friend_count')
nx.set_node_attributes(G, staus_count_dict, 'staus_count')
nx.set_node_attributes(G, created_at_dict, 'created_at')



DG = nx.DiGraph()
DG.add_nodes_from(node_names)
DG.add_edges_from(edges)


nx.set_node_attributes(DG, follower_count_dict, 'follower_count')
nx.set_node_attributes(DG, friend_count_dict, 'friend_count')
nx.set_node_attributes(DG, staus_count_dict, 'staus_count')
nx.set_node_attributes(DG, created_at_dict, 'created_at')

Snapshot of the user list file

Snapshot of the edge list file

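【Discussion】:

To get help, you should show a small example of what the two files look like and the code you use to import them.
Hi @abc, thanks for your reply. I have edited the original post.

Tags: python twitter networkx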

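【Solution 1】:

Your edge list includes nodes that do not appear in your node list. When those edges are added, networkx adds the missing nodes as well.

Possible causes include node names being read as strings that differ only in whitespace (perhaps a trailing '\n'), or nodes being read as integers in some places and as strings in others.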
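As a small illustration of both points, here is a minimal sketch with made-up node names (they are not taken from the asker's files). It shows that adding an edge silently creates any endpoint that is not yet in the graph, and that a trailing space or an int/str mismatch is enough to make an endpoint count as a new node.

```python
import networkx as nx

G = nx.Graph()
G.add_nodes_from(["1", "2"])  # nodes as they might be read from the node list

# Endpoints that look alike to a human but are different keys for networkx
G.add_edges_from([
    ("1", "2"),    # both endpoints already exist, no node is added
    ("1", "2 "),   # trailing space: "2 " is added as a new node
    (1, "2"),      # the integer 1 is not the string "1": 1 is added as a new node
])

# Nodes that were created by the edges rather than by the node list
extra = [n for n in G if n not in {"1", "2"}]

print(G.number_of_nodes())
print([repr(n) for n in extra])
```

With two nodes in the node list, the graph ends up with four, which is the same effect as in the question on a much smaller scale.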

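One way to track this down is to run a loop, before adding the edges (and after adding the nodes from the node list), that checks whether each node is already in the graph and prints it if it is not: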

for edge in edges:
    for node in edge:
        if node not in G:
            print(node)
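With millions of edges, printing every missing endpoint may be too noisy. A possible variation, sketched below with plain sets, collects the unknown endpoints once, then checks how many of them disappear after stripping whitespace. The helper names `normalize` and `unknown_endpoints` and the sample data are hypothetical, and the sketch assumes each edge row holds exactly two node identifiers.

```python
def normalize(value):
    """Coerce a node identifier to a string with surrounding whitespace removed."""
    return str(value).strip()


def unknown_endpoints(node_names, edges):
    """Return the set of edge endpoints that are not in the node list."""
    known = set(node_names)
    return {node for edge in edges for node in edge if node not in known}


# Hypothetical sample data standing in for the two CSV files
node_names = ["1", "2", "3"]
edges = [("1", "2"), ("2 ", "3"), ("3", "99")]

# Endpoints missing from the node list in the raw data
before = unknown_endpoints(node_names, edges)

# Same check after normalizing every endpoint
clean_edges = [tuple(normalize(n) for n in edge) for edge in edges]
after = unknown_endpoints(node_names, clean_edges)

print(sorted(before))
print(sorted(after))
```

Anything still listed after normalization (here "99") is an endpoint that really is absent from the node list, which points to a preprocessing problem in the edge file rather than a formatting difference.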

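【Discussion】:

Thanks for the reply, @Joel. That makes sense; I had to do some preprocessing on the edge list file and may have skipped a step. I will take another look and try your solution.
A quick comment: my "solution" will not fix the problem by itself, but it will help you identify where the problem is.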