【Posted】: 2020-12-06 13:07:13
【Question】:
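I'm currently working with a Twitter user graph built from two CSV files: a node list with close to 147,000 nodes, and an edge list containing all the relationships between users.
When I import the files into networkx and call info() on the graph, it reports more than 5,000,000 nodes (the result is similar whether I call info() on the directed or the undirected version of the graph).
I've tried this with a smaller dataset, and there the number of nodes matches the number in my node list file. Does anyone know why this happens?
Many thanks
Edit
The code I'm using is shown below: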
import csv
import networkx as nx
import pandas as pd

with open('node list.csv', 'r') as nodecsv:  # Open the file
    nodereader = csv.reader(nodecsv)  # Read the csv
    # Retrieve the data (using Python list comprehension and list slicing to remove the header row, see footnote 3)
    nodes = [n for n in nodereader][1:]

node_names = [n[0] for n in nodes]  # Get a list of only the node names

with open('edge list.csv', 'r') as edgecsv:  # Open the file
    edgereader = csv.reader(edgecsv)  # Read the csv
    edges = [tuple(e) for e in edgereader][1:]  # Retrieve the data

print(len(node_names))
print(len(edges))

# Undirected graph; note that the nodes come only from the edge list here
G = nx.Graph()
# G.add_nodes_from(node_names)
G.add_edges_from(edges)
print(nx.info(G))  # nx.info() was removed in NetworkX 3.0; use print(G) there
print(G.number_of_nodes())  # Total number of nodes in the graph

# Map each node name (column 0) to its attributes (columns 1-4)
follower_count_dict = {}
friend_count_dict = {}
status_count_dict = {}
created_at_dict = {}
for node in nodes:  # Loop through the list, one row at a time
    follower_count_dict[node[0]] = node[1]
    friend_count_dict[node[0]] = node[2]
    status_count_dict[node[0]] = node[3]
    created_at_dict[node[0]] = node[4]
# print(follower_count_dict)

nx.set_node_attributes(G, follower_count_dict, 'follower_count')
nx.set_node_attributes(G, friend_count_dict, 'friend_count')
nx.set_node_attributes(G, status_count_dict, 'status_count')
nx.set_node_attributes(G, created_at_dict, 'created_at')

# Directed version of the same graph
DG = nx.DiGraph()
DG.add_nodes_from(node_names)
DG.add_edges_from(edges)
nx.set_node_attributes(DG, follower_count_dict, 'follower_count')
nx.set_node_attributes(DG, friend_count_dict, 'friend_count')
nx.set_node_attributes(DG, status_count_dict, 'status_count')
nx.set_node_attributes(DG, created_at_dict, 'created_at')
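One thing that may be relevant to the node count: in NetworkX, add_edges_from adds any edge endpoint that is not already in the graph as a new node. The reported count therefore covers every distinct ID that appears in the edge list, not only the IDs in the node list. The sketch below illustrates this with made-up IDs ("x" and "y" are hypothetical and not taken from the files in this post), and shows one possible way to keep only edges between known nodes. Whether this is what happens with the real data is an assumption.

```python
import networkx as nx

# Hypothetical toy data: three known users, plus an edge list that
# mentions two IDs ("x", "y") that are absent from the node list.
node_names = ["a", "b", "c"]
edges = [("a", "b"), ("b", "x"), ("c", "y")]

G = nx.Graph()
G.add_nodes_from(node_names)
G.add_edges_from(edges)  # implicitly creates nodes "x" and "y"

# Nodes that exist in the graph but not in the node list
extra = set(G.nodes) - set(node_names)
print(G.number_of_nodes())  # 5
print(sorted(extra))        # ['x', 'y']

# One option: keep only edges whose endpoints are both in the node list.
known = set(node_names)
H = nx.Graph()
H.add_nodes_from(node_names)
H.add_edges_from((u, v) for u, v in edges if u in known and v in known)
print(H.number_of_nodes())  # 3
```

Filtering like this discards relationships to users outside the node list, so it only makes sense if those users are not needed in the analysis.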
Snapshot of the user list file
Snapshot of the edge list file
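To check whether the edge list refers to IDs that are missing from the node list, the two files can be compared directly before building any graph. This is a minimal sketch using only the standard library. The inline CSV text and the column names (id, followers, source, target) are hypothetical stand-ins, since the actual layout of the files is not reproduced here.

```python
import csv
import io

# Hypothetical stand-ins for 'node list.csv' and 'edge list.csv'.
# Replace these with open(...) calls on the real files.
node_csv = io.StringIO("id,followers\n1,10\n2,20\n3,30\n")
edge_csv = io.StringIO("source,target\n1,2\n2, 3\n3,99\n")

def read_rows(handle):
    """Return all rows of a CSV file object, skipping the header row."""
    reader = csv.reader(handle)
    next(reader)  # skip the header
    return [row for row in reader]

node_ids = {row[0] for row in read_rows(node_csv)}
edges = [(row[0], row[1]) for row in read_rows(edge_csv)]

# Every distinct ID that appears as an edge endpoint
endpoints = {n for edge in edges for n in edge}

# Endpoints that do not match any ID in the node list
unknown = endpoints - node_ids
print(len(node_ids), len(endpoints), sorted(unknown))

# Stripping whitespace separates formatting mismatches (e.g. " 3")
# from IDs that are genuinely missing from the node list (e.g. "99").
unknown_stripped = {n.strip() for n in endpoints} - node_ids
print(sorted(unknown_stripped))
```

If the set of unknown endpoints is large, the extra nodes in the graph most likely come from the edge list rather than from the import code.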
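【Comments】:
-
To get help, you should show a small example of what these 2 files look like, along with the code you use to import them.
-
Hi @abc, thanks for your reply. I have edited the original post.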