【Posted】: 2018-11-24 22:21:35
【Question】:
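I'm new to pandas and have a question I can't answer on my own. For context, this is output from a firewall. It generates millions of packet records, and I'm trying to aggregate that data into a firewall ruleset. The best approach I've come up with is to identify traffic by destination IP.
If the source/destination ports are ephemeral, they change, so it's important to aggregate them into the same row. That way I can determine the port ranges for the ruleset.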
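A minimal sketch of how a set of ports could be collapsed into range notation such as `1025-1026,1028`. The helper name `collapse_ports` is hypothetical (it is not part of pandas), and the sketch assumes the ports are integers or integer-like values.

```python
def collapse_ports(ports):
    """Collapse port numbers into a string such as '1025-1026,1028'."""
    # De-duplicate and sort so that consecutive ports end up adjacent.
    ordered = sorted(set(int(p) for p in ports))
    if not ordered:
        return ""

    ranges = []
    start = prev = ordered[0]
    for port in ordered[1:]:
        if port == prev + 1:
            # Still inside the current run of consecutive ports.
            prev = port
            continue
        # Gap found: close the current run and start a new one.
        ranges.append((start, prev))
        start = prev = port
    ranges.append((start, prev))

    # A run of length one is written as a single port, otherwise as 'a-b'.
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


print(collapse_ports([1025, 1026, 1028]))
```

A helper like this could be passed to a groupby aggregation over `dest_port` once the rows are grouped by the remaining columns.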
RAW CSV:
dvc,"src_interface",transport,"src_ip","src_port","dest_ip","dest_port",direction,action,cause,count
"Firewall-1",outside,tcp,"4.4.4.4",53,"1.1.1.1",1025,outbound,allowed,"",2
"Firewall-1",outside,tcp,"4.4.4.4",53,"1.1.1.1",1026,outbound,allowed,"",2
"Firewall-1",outside,tcp,"4.4.4.4",22,"1.1.1.1",1028,outbound,allowed,"",2
"Firewall-1",outside,tcp,"3.3.3.3",22,"2.2.2.2",2200,outbound,allowed,"",2
DataFrame:
dvc src_interface transport src_ip src_port dest_ip dest_port direction action cause count
0 Firewall-1 outside tcp 4.4.4.4 53 1.1.1.1 1025 outbound allowed NaN 2
1 Firewall-1 outside tcp 4.4.4.4 53 1.1.1.1 1026 outbound allowed NaN 2
2 Firewall-1 outside tcp 4.4.4.4 53 1.1.1.1 1028 outbound allowed NaN 2
3 Firewall-1 outside tcp 3.3.3.3 22 2.2.2.2 2200 outbound allowed NaN 2
How would I merge the rows that have the same dest_ip?
Code:
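import glob
import pandas as pd

# Load every CSV file in the current directory into one DataFrame
df = pd.concat([pd.read_csv(f) for f in glob.glob('*.csv')], ignore_index=True)
# Group on every column except dest_ip and collect the dest_ip values into lists
index_cols = df.columns.tolist()
index_cols.remove('dest_ip')
df = df.groupby(index_cols, as_index=False)['dest_ip'].apply(list)
print(df)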
Expected output:
Firewall-1 outside tcp 4.4.4.4 53 1.1.1.1 1025-1026,1028 outbound allowed nan 2
Firewall-1 outside tcp 3.3.3.3 22 2.2.2.2 2200 outbound allowed nan 2
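Most of the examples I've found online involve joining two DataFrames, whereas I only have one. Any help would be appreciated. Thanks in advance!
【Comments】: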
-
It sounds like you're looking for a "groupby" operation rather than a merge. See the docs: pandas.pydata.org/pandas-docs/stable/generated/…
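A rough sketch of what such a groupby might look like, not a definitive answer. It assumes the goal is to keep one row per combination of all columns other than `dest_port` and to join the ports into a comma-separated string. The sample values are copied from the DataFrame in the question, and the names `key_cols` and `merged` are illustrative.

```python
import pandas as pd

# Sample data mirroring the DataFrame shown in the question.
df = pd.DataFrame({
    "dvc": ["Firewall-1"] * 4,
    "src_interface": ["outside"] * 4,
    "transport": ["tcp"] * 4,
    "src_ip": ["4.4.4.4", "4.4.4.4", "4.4.4.4", "3.3.3.3"],
    "src_port": [53, 53, 53, 22],
    "dest_ip": ["1.1.1.1", "1.1.1.1", "1.1.1.1", "2.2.2.2"],
    "dest_port": [1025, 1026, 1028, 2200],
    "direction": ["outbound"] * 4,
    "action": ["allowed"] * 4,
    "cause": [float("nan")] * 4,
    "count": [2, 2, 2, 2],
})

# groupby drops rows whose key is NaN by default, so fill the empty
# "cause" column before using it as a grouping key.
df["cause"] = df["cause"].fillna("")

# Group on every column except the one being collapsed.
key_cols = [c for c in df.columns if c != "dest_port"]

merged = df.groupby(key_cols, as_index=False).agg(
    # Join the distinct ports of each group into one sorted string.
    dest_port=("dest_port", lambda s: ",".join(map(str, sorted(s.unique()))))
)
print(merged)
```

Grouping on `count` as well keeps it at its original value, as in the expected output. If the counts should be added up instead, `count` could be removed from `key_cols` and aggregated with `"sum"`.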
-
Please provide a minimal reproducible example. In particular, there is currently no DataFrame in your post...
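One possible way to make the post reproducible is to embed the raw CSV directly in the snippet and read it with `io.StringIO`, so that readers do not need the original files. This is only a sketch; the variable name `csv_text` is illustrative and the rows are copied from the raw CSV in the question.

```python
import io

import pandas as pd

# Raw CSV rows copied from the question, embedded as a string.
csv_text = '''dvc,"src_interface",transport,"src_ip","src_port","dest_ip","dest_port",direction,action,cause,count
"Firewall-1",outside,tcp,"4.4.4.4",53,"1.1.1.1",1025,outbound,allowed,"",2
"Firewall-1",outside,tcp,"4.4.4.4",53,"1.1.1.1",1026,outbound,allowed,"",2
"Firewall-1",outside,tcp,"4.4.4.4",22,"1.1.1.1",1028,outbound,allowed,"",2
"Firewall-1",outside,tcp,"3.3.3.3",22,"2.2.2.2",2200,outbound,allowed,"",2
'''

# read_csv accepts any file-like object, so StringIO stands in for a file.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```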
Tags: python pandas dataframe merge row