Group duplicate columns and sum the corresponding column values using pandas [duplicate]

【Question title】: Group duplicate columns and sum the corresponding column values using pandas [duplicate]
【Posted】: 2017-12-29 21:33:52
【Question description】:

I am preprocessing Apache server log data. I have three columns: ID, TIME, and BYTES. Example:

ID    TIME     BYTES
1     13:00    10
2     13:02    30
3     13:03    40
4     13:02    50
5     13:03    70

I want to achieve something like this:

ID    TIME     BYTES
1     13:00    10
2     13:02    80
3     13:03    110

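【Comments】:

df.groupby('TIME', as_index=False).agg({'ID': 'min', 'BYTES': 'sum'}) should work.
Indeed. @Zero, can you find the dupe?
It messes up the time. The time now starts from 0:00 in some odd pattern. In my case the ID does not matter, so it is just TIME and BYTES. I want the output to look like what I showed, because I will plot it against time. The output must be ordered by TIME, as shown. @Zero, what do you suggest?
What is the dtype of TIME?
df.groupby('TIME')[['BYTES']].sum().plot()?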
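To make the first comment's suggestion concrete, here is a minimal sketch that rebuilds the sample table from the question and applies the same groupby call. The DataFrame construction and the assumption that TIME is stored as plain `HH:MM` strings are illustrative, since the question does not state the dtype.

```python
import pandas as pd

# Sample data copied from the question; TIME kept as strings (an assumption)
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "TIME": ["13:00", "13:02", "13:03", "13:02", "13:03"],
    "BYTES": [10, 30, 40, 50, 70],
})

# Group rows that share a TIME; keep the smallest ID and sum BYTES per group
result = df.groupby("TIME", as_index=False).agg({"ID": "min", "BYTES": "sum"})

# Restore the column order used in the question
result = result[["ID", "TIME", "BYTES"]]
print(result)
```

With this sample the summed BYTES come out as 10, 80, and 110, matching the desired table. groupby sorts the group keys by default, so the rows stay in TIME order as long as the strings are zero-padded.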

Tags: python-3.x pandas pandas-groupby data-scrubbing

【Solution 1】:

Let's try:

import pandas as pd

# Convert TIME to datetime, then sum BYTES per timestamp and plot the result
df['TIME'] = pd.to_datetime(df['TIME'])
ax = df.groupby('TIME')['BYTES'].sum().plot()
ax.set_xlim('13:00:00', '13:03:00')

Output: (plot image not preserved)
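If only the summed values in TIME order are needed, without the plot, the same idea can be sketched as follows. The sample DataFrame and the explicit `format="%H:%M"` are assumptions for illustration; the asker's real log may store timestamps differently.

```python
import pandas as pd

# Sample data from the question (ID dropped, since the asker says it is not needed)
df = pd.DataFrame({
    "TIME": ["13:00", "13:02", "13:03", "13:02", "13:03"],
    "BYTES": [10, 30, 40, 50, 70],
})

# Parse with an explicit format; the date part defaults to 1900-01-01
df["TIME"] = pd.to_datetime(df["TIME"], format="%H:%M")

# Sum BYTES per timestamp; groupby sorts the keys chronologically by default
totals = df.groupby("TIME")["BYTES"].sum()

# Turn the index back into HH:MM labels for display
totals.index = totals.index.strftime("%H:%M")
print(totals)
```

Converting the index back to strings only affects how the labels read. Keeping the datetime index may be preferable when the spacing between timestamps should be preserved on a plot.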
