【Posted】: 2020-12-07 03:36:52
【Question】:
I have a pandas DataFrame, new_res, with more than 6 million rows. My goal is to count how many times each unique row occurs.
   start_hex_id_res8  start_hex_id_res9  end_hex_id_res9  end_hex_id_res9        date  is_weekday  is_holiday  starthour
0    882a100d23fffff    892a100d23bffff  892a100d237ffff  892a100d237ffff  2020-07-01        True       False          0
1    882a100d23fffff    892a100d23bffff  892a100d237ffff  892a100d237ffff  2020-07-01        True       False          0
2    882a1072c7fffff    892a1072c6bffff  892a1072187ffff  892a1072187ffff  2020-07-01        True       False          0
3    882a1072c7fffff    892a1072c6bffff  892a1072187ffff  892a1072187ffff  2020-07-01        True       False          0
4    882a100d09fffff    892a100d097ffff  892a100d09bffff  892a100d09bffff  2020-07-01        True       False          0
Column dtypes:
start_hex_id_res8    object
start_hex_id_res9    object
end_hex_id_res9      object
end_hex_id_res9      object
date                 object
is_weekday             bool
is_holiday             bool
starthour             int64
I tried:
agg = new_res.groupby(['start_hex_id_res8', 'start_hex_id_res9', 'end_hex_id_res9', 'end_hex_id_res9', 'date','is_weekday', 'is_holiday', 'starthour']).size().groupby(level=0).size()
but this raises an error:
ValueError: Grouper for 'end_hex_id_res9' not 1-dimensional
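A likely explanation, sketched on a tiny made-up frame (the names `df`, `a` and `b` are illustrative only): because two columns share the label `end_hex_id_res9`, selecting that label returns a two-column DataFrame instead of a Series, and groupby rejects any key that is not one-dimensional. `Index.duplicated()` is one way to spot such labels.

```python
import pandas as pd

# Toy frame that mimics the problem: two columns share the label "b".
df = pd.DataFrame([[1, 2, 3], [1, 2, 3], [4, 5, 6]], columns=["a", "b", "b"])

# Selecting a duplicated label returns a 2-D DataFrame, not a 1-D Series.
print(type(df["b"]))  # <class 'pandas.core.frame.DataFrame'>

# List the labels that occur more than once (the first occurrence is not flagged).
print(df.columns[df.columns.duplicated()].tolist())  # ['b']

# Grouping by the duplicated label fails because its grouper is 2-D.
try:
    df.groupby(["a", "b"]).size()
except ValueError as exc:
    print("ValueError:", exc)
```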
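How should I interpret this error, and what is the correct way in pandas to create a new DataFrame that is a condensed version of new_res? The output should be a DataFrame with the same column names that contains each unique row once, plus a count column appended at the end holding the number of times that row occurs.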
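A minimal sketch of one way to build that frame, assuming the two `end_hex_id_res9` columns hold identical values (as they do in the five sample rows), so keeping only the first one is safe; if the second column is really a different field with a mislabelled header, rename it instead of dropping it. `count_unique_rows` is a hypothetical helper name. Note that the trailing `.groupby(level=0).size()` in the attempt above counts how many distinct combinations exist for each value of the first key, which is not the per-row count asked for; `reset_index(name="count")` keeps every key as a column instead. `dropna=False` needs pandas 1.1 or later.

```python
import pandas as pd


def count_unique_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse df to its unique rows and append a 'count' column (hypothetical helper)."""
    # Keep only the first of any duplicated column labels so every grouper is 1-D.
    df = df.loc[:, ~df.columns.duplicated()]
    keys = list(df.columns)
    return (
        df.groupby(keys, dropna=False)  # dropna=False keeps rows whose keys contain NaN
        .size()                         # number of rows in each unique combination
        .reset_index(name="count")      # turn the MultiIndex back into columns
    )


# The five sample rows from the question, including the duplicated column label.
columns = ["start_hex_id_res8", "start_hex_id_res9", "end_hex_id_res9", "end_hex_id_res9",
           "date", "is_weekday", "is_holiday", "starthour"]
rows = [
    ["882a100d23fffff", "892a100d23bffff", "892a100d237ffff", "892a100d237ffff", "2020-07-01", True, False, 0],
    ["882a100d23fffff", "892a100d23bffff", "892a100d237ffff", "892a100d237ffff", "2020-07-01", True, False, 0],
    ["882a1072c7fffff", "892a1072c6bffff", "892a1072187ffff", "892a1072187ffff", "2020-07-01", True, False, 0],
    ["882a1072c7fffff", "892a1072c6bffff", "892a1072187ffff", "892a1072187ffff", "2020-07-01", True, False, 0],
    ["882a100d09fffff", "892a100d097ffff", "892a100d09bffff", "892a100d09bffff", "2020-07-01", True, False, 0],
]
new_res = pd.DataFrame(rows, columns=columns)

result = count_unique_rows(new_res)
print(result)
```

On the sample rows this collapses five rows into three unique rows, two of which occur twice; the counts always sum to the original row count.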
【Discussion】:
Why are there two columns named "end_hex_id_res9"?
Tags: python pandas pandas-groupby unique