【Question title】: Counting unique values throws dimension error
【Posted】: 2020-12-07 03:36:52
【Question】:

I have a pandas DataFrame, new_res, with more than 6 million rows. My goal is to count all unique rows.

start_hex_id_res8   start_hex_id_res9   end_hex_id_res9 end_hex_id_res9 date    is_weekday  is_holiday  starthour
0   882a100d23fffff 892a100d23bffff 892a100d237ffff 892a100d237ffff 2020-07-01  True    False   0
1   882a100d23fffff 892a100d23bffff 892a100d237ffff 892a100d237ffff 2020-07-01  True    False   0
2   882a1072c7fffff 892a1072c6bffff 892a1072187ffff 892a1072187ffff 2020-07-01  True    False   0
3   882a1072c7fffff 892a1072c6bffff 892a1072187ffff 892a1072187ffff 2020-07-01  True    False   0
4   882a100d09fffff 892a100d097ffff 892a100d09bffff 892a100d09bffff 2020-07-01  True    False   0

start_hex_id_res8    object
start_hex_id_res9    object
end_hex_id_res9      object
end_hex_id_res9      object
date                 object
is_weekday             bool
is_holiday             bool
starthour             int64

I tried:

agg = new_res.groupby(['start_hex_id_res8', 'start_hex_id_res9', 'end_hex_id_res9', 'end_hex_id_res9', 'date','is_weekday', 'is_holiday', 'starthour']).size().groupby(level=0).size()

But this raises an error:

ValueError: Grouper for 'end_hex_id_res9' not 1-dimensional

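How should I interpret this, and what is the correct way in pandas to create a new DataFrame that is a condensed version of new_res? The output would simply be a DataFrame with the same column names, but containing the count of each unique row (with a count column appended at the end).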

【Comments】:

  • Why are there two columns with the same name, "end_hex_id_res9"?
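
The comment above points at the likely cause of the error. When a column label is duplicated, selecting it returns a two-dimensional DataFrame instead of a one-dimensional Series, and groupby needs a one-dimensional key for each grouper. A minimal sketch of this, using a made-up toy frame rather than the real new_res (all values here are hypothetical, and keeping the first copy of the duplicated column is only one possible fix):

```python
import pandas as pd

# Toy frame with a duplicated column label, mimicking new_res (values are made up).
df = pd.DataFrame(
    [["a", "x", "x", 0], ["a", "x", "x", 0], ["b", "y", "y", 1]],
    columns=["start_hex_id_res9", "end_hex_id_res9", "end_hex_id_res9", "starthour"],
)

# Selecting a duplicated label yields a DataFrame (2-D), not a Series (1-D).
selection_ndim = df["end_hex_id_res9"].ndim

# List the labels that occur more than once.
duplicated_labels = df.columns[df.columns.duplicated()].tolist()

# One possible fix (an assumption, not the asker's choice): keep only the
# first occurrence of each label. Renaming the second column is the
# alternative if it actually holds different data.
deduped = df.loc[:, ~df.columns.duplicated()]
```

If the second column was meant to be a different field, renaming it is probably safer than dropping it.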

Tags: python pandas pandas-groupby unique


【Solution 1】:

Let's try:

g = df.astype(str)  # cast every column to str
# Group by all columns, number each group, and count the distinct group numbers
g.groupby(list(g.columns)).ngroup().nunique()
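
The snippet above returns a single number (how many distinct rows exist). The question also asks for a condensed DataFrame with a count column at the end. A minimal sketch of one way to get that, assuming the column labels are already unique; the toy data and the names counts and n_unique_rows are illustrative only:

```python
import pandas as pd

# Toy stand-in for new_res with unique column labels (values are made up).
df = pd.DataFrame({
    "start_hex_id_res9": ["a", "a", "b", "b", "c"],
    "end_hex_id_res9": ["x", "x", "y", "y", "z"],
    "is_weekday": [True, True, True, True, False],
    "starthour": [0, 0, 0, 0, 1],
})

# Group by every column, count the rows in each group, and turn the result
# back into a flat DataFrame with the count as the last column.
# sort=False keeps groups in order of first appearance.
counts = (
    df.groupby(list(df.columns), sort=False)
      .size()
      .reset_index(name="count")
)

# The number of distinct rows is then just the length of the condensed frame.
n_unique_rows = len(counts)
```

Note that groupby drops rows whose key columns contain NaN by default, so frames with missing values may need dropna=False (available in pandas 1.1 and later).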

【Comments】:

  • Please upvote my answer. I upvoted your question. Thanks.