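【Posted at】: 2021-02-06 02:39:42
【Question】:
Hello data scientists and Pandas experts,
I need some help because I can't get my data organized correctly.
When I use unstack after a groupby, the resulting columns are not grouped correctly. Here is my DataFrame: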
import pandas as pd

data = [
{'Store': 'Store1', 'Date': pd.Timestamp('2020-08-01 00:00:00'), 'Employee': 'aemp', 'Department': 'dep1'},\
{'Store': 'Store1', 'Date': pd.Timestamp('2020-08-07 00:00:00'), 'Employee': 'aemp', 'Department': 'dep1'},\
{'Store': 'Store1', 'Date': pd.Timestamp('2020-08-01 00:00:00'), 'Employee': 'bemp', 'Department': 'dep1'},\
{'Store': 'Store1', 'Date': pd.Timestamp('2020-08-07 00:00:00'), 'Employee': 'bemp', 'Department': 'dep1'},\
{'Store': 'Store1', 'Date': pd.Timestamp('2020-08-01 00:00:00'), 'Employee': 'cemp', 'Department': 'dep2'},\
{'Store': 'Store1', 'Date': pd.Timestamp('2020-08-01 00:00:00'), 'Employee': 'demp', 'Department': 'dep2'},\
{'Store': 'Store1', 'Date': pd.Timestamp('2020-08-01 00:00:00'), 'Employee': 'demp', 'Department': 'dep2'},\
{'Store': 'Store1', 'Date': pd.Timestamp('2020-08-01 00:00:00'), 'Employee': 'cemp', 'Department': 'dep2'},\
{'Store': 'Store1', 'Date': pd.Timestamp('2020-08-01 00:00:00'), 'Employee': 'demp', 'Department': 'dep2'},\
{'Store': 'Store1', 'Date': pd.Timestamp('2020-08-07 00:00:00'), 'Employee': 'demp', 'Department': 'dep2'},\
{'Store': 'Store1', 'Date': pd.Timestamp('2020-08-07 00:00:00'), 'Employee': 'demp', 'Department': 'dep2'},\
{'Store': 'Store1', 'Date': pd.Timestamp('2020-08-07 00:00:00'), 'Employee': 'demp', 'Department': 'dep2'},\
{'Store': 'Store1', 'Date': pd.Timestamp('2020-08-07 00:00:00'), 'Employee': 'demp', 'Department': 'dep2'},\
{'Store': 'Store2', 'Date': pd.Timestamp('2020-08-01 00:00:00'), 'Employee': 'eemp', 'Department': 'dep1'},\
{'Store': 'Store2', 'Date': pd.Timestamp('2020-08-07 00:00:00'), 'Employee': 'eemp', 'Department': 'dep1'},\
{'Store': 'Store2', 'Date': pd.Timestamp('2020-08-01 00:00:00'), 'Employee': 'femp', 'Department': 'dep1'},\
{'Store': 'Store2', 'Date': pd.Timestamp('2020-08-07 00:00:00'), 'Employee': 'eemp', 'Department': 'dep1'},\
{'Store': 'Store2', 'Date': pd.Timestamp('2020-08-01 00:00:00'), 'Employee': 'femp', 'Department': 'dep1'},\
{'Store': 'Store2', 'Date': pd.Timestamp('2020-08-07 00:00:00'), 'Employee': 'femp', 'Department': 'dep1'},\
{'Store': 'Store2', 'Date': pd.Timestamp('2020-08-01 00:00:00'), 'Employee': 'aemp', 'Department': 'dep1'},\
{'Store': 'Store2', 'Date': pd.Timestamp('2020-08-07 00:00:00'), 'Employee': 'aemp', 'Department': 'dep1'},\
{'Store': 'Store2', 'Date': pd.Timestamp('2020-08-01 00:00:00'), 'Employee': 'demp', 'Department': 'dep2'},\
{'Store': 'Store2', 'Date': pd.Timestamp('2020-08-01 00:00:00'), 'Employee': 'gemp', 'Department': 'dep2'},\
{'Store': 'Store2', 'Date': pd.Timestamp('2020-08-07 00:00:00'), 'Employee': 'demp', 'Department': 'dep2'},\
{'Store': 'Store2', 'Date': pd.Timestamp('2020-08-05 00:00:00'), 'Employee': 'gemp', 'Department': 'dep2'},\
{'Store': 'Store2', 'Date': pd.Timestamp('2020-08-07 00:00:00'), 'Employee': 'gemp', 'Department': 'dep2'},\
{'Store': 'Store2', 'Date': pd.Timestamp('2020-08-09 00:00:00'), 'Employee': 'cemp', 'Department': 'dep2'},\
{'Store': 'Store3', 'Date': pd.Timestamp('2020-08-01 00:00:00'), 'Employee': 'eemp', 'Department': 'dep1'},\
{'Store': 'Store3', 'Date': pd.Timestamp('2020-08-05 00:00:00'), 'Employee': 'eemp', 'Department': 'dep1'},\
{'Store': 'Store3', 'Date': pd.Timestamp('2020-08-01 00:00:00'), 'Employee': 'bemp', 'Department': 'dep1'},\
{'Store': 'Store3', 'Date': pd.Timestamp('2020-08-05 00:00:00'), 'Employee': 'bemp', 'Department': 'dep1'},\
{'Store': 'Store3', 'Date': pd.Timestamp('2020-08-01 00:00:00'), 'Employee': 'bemp', 'Department': 'dep1'},\
{'Store': 'Store3', 'Date': pd.Timestamp('2020-08-07 00:00:00'), 'Employee': 'demp', 'Department': 'dep2'},\
{'Store': 'Store3', 'Date': pd.Timestamp('2020-08-01 00:00:00'), 'Employee': 'demp', 'Department': 'dep2'}]
df = pd.DataFrame(data)
I would like my output to be organized like this:
Store     Store1                  Store2                              Store3
Department  dep1        dep2        dep1              dep2              dep1        dep2
Employee    aemp  bemp  cemp  demp  aemp  eemp  femp  cemp  demp  gemp  bemp  eemp  demp
Date
2020-08-03   1.0   1.0   2.0   3.0   1.0   1.0   2.0   0.0   1.0   1.0   2.0   1.0   1.0
2020-08-10   1.0   1.0   0.0   4.0   1.0   2.0   1.0   1.0   2.0   1.0   1.0   1.0   1.0
I used the following groupby expression (I don't know how to sort the frame by level):
df = df.groupby([pd.Grouper(key='Date', freq='W-MON'), 'Store', 'Department', 'Employee'])\
.size().unstack(['Store', 'Department', 'Employee']).fillna(0)
This is the result I get with the groupby expression above:
Store     Store1                  Store2                        Store3            Store2
Department  dep1        dep2        dep1              dep2        dep1        dep2  dep2
Employee    aemp  bemp  cemp  demp  aemp  eemp  femp  demp  gemp  bemp  eemp  demp  cemp
Date
2020-08-03   1.0   1.0   2.0   3.0   1.0   1.0   2.0   1.0   1.0   2.0   1.0   1.0   0.0
2020-08-10   1.0   1.0   0.0   4.0   1.0   2.0   1.0   1.0   2.0   1.0   1.0   1.0   1.0
Please help me fix my output so that everything is grouped correctly.
Thank you, I really appreciate your help.
This is a follow-up to my earlier post: How to show only column with Values in Pandas Groupby
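A possible direction, offered as a sketch rather than a confirmed fix: in the result above, the (Store2, dep2, cemp) column lands at the far right, which suggests that the unstacked columns are not sorted by level. Sorting the column MultiIndex with sort_index(axis=1) should place each Employee under its Department and each Department under its Store. The sample below is a hypothetical, shortened stand-in for the data in the question, and the names rows, counts and result are illustrative only.

```python
import pandas as pd

# Hypothetical, shortened sample with the same columns as the question's data.
# (Store2, dep2, cemp) only occurs in the second week, like in the original.
rows = [
    ("Store1", "2020-08-01", "aemp", "dep1"),
    ("Store1", "2020-08-07", "aemp", "dep1"),
    ("Store2", "2020-08-01", "demp", "dep2"),
    ("Store2", "2020-08-07", "demp", "dep2"),
    ("Store2", "2020-08-09", "cemp", "dep2"),
    ("Store3", "2020-08-01", "bemp", "dep1"),
]
df = pd.DataFrame(rows, columns=["Store", "Date", "Employee", "Department"])
df["Date"] = pd.to_datetime(df["Date"])

# Same expression as in the question: weekly bins ending on Monday,
# then one column per (Store, Department, Employee) combination.
counts = (
    df.groupby([pd.Grouper(key="Date", freq="W-MON"), "Store", "Department", "Employee"])
    .size()
    .unstack(["Store", "Department", "Employee"])
    .fillna(0)
)

# Sort the column MultiIndex level by level (Store, then Department,
# then Employee) so that related columns sit next to each other.
result = counts.sort_index(axis=1)
print(result)
```

sort_index(axis=1) only reorders the columns; the counts themselves are unchanged, so it can be appended to the original expression after fillna(0).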
【Comments】:
Tags: python pandas pandas-groupby