Posted: 2020-12-07 18:00:30
Question:
I'm trying to find trends in attendance. I filtered my existing df down to the rows below so I can look at one activity at a time.
+---+-----------+-------+----------+-------+---------+
| | Date | Org | Activity | Hours | Weekday |
+---+-----------+-------+----------+-------+---------+
| 0 | 8/3/2020 | Org 1 | Gen Ab | 10.5 | Monday |
| 1 | 8/25/2020 | Org 1 | Gen Ab | 2 | Tuesday |
| 3 | 8/31/2020 | Org 1 | Gen Ab | 8.5 | Monday |
| 7 | 8/10/2020 | Org 2 | Gen Ab | 1 | Monday |
| 8 | 8/14/2020 | Org 3 | Gen Ab | 3.5 | Friday |
+---+-----------+-------+----------+-------+---------+
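For experimenting, the sample rows above can be rebuilt as a DataFrame. This is a minimal sketch that assumes the Date column holds month/day/year strings, as the table suggests. Kept as text, those dates sort character by character ("8/10/2020" lands before "8/3/2020"), which would scramble any trend over time. Parsing them with `pd.to_datetime` makes sorting follow the calendar.

```python
import pandas as pd

# Rebuild the sample rows shown above, keeping the original index labels
att_df = pd.DataFrame(
    {
        "Date": ["8/3/2020", "8/25/2020", "8/31/2020", "8/10/2020", "8/14/2020"],
        "Org": ["Org 1", "Org 1", "Org 1", "Org 2", "Org 3"],
        "Activity": ["Gen Ab"] * 5,
        "Hours": [10.5, 2, 8.5, 1, 3.5],
        "Weekday": ["Monday", "Tuesday", "Monday", "Monday", "Friday"],
    },
    index=[0, 1, 3, 7, 8],
)

# As plain strings the dates compare character by character,
# so "8/10/2020" sorts before "8/3/2020"
text_order = sorted(att_df["Date"])

# Parse to datetime64 (assumed month/day/year) so ordering follows the calendar
att_df["Date"] = pd.to_datetime(att_df["Date"], format="%m/%d/%Y")
date_order = att_df["Date"].sort_values().dt.strftime("%m/%d/%Y").tolist()

print(text_order)
print(date_order)
```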
This code:
gen_ab = att_df.loc[att_df['Activity'] == "Gen Ab"]
sum_gen_ab = gen_ab.groupby(['Date', 'Activity']).sum()
sum_gen_ab.head()
returns this:
+------------+----------+------------+
| | | Hours |
+------------+----------+------------+
| Date | Activity | |
| 06/01/2020 | Gen Ab | 347.250000 |
| 06/02/2020 | Gen Ab | 286.266667 |
| 06/03/2020 | Gen Ab | 169.583333 |
| 06/04/2020 | Gen Ab | 312.633333 |
| 06/05/2020 | Gen Ab | 317.566667 |
+------------+----------+------------+
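In the output above, the summed column already appears to be named `Hours`. What makes it look odd is that `Date` and `Activity` became index levels (the row labels) rather than ordinary columns. The sketch below reuses the sample rows, trimmed to the columns that matter here, to check this. It selects `[["Hours"]]` before summing so that string columns are not summed; this selection is an assumption added for safety, since newer pandas versions may concatenate string columns under `.sum()`.

```python
import pandas as pd

# Sample rows from the question, trimmed to the columns used here
gen_ab = pd.DataFrame(
    {
        "Date": ["8/3/2020", "8/25/2020", "8/31/2020", "8/10/2020", "8/14/2020"],
        "Activity": ["Gen Ab"] * 5,
        "Hours": [10.5, 2, 8.5, 1, 3.5],
    }
)

# Group and sum only the Hours column
sum_gen_ab = gen_ab.groupby(["Date", "Activity"])[["Hours"]].sum()
print(sum_gen_ab.columns.tolist())    # the only data column is Hours
print(list(sum_gen_ab.index.names))   # Date and Activity live in the index

# reset_index() turns the index levels back into ordinary columns
flat = sum_gen_ab.reset_index()
print(flat.columns.tolist())
```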
How do I get the summed column to be named "Hours"? When I try the following, I still get the same result:
sum_gen_ab['Hours'] = gen_ab.groupby(['Date', 'Activity']).sum()
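The assignment above most likely just writes the same totals back into the existing `Hours` column, so the result looks unchanged. If the goal is to choose the output column name explicitly, pandas named aggregation (`.agg(new_name=("column", "func"))`) does that in one step. In this sketch `TotalHours` is an illustrative name, not something from the question, and the dates are parsed first on the assumption that they are month/day/year strings.

```python
import pandas as pd

# Sample rows from the question, trimmed to the columns used here
gen_ab = pd.DataFrame(
    {
        "Date": ["8/3/2020", "8/25/2020", "8/31/2020", "8/10/2020", "8/14/2020"],
        "Activity": ["Gen Ab"] * 5,
        "Hours": [10.5, 2, 8.5, 1, 3.5],
    }
)
gen_ab["Date"] = pd.to_datetime(gen_ab["Date"], format="%m/%d/%Y")

# Named aggregation: keyword = output column name, value = (input column, function).
# reset_index() moves Date out of the index so it becomes a regular column.
daily = (
    gen_ab.groupby("Date")
    .agg(TotalHours=("Hours", "sum"))
    .reset_index()
)
print(daily)
```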
What I ultimately want is a line chart showing the total hours for an activity over time, where time is of course the Date column in my df.
plt.plot(sum_gen_ab['Date'], sum_gen_ab['Hours'])
plt.show()
This returns KeyError: 'Date'.
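The KeyError is consistent with `Date` being an index level after the `groupby`, so `sum_gen_ab['Date']` has no column to look up. A minimal sketch of one way around it: group with `as_index=False` so `Date` stays a column, parse the dates (assumed month/day/year) so the x-axis is chronological, then plot. The `matplotlib.use("Agg")` line is only there so the sketch runs without a display; drop it in a notebook or desktop session to get an interactive window.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless; remove for a window
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows from the question
att_df = pd.DataFrame(
    {
        "Date": ["8/3/2020", "8/25/2020", "8/31/2020", "8/10/2020", "8/14/2020"],
        "Org": ["Org 1", "Org 1", "Org 1", "Org 2", "Org 3"],
        "Activity": ["Gen Ab"] * 5,
        "Hours": [10.5, 2, 8.5, 1, 3.5],
    }
)
att_df["Date"] = pd.to_datetime(att_df["Date"], format="%m/%d/%Y")

# Filter to one activity, then total the hours per day.
# as_index=False keeps Date as a regular column, which avoids the KeyError.
gen_ab = att_df.loc[att_df["Activity"] == "Gen Ab"]
daily = gen_ab.groupby("Date", as_index=False)["Hours"].sum()

fig, ax = plt.subplots()
ax.plot(daily["Date"], daily["Hours"], marker="o")
ax.set_xlabel("Date")
ax.set_ylabel("Total hours")
ax.set_title("Gen Ab hours over time")
fig.autofmt_xdate()  # tilt the date labels so they don't overlap
plt.show()
```

Alternatively, keeping the original grouped frame and calling `reset_index()` on it before plotting has the same effect.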
Tags: python pandas dataframe matplotlib