Pandas grouping by week: answers

【Question title】: Pandas grouping by week
【Posted】: 2022-01-23 17:41:14
【Question body】:

I have a DataFrame like this in pandas:

Name  Date
A     9/1/21
B     10/20/21
C     9/8/21
D     9/20/21
K     9/29/21
K     9/15/21
M     10/1/21
C     9/12/21
D     9/9/21
C     9/9/21
R     9/20/21

I need to count the items per week.

weeks = [9/6/21, 9/13/21, 9/20/21, 9/27/21, 10/4/21]

Example: from 9/6 to 9/13, the output should be:

Name  Weekly count
A     0
B     0
C     3
D     1
M     0
K     0
R     0

Similarly, I need the counts for these intervals: 9/13 to 9/20, 9/20 to 9/27, and 9/27 to 10/4. Thanks!
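If the week boundaries are given explicitly, as in the weeks list above, one possible sketch is to bin the dates with pd.cut and count per (Name, interval). It assumes each interval is closed on the left, [start, next start), and that dates outside every interval (here 9/1/21 and 10/20/21) are simply not counted; the names df and counts are illustrative.

```python
import pandas as pd

# The question's data
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "K", "K", "M", "C", "D", "C", "R"],
    "Date": ["9/1/21", "10/20/21", "9/8/21", "9/20/21", "9/29/21", "9/15/21",
             "10/1/21", "9/12/21", "9/9/21", "9/9/21", "9/20/21"],
})
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%y")

# Week boundaries from the question; right=False makes each bin [start, next start)
weeks = pd.to_datetime(["9/6/21", "9/13/21", "9/20/21", "9/27/21", "10/4/21"],
                       format="%m/%d/%y")
df["Week"] = pd.cut(df["Date"], bins=weeks, right=False)

# Count rows per (Name, interval); dates outside every bin are NaN and get dropped
counts = (
    df.groupby(["Name", "Week"], observed=True)
      .size()
      .unstack(fill_value=0)
      # Names with no rows in any interval (A and B here) still get a row of zeros
      .reindex(sorted(df["Name"].unique()), fill_value=0)
)
print(counts)
```

The first column reproduces the 9/6 to 9/13 example: C = 3, D = 1, and 0 for every other name.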

【Comments】:

I don't understand how you split the weeks: 2021-09-13 is a Monday.

Tags: python pandas dataframe

【Solution 1】:

You may need to be careful about how the first day of the week is defined; the code below lets you adjust it.
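Because the week a date falls into depends on which weekday counts as the first, one option is to make that choice a parameter. The helper below, week_start, is hypothetical (not a pandas function); it uses pandas' weekday numbering, Monday=0 through Sunday=6, and with first_weekday=0 it gives the same Monday-based weeks as this answer.

```python
import pandas as pd


def week_start(dates: pd.Series, first_weekday: int = 0) -> pd.Series:
    """Return the first day of the week containing each date (hypothetical helper).

    first_weekday follows pandas' .dt.weekday numbering: Monday=0 ... Sunday=6.
    """
    # Number of days since the most recent `first_weekday` (always 0..6)
    offset = (dates.dt.weekday - first_weekday) % 7
    # Step back that many days and drop any time-of-day component
    return (dates - pd.to_timedelta(offset, unit="D")).dt.normalize()


dates = pd.to_datetime(pd.Series(["9/1/21", "9/12/21", "10/20/21"]), format="%m/%d/%y")
print(week_start(dates))                   # Monday-based weeks
print(week_start(dates, first_weekday=6))  # Sunday-based weeks
```

With Sunday as the first day, 9/12/21 (a Sunday) starts its own week instead of belonging to the week of 9/6, which is exactly the kind of difference the choice of first day makes.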

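import numpy as np
import pandas as pd
df = pd.DataFrame(data=d)  # d: dict holding the question's 'Name' and 'Date' columns
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%y')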

I. Non-continuous index

Monday is chosen as the first day of the week.

#(1) Build a series of first_day_of_week, monday is chosen as the first day of week
weeks_index = df['Date'] - df['Date'].dt.weekday * np.timedelta64(1, 'D') 

#(2) Groupby and some tidying
df2 = ( df.groupby([df['Name'], weeks_index])
          .count()
          .rename(columns={'Date':'Count'})
        
          .swaplevel()   # weeks to first level
          .sort_index() 
          .unstack(1).fillna(0.0)
        
          .astype(int)
          .rename_axis('first_day_of_week')
      )

>>> print(df2)
Name                  A  B  C  D  K  M  R
first_day_of_week                        
2021-08-30            1  0  0  0  0  0  0
2021-09-06            0  0  3  1  0  0  0
2021-09-13            0  0  0  0  1  0  0
2021-09-20            0  0  0  1  0  0  1
2021-09-27            0  0  0  0  1  1  0
2021-10-18            0  1  0  0  0  0  0
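For the non-continuous table printed above, pd.crosstab is a shorter route to the same counts: it tabulates (week, Name) pairs directly and fills absent pairs with 0. This is an alternative sketch, not the answer's code; it computes the Monday of each week with pd.to_timedelta instead of np.timedelta64.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "K", "K", "M", "C", "D", "C", "R"],
    "Date": ["9/1/21", "10/20/21", "9/8/21", "9/20/21", "9/29/21", "9/15/21",
             "10/1/21", "9/12/21", "9/9/21", "9/9/21", "9/20/21"],
})
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%y")

# Monday of the week containing each date
first_day_of_week = (
    df["Date"] - pd.to_timedelta(df["Date"].dt.weekday, unit="D")
).rename("first_day_of_week")

# Rows: observed weeks; columns: names; cells: row counts (0 where a pair is absent)
table = pd.crosstab(first_day_of_week, df["Name"])
print(table)
```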

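II. Continuous index

This part differs little from the previous one.

We build a continuous version of the weekly index and use it to reindex the table.

Monday is chosen as the first day of the week (for both indexes, of course).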

#(1a) Build a series of first_day_of_week, monday is chosen as the first day of week
weeks_index = df['Date'] - df['Date'].dt.weekday * np.timedelta64(1, 'D')
#(1b) Build a continuous series of first_day_of_week
continuous_weeks_index = pd.date_range(start=weeks_index.min(), 
                                 end=weeks_index.max(),
                                 freq='W-MON')    # monday

#(2) Groupby, unstack, reindex, and some tidying
df2 = ( df
          # groupby and count
          .groupby([df['Name'], weeks_index])
          .count()
          .rename(columns={'Date':'Count'})
        
          # unstack on weeks 
          .swaplevel()    # weeks to first level
          .sort_index()
          .unstack(1)

          # reindex to insert weeks with no data
          .reindex(continuous_weeks_index)  # new index
        
          # clean up
          .fillna(0.0)               
          .astype(int)
          .rename_axis('first_day_of_week')
      )

>>> print(df2)
Name               A  B  C  D  K  M  R
first_day_of_week                     
2021-08-30         1  0  0  0  0  0  0
2021-09-06         0  0  3  1  0  0  0
2021-09-13         0  0  0  0  1  0  0
2021-09-20         0  0  0  1  0  0  1
2021-09-27         0  0  0  0  1  1  0
2021-10-04         0  0  0  0  0  0  0
2021-10-11         0  0  0  0  0  0  0
2021-10-18         0  1  0  0  0  0  0
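reindex can also insert the missing weeks as integer zeros directly through fill_value=0, which makes a separate fillna(0.0)/astype(int) cleanup unnecessary. A small sketch using two weeks and two names (B and K) taken from the table above, with the weeks of 2021-10-04 and 2021-10-11 missing:

```python
import pandas as pd

# Two weeks and two names from the table above; two weeks in between are missing
counts = pd.DataFrame(
    {"B": [0, 1], "K": [1, 0]},
    index=pd.to_datetime(["2021-09-27", "2021-10-18"]),
)

# Every Monday from the first to the last observed week
all_weeks = pd.date_range(counts.index.min(), counts.index.max(), freq="W-MON")

# fill_value=0 inserts the missing weeks as integer zeros, so no fillna/astype step
filled = counts.reindex(all_weeks, fill_value=0).rename_axis("first_day_of_week")
print(filled)
```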

Last step (if needed): convert the wide table back to a long (week, Name) layout

df2.stack()
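df2.stack() moves the Name columns back into the index, giving one value per (week, Name). Adding reset_index turns that into the question's Name / Weekly count layout for any single week. A sketch on a small frame shaped like df2, using the A, C and D values from the table above:

```python
import pandas as pd

# A small wide table shaped like df2 (two weeks, three names)
df2 = pd.DataFrame(
    {"A": [1, 0], "C": [0, 3], "D": [0, 1]},
    index=pd.DatetimeIndex(["2021-08-30", "2021-09-06"], name="first_day_of_week"),
).rename_axis(columns="Name")

# stack() moves the Name columns into the index: one row per (week, name)
long = df2.stack().rename("Weekly count").reset_index()

# The question's layout for the week starting 2021-09-06
week = long.loc[long["first_day_of_week"] == pd.Timestamp("2021-09-06"),
                ["Name", "Weekly count"]]
print(week)
```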

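【Discussion】:

Thanks. Very helpful.
I noticed that if every name has a count of 0 in some week, that week's row doesn't appear in the final table. Can we prevent that? For example, if the week of 9/13 were all zeros, I'd still want that row to be shown.
@Mark, I added a solution with a continuous (weekly) index.
Thanks. You're a pro!
Thanks a lot. One more question. In the output above, if there were no data in the week of Aug 30 to Sep 6, the 8/30 entry would not be shown. I'd like to give my program a date range. Example: if my start date is Sep 1 and there is nothing in the week of Aug 30, I still want to see zeros for 8/30/2021. The same applies to the end date: if my end date is Oct 27 and there is nothing in the week of Oct 25, I still want to see a row of zeros for the week of Oct 25.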
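For a fixed start and end date, one approach is to floor both dates to their Monday before building the weekly range. Flooring matters because pd.date_range with freq='W-MON' rolls a non-Monday start forward to the next Monday, so starting at 2021-09-01 would skip the week of 2021-08-30. weekly_counts below is a hypothetical helper, not part of the answer; it assumes df['Date'] is already datetime64 and counts only rows between start and end inclusive.

```python
import pandas as pd


def weekly_counts(df: pd.DataFrame, start, end) -> pd.DataFrame:
    """Hypothetical helper: count rows per (Monday-based week, Name), with a row
    for every week that overlaps [start, end], including weeks with no data."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    dates = df["Date"].dt.normalize()

    # Floor both ends to their Monday so the first and last weeks are kept
    first = start - pd.Timedelta(days=start.weekday())
    last = end - pd.Timedelta(days=end.weekday())
    all_weeks = pd.date_range(first, last, freq="W-MON")

    # Count only rows inside [start, end], grouped by the Monday of their week
    in_range = dates.between(start, end)
    monday = dates - pd.to_timedelta(dates.dt.weekday, unit="D")
    counts = pd.crosstab(monday[in_range], df.loc[in_range, "Name"])

    # Empty weeks (and names with no rows in range) become zeros
    return (counts.reindex(index=all_weeks,
                           columns=sorted(df["Name"].unique()),
                           fill_value=0)
                  .rename_axis(index="first_day_of_week", columns="Name"))


df = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "K", "K", "M", "C", "D", "C", "R"],
    "Date": ["9/1/21", "10/20/21", "9/8/21", "9/20/21", "9/29/21", "9/15/21",
             "10/1/21", "9/12/21", "9/9/21", "9/9/21", "9/20/21"],
})
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%y")
result = weekly_counts(df, "2021-09-01", "2021-10-27")
print(result)
```

Rows that fall in the first or last week but outside [start, end] are left out of the counts; dropping the in_range filter would count whole weeks instead.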