How to use pandas to create a column that stores the count of first occurrences on a group-by?

【Question title】: How to use pandas to create a column that stores count of first occurrences on a group-by?
【Posted】: 2020-12-03 17:24:30
【Question description】:

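Q1. Given dataframe 1, I am trying to group by month to get the count of unique new IDs, plus another column that gives me the count of already existing IDs for each month.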

ID     Date
1    Jan-2020
2    Feb-2020
3    Feb-2020
1    Mar-2020
2    Mar-2020
3    Mar-2020
4    Apr-2020
5    Apr-2020

Expected output: the count of newly added unique IDs and the running total of existing IDs

Date       ID_Count   Existing_count
Jan-2020      1           0
Feb-2020      2           1  
Mar-2020      0           3
Apr-2020      2           3

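Note: ID_Count for Mar-2020 is zero because IDs 1, 2 and 3 already existed in earlier months.

Note: The existing count for Jan-2020 is 0 because there are no IDs before January. The existing count for Feb-2020 is 1 because only one ID exists before February. Mar-2020 has an existing count of 3 because it adds up Jan + Feb, and so on.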
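
The two notes above amount to keeping a running set of IDs seen in earlier months. As a minimal plain-Python sketch of that definition, assuming the rows are already in chronological order (the `rows` list and the `count_new_and_existing` helper are illustrative names, not part of the question):

```python
def count_new_and_existing(rows):
    """Return (date, new_count, existing_count) per month.

    `rows` is a list of (id, date) pairs, assumed to be in chronological order.
    """
    months = []     # months in order of first appearance
    by_month = {}   # month -> list of IDs recorded in that month
    for id_, date in rows:
        if date not in by_month:
            by_month[date] = []
            months.append(date)
        by_month[date].append(id_)

    seen = set()    # IDs that appeared in strictly earlier months
    result = []
    for date in months:
        new_ids = set(by_month[date]) - seen
        result.append((date, len(new_ids), len(seen)))
        seen |= new_ids
    return result


rows = [(1, "Jan-2020"), (2, "Feb-2020"), (3, "Feb-2020"), (1, "Mar-2020"),
        (2, "Mar-2020"), (3, "Mar-2020"), (4, "Apr-2020"), (5, "Apr-2020")]
for row in count_new_and_existing(rows):
    print(row)
```

For the sample data this yields the same four rows as the expected table: new counts 1, 2, 0, 2 and existing counts 0, 1, 3, 3.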

【Question discussion】:

Tags: python python-3.x pandas dataframe pandas-groupby

【Solution 1】:

I think you can do it like this:

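import pandas as pd

# Parse 'Mon-YYYY' strings into datetimes so the months sort chronologically
df['month'] = pd.to_datetime(df['Date'], format='%b-%Y')

# Find new IDs: flag the first row of each ID (assumes rows are in chronological order)
df['new'] = df.groupby('ID').cumcount()==0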

# Count new IDs by month
df_ct = df.groupby('month')['new'].sum().to_frame(name='ID_Count')

# Count all previous new IDs: running total of new IDs, shifted down one month
df_ct['Existing_cnt'] = df_ct['ID_Count'].shift().cumsum().fillna(0).astype(int)
df_ct.index = df_ct.index.strftime('%b-%Y')
df_ct

Output:

          ID_Count  Existing_cnt
month                           
Jan-2020         1             0
Feb-2020         2             1
Mar-2020         0             3
Apr-2020         2             3
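
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question and, as an added assumption that is not in the answer, sorts by month before calling `cumcount` so the first-occurrence flag does not depend on the input row order. It also computes the existing count as the cumulative total minus the current month's new IDs, which is equivalent to the `shift().cumsum()` step. The function name `new_and_existing_counts` is illustrative.

```python
import pandas as pd


def new_and_existing_counts(df):
    """Count first-time IDs per month and the IDs first seen before that month."""
    out = df.copy()
    out["month"] = pd.to_datetime(out["Date"], format="%b-%Y")
    # Stable chronological sort so cumcount() == 0 marks the earliest row per ID.
    out = out.sort_values("month", kind="stable")
    out["new"] = out.groupby("ID").cumcount() == 0
    counts = out.groupby("month")["new"].sum().astype(int).to_frame(name="ID_Count")
    # IDs first seen in strictly earlier months:
    # cumulative total of new IDs minus this month's new IDs.
    counts["Existing_cnt"] = counts["ID_Count"].cumsum() - counts["ID_Count"]
    counts.index = counts.index.strftime("%b-%Y")
    return counts


df = pd.DataFrame({
    "ID": [1, 2, 3, 1, 2, 3, 4, 5],
    "Date": ["Jan-2020", "Feb-2020", "Feb-2020", "Mar-2020",
             "Mar-2020", "Mar-2020", "Apr-2020", "Apr-2020"],
})
result = new_and_existing_counts(df)
print(result)
```

Because of the sort, passing the rows in a different order (for example `df.iloc[::-1]`) should give the same counts.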

【Discussion】:

Works like a charm. I had already worked out the existing count, but I was stuck on the first step. Thanks for your help!