【Question title】: Conditional grouped CumCount pandas
【Posted】: 2018-09-10 02:31:41
【Question】:

I have this DataFrame:

    import pandas as pd

    dic = {'users' : ['A','A','B','A','A','B','A','A','A','A','A','B','A'],
           'product' : [1,1,2,2,1,2,1,2,1,1,2,1,1],
           'action' : ['see', 'see', 'see', 'see', 'buy', 'buy', 'see', 'see', 'see', 'see', 'buy', 'buy', 'buy']
    }

    df = pd.DataFrame(dic, columns=dic.keys())

    df


       users  product action
    0      A        1    see
    1      A        1    see
    2      B        2    see
    3      A        2    see
    4      A        1    buy
    5      B        2    buy
    6      A        1    see
    7      A        2    see
    8      A        1    see
    9      A        1    see
    10     A        2    buy
    11     B        1    buy
    12     A        1    buy

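I need a column that counts how many times each user saw a product before buying it. The count is kept per user and per product, and it starts over after each purchase.

The result should look like this: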

    dic = {'users' : ['A','A','B','A','A','B','A','A','A','A','A','B','A'],
           'product' : [1,1,2,2,1,2,1,2,1,1,2,1,1],
           'action' : ['see', 'see', 'see', 'see', 'buy', 'buy', 'see', 'see', 'see', 'see', 'buy', 'buy', 'buy'],
           'see_before_buy' : [1,2,1,1,2,1,1,2,2,3,2,0,3]
    }

       users  product action  see_before_buy
    0      A        1    see               1
    1      A        1    see               2
    2      B        2    see               1
    3      A        2    see               1
    4      A        1    buy               2
    5      B        2    buy               1
    6      A        1    see               1
    7      A        2    see               2
    8      A        1    see               2
    9      A        1    see               3
    10     A        2    buy               2
    11     B        1    buy               0
    12     A        1    buy               3

Can anyone help me?

【Comments】:

    Tags: python pandas group-by pandas-groupby cumsum


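    【Solution 1】:

    You may need to create an additional key for the groupby by applying cumsum after a shift. The key numbers the purchase cycles within each user and product pair, so the count of 'see' rows restarts after every 'buy'.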

    # Running count of earlier 'buy' rows within each (users, product) pair;
    # the shift keeps a 'buy' row in the same block as the 'see' rows before it.
    addkey=df.groupby(['users','product']).action.transform(lambda x : x.eq('buy').shift(fill_value=False).cumsum())
    # Count the 'see' rows inside each block; df['product'] is used because df.product is a DataFrame method.
    df['seebefore']=df.action.eq('see').groupby([df.users,df['product'],addkey]).cumsum().astype(int)
    df
    Out[131]: 
       users  product action  seebefore
    0      A        1    see          1
    1      A        1    see          2
    2      B        2    see          1
    3      A        2    see          1
    4      A        1    buy          2
    5      B        2    buy          1
    6      A        1    see          1
    7      A        2    see          2
    8      A        1    see          2
    9      A        1    see          3
    10     A        2    buy          2
    11     B        1    buy          0
    12     A        1    buy          3
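
To see why the extra key works, the intermediate values can be computed one step at a time. This is only a minimal sketch that rebuilds the question's DataFrame. The names `keys`, `prev_buy` and `block` are illustrative and not part of the answer above, and the grouped `shift` is swapped in for the lambda.

```python
import pandas as pd

df = pd.DataFrame({
    'users': list('AABAABAAAAABA'),
    'product': [1, 1, 2, 2, 1, 2, 1, 2, 1, 1, 2, 1, 1],
    'action': ['see', 'see', 'see', 'see', 'buy', 'buy', 'see',
               'see', 'see', 'see', 'buy', 'buy', 'buy'],
})

# Grouping keys shared by every step (illustrative name).
keys = [df['users'], df['product']]

is_buy = df['action'].eq('buy')

# True when the previous row of the same (user, product) pair was a 'buy'.
prev_buy = is_buy.groupby(keys).shift(fill_value=False)

# Number of purchases completed before each row: constant within one
# "see ... see, buy" cycle, so it can serve as an extra grouping key.
block = prev_buy.astype(int).groupby(keys).cumsum()

# Count the 'see' rows inside each (user, product, block) group.
see_before_buy = df['action'].eq('see').astype(int).groupby(keys + [block]).cumsum()

print(pd.DataFrame({'block': block, 'see_before_buy': see_before_buy}))
```

On the sample data only rows 6, 8, 9 and 12 (user A, product 1, after the first purchase) get `block` 1. Every other row stays in block 0, which is why the count for user A and product 1 restarts at row 6.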
    

    【Comments】:

    【Solution 2】:

    One way to do it:

    First, get all users and products

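    # The question's column is 'product'; this answer refers to it as 'products'.
    df=df.rename(columns={'product':'products'})
    users=list(df.users.unique())
    products=list(df.products.unique())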
    

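    Create a dictionary with one counter per user and product combination, recording how many times each user has seen each product since their last purchase of it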

    see_dict={user:{product:0 for product in products} for user in users}
    
    #{'A': {1: 0, 2: 0}, 'B': {1: 0, 2: 0}}
    

    Initialize an empty column

    df["see_before_buy"]=None
    

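    Now, for each row: if it is a see action, increment the counter in the dictionary and assign the new value to the row. If it is a buy action, assign the current value and then reset the counter to 0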

    for i in range(len(df)):
        user=df.loc[i,"users"]
        product=df.loc[i,"products"]
        if(df.loc[i,"action"]=="see"): #if the action is see
            see_dict[user][product]+=1 #increment the see dictionary
            df.loc[i,"see_before_buy"]=see_dict[user][product] #assign this value for this row
        else: #buy action
            df.loc[i,"see_before_buy"]=see_dict[user][product] #assign the current value
            see_dict[user][product]=0 #reset the counter
    

    Output

       users  products action  see_before_buy
    0      A         1    see               1
    1      A         1    see               2
    2      B         2    see               1
    3      A         2    see               1
    4      A         1    buy               2
    5      B         2    buy               1
    6      A         1    see               1
    7      A         2    see               2
    8      A         1    see               2
    9      A         1    see               3
    10     A         2    buy               2
    11     B         1    buy               0
    12     A         1    buy               3
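
The same counter logic can also be sketched without the nested dictionary or the per-row `df.loc` lookups. This variant is not the answer's code and rests on a few assumptions. It keeps one `defaultdict` keyed by `(user, product)` tuples, walks the three columns with `zip`, and uses the question's original column name `product`. The names `counts` and `result` are illustrative.

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({
    'users': list('AABAABAAAAABA'),
    'product': [1, 1, 2, 2, 1, 2, 1, 2, 1, 1, 2, 1, 1],
    'action': ['see', 'see', 'see', 'see', 'buy', 'buy', 'see',
               'see', 'see', 'see', 'buy', 'buy', 'buy'],
})

# (user, product) -> number of 'see' rows since the last 'buy'; missing keys start at 0.
counts = defaultdict(int)
result = []

for user, product, action in zip(df['users'], df['product'], df['action']):
    key = (user, product)
    if action == 'see':
        counts[key] += 1            # one more view in the current cycle
        result.append(counts[key])
    else:
        result.append(counts[key])  # report the views seen before this purchase
        counts[key] = 0             # start a new cycle

df['see_before_buy'] = result
print(df)
```

Collecting the values in a list and assigning the column once gives it an integer dtype, and pairs that never appear in the data need no counter.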
    

    【Comments】:
