Python: group by and count based on time in another column (answers)

[Question title]: python group by and count based on time in another column
[Posted]: 2021-06-30 01:50:32
[Question]:

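I'm trying to use Python to group records and count those that meet a certain condition.

Sample data are shown below. I want to create a new column "phone_cnt" that shows the number of calls meeting the following conditions: first, find the numbers that have at least one dept=0 record; then, for each such number, count the calls that occurred AFTER the dept=0 call.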


    import numpy as np
    import pandas as pd

    np.random.seed(0)
    # create a range of 17 timestamps starting at '2021-04-01', one per hour
    rng = pd.date_range('2021-04-01', periods=17, freq='H')
    df = pd.DataFrame({'time': rng,
                       'id': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17],
                       'phone': [881,453,453,111,347,767,767,980,767,453,453,767,767,687,321,243,243],
                       'dept': [1,0,1,1,1,1,0,0,0,0,1,1,1,1,1,0,1]})
    df

Expected result: phone 243 has phone_cnt=1; 453 has a count of 3, 767 has a count of 3, and 980 has a count of 0.

I have tried the following steps. The first two steps work, but step 3 is wrong.


    # step 1: create a list of unique phone numbers which have dept=0 in records
    phonelist = df[df['dept']==0].phone.unique()
         
    # step 2: find all the calls made by the numbers in the above list
    df1 = df[df['phone'].isin(phonelist)].sort_values(by = ['phone','time'], ascending = [True, True])
    df1
        
    # step 3: count the number of calls in df1 that happened after the dept=0 call for each number
    # (this is the step that does not work)
    df2 = df1.groupby('phone')['time'].apply(lambda x: (x > df[df['dept']==0].time).sum()).reset_index(name='count')

Can anyone help me? Thanks!!

[Comments]:

Tags: python datetime group-by count apply

[Solution 1]:

Here is a way to pick up from where you left off at df1, using itertools.dropwhile:

    from itertools import dropwhile

    is_nonzero = lambda x: x != 0
    df1.groupby("phone").dept.apply(lambda gr: len(list(dropwhile(is_nonzero, gr))) - 1)

which gives

    phone
    243    1
    453    3
    767    3
    980    0
    Name: dept, dtype: int64

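dropwhile drops values for as long as its predicate (non-zeroness in this case) holds. This leaves a trimmed group that contains only the first 0 and the elements after it. What we need is the "length minus 1" of that trimmed group. Because dropwhile returns a lazy object, we first call list and then len. (The -1 at the end is there because the values we want are the ones after the first 0.)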
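To see the trimming on its own, here is a minimal sketch that applies the same idea to plain lists instead of pandas groups. The helper name `calls_after_first_zero` is made up for this illustration; the input lists are the dept values of phones 767 and 980 in time order, taken from the sample data.

```python
from itertools import dropwhile


def calls_after_first_zero(depts):
    """Count the items that come after the first 0 (hypothetical helper)."""
    # dropwhile skips leading items while the predicate is true,
    # then yields everything from the first 0 onward.
    trimmed = list(dropwhile(lambda d: d != 0, depts))
    # Subtract 1 so that the first 0 itself is not counted.
    return len(trimmed) - 1


# Phone 767: dept values in time order.
print(list(dropwhile(lambda d: d != 0, [1, 0, 0, 1, 1])))  # [0, 0, 1, 1]
print(calls_after_first_zero([1, 0, 0, 1, 1]))  # 3
# Phone 980: a single dept=0 call, so nothing comes after it.
print(calls_after_first_zero([0]))  # 0
# A sequence with no 0 at all is trimmed to nothing, giving -1.
print(calls_after_first_zero([1, 1]))  # -1
```

The last case does not come up in the answer above, since df1 was already filtered in step 2 to numbers that have at least one dept=0 record. If the same expression were applied to unfiltered data, the -1 would need separate handling.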

[Comments]: