【Question Title】: How to count occurrences of >=3 consecutive 1 values in dataframe column
【Posted】: 2021-02-22 09:55:21
【Question】:

I have a pd.DataFrame with a column A, and I want to count how many times the value 1 occurs 3 or more times in a row.

df=pd.DataFrame({'A':[0,0,1,0,1,0,0,0,0,1,1,1,1,0,1,1,1]}) 

Expected output:

df1=pd.DataFrame({'value_one_count_for three or more than three times':[2]}) 
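As a reference for what is being counted (each run of three or more 1s counts once, however long it is), a plain-Python loop is enough. This is only a sketch; the helper `count_long_runs` and its `target`/`min_len` parameters are hypothetical names, not part of the question.

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1]})

def count_long_runs(values, target=1, min_len=3):
    """Hypothetical helper: count maximal runs of `target` at least `min_len` long."""
    runs = 0
    current = 0  # length of the run of `target` currently being scanned
    for v in values:
        if v == target:
            current += 1
        else:
            # a non-target value closes the current run
            if current >= min_len:
                runs += 1
            current = 0
    # a run that reaches the end of the column is never closed by a non-target value
    if current >= min_len:
        runs += 1
    return runs

n = count_long_runs(df['A'])
df1 = pd.DataFrame({'value_one_count_for three or more than three times': [n]})
print(df1)
```

Note the final check after the loop: the last run of 1s (rows 14 to 16) ends with the column, so without it that run would be missed.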

【Question Discussion】:

    Tags: python-3.x pandas numpy pandas-groupby


    【Solution 1】:

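    You need a groupby here that works like itertools.groupby (one group per run of consecutive equal values), then keep only the groups whose value is 1, since we are counting consecutive 1s. Then use GroupBy.count and count how many groups have a size greater than or equal to 3: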

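    # give every run of equal consecutive values its own ID (like itertools.groupby)
    g = df['A'].ne(df['A'].shift()).cumsum()
    # keep only the IDs of the runs made of 1s
    g = g[df['A'].eq(1)]
    # length of each run of 1s, then count the runs with length >= 3
    g.groupby(g).count().ge(3).sum()
    # 2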
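Since this answer compares the approach to itertools.groupby, here is a minimal sketch of the same count done with `itertools.groupby` directly, assuming the column is small enough to convert to a Python list. The pandas labels above correspond to exactly these runs.

```python
from itertools import groupby

import pandas as pd

df = pd.DataFrame({'A': [0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1]})

# itertools.groupby splits the list into runs of equal consecutive values;
# each entry is (value of the run, length of the run)
run_lengths = [(key, sum(1 for _ in grp)) for key, grp in groupby(df['A'].tolist())]
print(run_lengths)  # [(0, 2), (1, 1), (0, 1), (1, 1), (0, 4), (1, 4), (0, 1), (1, 3)]

# keep the runs of 1 and count those of length >= 3
result = sum(1 for key, length in run_lengths if key == 1 and length >= 3)
print(result)  # 2
```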
    


      【Solution 2】:

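      First create the IDs of the consecutive groups and filter them to the rows equal to 1, which gives the groups of consecutive 1s. Then add Series.value_counts to get the length of each group, compare with Series.ge (greater than or equal), and count the Trues with sum: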

      a = df['A'].ne(df['A'].shift()).cumsum()[df['A'].eq(1)].value_counts().ge(3).sum()
      print (a)
      2
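The one-liner above can be wrapped in a small reusable function. This is a sketch; `count_runs` and its `value`/`min_len` parameters are hypothetical names, chosen so that the same logic also handles other target values and thresholds.

```python
import pandas as pd

def count_runs(s: pd.Series, value=1, min_len=3) -> int:
    """Hypothetical helper: number of runs of `value` in `s` at least `min_len` long."""
    # a new ID starts every time the value changes -> one ID per run
    run_id = s.ne(s.shift()).cumsum()
    # keep the IDs of the runs made of `value`, then measure each run
    lengths = run_id[s.eq(value)].value_counts()
    return int(lengths.ge(min_len).sum())

df = pd.DataFrame({'A': [0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1]})
print(count_runs(df['A']))             # 2
print(count_runs(df['A'], min_len=4))  # 1 (only the run of four 1s)
print(count_runs(df['A'], value=0))    # 1 (only the run of four 0s)
```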
      

      NumPy alternative: compute the lengths of the consecutive runs of 1, compare them with >= 3, and sum:

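      import numpy as np

      condition = df.A.eq(1).to_numpy()
      #https://stackoverflow.com/a/24343375
      # Boundaries alternate between the start and the end of a run of 1s,
      # so every second difference between them is the length of one run.
      a = np.sum(np.diff(np.where(np.concatenate(([condition[0]],
                                           condition[:-1] != condition[1:],
                                           [True])))[0])[::2] >= 3)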
      print (a)
      2
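To see what the NumPy one-liner computes, it can be split into its intermediate arrays. This sketch assumes the column has at least one row, since `condition[0]` raises an IndexError on an empty array.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1]})
condition = df.A.eq(1).to_numpy()

# True where a run starts: position 0 if the column starts with a 1,
# every position where the value flips, and a sentinel after the last row
boundaries = np.concatenate(([condition[0]],
                             condition[:-1] != condition[1:],
                             [True]))
edges = np.where(boundaries)[0]
print(edges)  # [ 2  3  4  5  9 13 14 17]

# consecutive edges alternate between the start and the end of a run of 1s,
# so every second gap is the length of one run of 1s
run_lengths = np.diff(edges)[::2]
print(run_lengths)  # [1 1 4 3]

print(np.sum(run_lengths >= 3))  # 2
```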
      


        【Solution 3】:

        TLDR

        In [1]: print(((df['A'] != 1).cumsum().loc[df['A'] == 1].value_counts() >= 3).sum())
        2
        

        Explanation

        In [1]: import pandas as pd
        
        In [2]: df = pd.DataFrame({'A':[0,0,1,0,1,0,0,0,0,1,1,1,1,0,1,1,1]})
        

        The following assigns a unique ID to each group of consecutive 1s...

        In [3]: df['cumsum'] = (df['A'] != 1).cumsum()
        
        In [4]: print(df)
            A  cumsum
        0   0       1
        1   0       2
        2   1       2
        3   0       3
        4   1       3
        5   0       4
        6   0       5
        7   0       6
        8   0       7
        9   1       7
        10  1       7
        11  1       7
        12  1       7
        13  0       8
        14  1       8
        15  1       8
        16  1       8
        

        ...and then you only need to keep the rows where 'A' equals 1 to clean it up:

        In [5]: df = df[df['A'] == 1]
        
        In [6]: print(df)
            A  cumsum
        2   1       2
        4   1       3
        9   1       7
        10  1       7
        11  1       7
        12  1       7
        14  1       8
        15  1       8
        16  1       8
        

        Then you can use either value_counts() or groupby():

        # With value_counts()
        
        In [7]: print(df['cumsum'].value_counts())
        7    4
        8    3
        3    1
        2    1
        Name: cumsum, dtype: int64
        
        # The number of runs of at least 3 consecutive 1s is:
        In [8]: print((df['cumsum'].value_counts() >= 3).sum())
        2
        
        
        
        # With groupby()
        In [9]: list(df.groupby('cumsum'))
        Out[9]: 
        [(2,
             A  cumsum
          2  1       2),
         (3,
             A  cumsum
          4  1       3),
         (7,
              A  cumsum
          9   1       7
          10  1       7
          11  1       7
          12  1       7),
         (8,
              A  cumsum
          14  1       8
          15  1       8
          16  1       8)]
        
        # The number of runs of at least 3 consecutive 1s is:
        In [10]: print(len([dataframe for _, dataframe in df.groupby('cumsum') if len(dataframe) >= 3]))
        2
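The filtering and counting steps above can also be collapsed with `groupby(...).size()`, or the long runs can be selected with `DataFrameGroupBy.filter`. A minimal sketch, reusing the 'cumsum' labelling from this answer:

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1]})

# same labelling as above: every run of 1s shares the ID of the 0 before it
df['cumsum'] = (df['A'] != 1).cumsum()
ones = df[df['A'] == 1]

# groupby(...).size() gives the length of every run of 1s in one step
sizes = ones.groupby('cumsum').size()
print((sizes >= 3).sum())  # 2

# alternatively, keep only the rows belonging to long runs, then count the runs
long_runs = ones.groupby('cumsum').filter(lambda grp: len(grp) >= 3)
print(long_runs['cumsum'].nunique())  # 2
```

The `filter` variant is handy if you also need the rows of the long runs themselves, not just how many runs there are.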
        

