【Question Title】: Drop All Rows After First Occurrence of Column Value
【Posted】: 2021-12-09 13:49:09
【Question】:

My problem is the same as this one, with one extra constraint that I don't know how to handle:

Python PANDAS: Drop All Rows After First Occurrence of Column Value

In that post, the task is to drop all rows after the first `Open`:

Start:

Rank Status
1    Closed
5    Closed
6    Open
9    Closed
10   Open

Result:

Rank Status
1    Closed
5    Closed
6    Open

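This is the top answer to that question (updated for current pandas, where `DataFrame.sort` no longer exists):

df = df.sort_values('Rank').reset_index(drop=True)  # sort_values replaces the removed DataFrame.sort
df.loc[: df[(df['Status'] == 'Open')].index[0], :]  # rows up to and including the first 'Open'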

I have the same problem, but I have several shops in the same DataFrame and want to do this for every shop:

Start:

Shop Rank  Status
A    1     Closed
A    5     Closed
A    6     Open
A    9     Closed
A    10    Open
A    1     Closed
B    3     Closed
B    8     Closed
B    12    Open
B    15    Closed
...

Desired result:

Shop Rank  Status
A    1     Closed
A    5     Closed
A    6     Open
B    3     Closed
B    8     Closed
B    12    Open
...

How should I modify the previous answer so that it works for all my shops at once?

Thanks in advance

【Question Discussion】:

    Tags: python python-3.x pandas


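    【Solution 1】:

    The idea is to compare `Status` with `Open` and take `Series.cummax` within each group. Because the first `Open` row itself must be kept, the boolean mask is first shifted down by one row per group: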

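    # True for every row after the first 'Open' of each shop; the shift
    # keeps the first 'Open' row itself out of the mask
    mask = (df['Status'].eq('Open')
                        .groupby(df['Shop'])
                        .transform(lambda x: x.shift(fill_value=False).cummax()))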
    
    df = df[~mask]
    print (df)
      Shop  Rank  Status
    0    A     1  Closed
    1    A     5  Closed
    2    A     6    Open
    6    B     3  Closed
    7    B     8  Closed
    8    B    12    Open
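To see why the shift is needed, the same mask can be built step by step. This is a sketch assuming the question's sample data for shops A and B; the intermediate names `is_open` and `shifted` are only illustrative:

```python
import pandas as pd

# Sample data from the question (shops A and B)
df = pd.DataFrame({
    'Shop':   ['A'] * 6 + ['B'] * 4,
    'Rank':   [1, 5, 6, 9, 10, 1, 3, 8, 12, 15],
    'Status': ['Closed', 'Closed', 'Open', 'Closed', 'Open', 'Closed',
               'Closed', 'Closed', 'Open', 'Closed'],
})

is_open = df['Status'].eq('Open')
# Shift down one row inside each shop, so the first 'Open' row is not flagged
shifted = is_open.groupby(df['Shop']).transform(lambda x: x.shift(fill_value=False))
# Running maximum: True from the row after the first 'Open' to the end of the shop
mask = shifted.groupby(df['Shop']).transform(lambda x: x.cummax())

print(df.assign(is_open=is_open, shifted=shifted, mask=mask))
print(df[~mask])
```

Without the shift, `cummax` would already be `True` on the first `Open` row, so that row would be dropped as well.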
    

    If every group is guaranteed to contain at least one `Open`, you can use:

    def f(x):
        # slice up to the index label of the first 'Open' (IndexError if there is none)
        return x.loc[: x[(x['Status'] == 'Open')].index[0], :]
    
    df = df.groupby('Shop', group_keys=False).apply(f)
    

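    Or, equivalently, with `idxmax`:

    def f(x):
        # idxmax returns the index label of the first True value in the mask
        return x.loc[: (x['Status'] == 'Open').idxmax()]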
    
    df = df.groupby('Shop', group_keys=False).apply(f)
    

    Edit:

    Testing with more general data:

    print (df)
       Shop  Rank  Status
    0     A     1  Closed
    1     A     5  Closed
    2     A     6    Open
    3     A     9  Closed
    4     A    10    Open
    5     A     1  Closed
    6     B     3  Closed
    7     B     8  Closed
    8     B    12    Open
    9     B    15  Closed
    10    C     8  Closed <- no Open in group C
    11    C    12  Closed
    12    D    15    Open <- group D starts with Open
    13    D    15  Closed
    

    This solution fails because group `C` contains no `Open`:

    def f(x):
        
        return x.loc[: x[(x['Status'] == 'Open')].index[0], :]
    
    df = df.groupby('Shop', group_keys=False).apply(f)
    

    IndexError: index 0 is out of bounds for axis 0 with size 0


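    The `idxmax` variant does not raise, but it returns a wrong result for `C`: for an all-`False` mask, `idxmax` returns the label of the first row, so only that row is kept:

    def f(x):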
        
        return x.loc[: (x['Status'] == 'Open').idxmax()]
    
    df = df.groupby('Shop', group_keys=False).apply(f)
    
    print (df)
       Shop  Rank  Status
    0     A     1  Closed
    1     A     5  Closed
    2     A     6    Open
    6     B     3  Closed
    7     B     8  Closed
    8     B    12    Open
    10    C     8  Closed <- incorrect: only the first row of C is returned, row 11 is lost
    12    D    15    Open
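If you prefer the per-group function, a guard for shops without any `Open` avoids both problems. This is a sketch; the helper name `keep_until_first_open` is hypothetical, and it slices by position (`iloc`) so it does not depend on the index labels. It iterates over the groups explicitly instead of using `apply`, to sidestep version differences in how `apply` treats the grouping column:

```python
import pandas as pd

# General test data from the edit above (group C has no 'Open', D starts with one)
df = pd.DataFrame({
    'Shop':   ['A'] * 6 + ['B'] * 4 + ['C'] * 2 + ['D'] * 2,
    'Rank':   [1, 5, 6, 9, 10, 1, 3, 8, 12, 15, 8, 12, 15, 15],
    'Status': ['Closed', 'Closed', 'Open', 'Closed', 'Open', 'Closed',
               'Closed', 'Closed', 'Open', 'Closed',
               'Closed', 'Closed',
               'Open', 'Closed'],
})

def keep_until_first_open(group):
    """Rows of one shop up to and including its first 'Open' (hypothetical helper)."""
    is_open = group['Status'].eq('Open').to_numpy()
    if not is_open.any():
        return group                 # no 'Open' in this shop: keep every row
    first = is_open.argmax()         # position of the first 'Open'
    return group.iloc[: first + 1]   # rows up to and including it

result = pd.concat(
    [keep_until_first_open(g) for _, g in df.groupby('Shop', sort=False)]
)
print(result)
```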
    

    The first solution works correctly:

    mask = (df['Status'].eq('Open')
                        .groupby(df['Shop'])
                        .transform(lambda x: x.shift(fill_value=False).cummax()))
    
    df1 = df[~mask]
    print (df1)
       Shop  Rank  Status
    0     A     1  Closed
    1     A     5  Closed
    2     A     6    Open
    6     B     3  Closed
    7     B     8  Closed
    8     B    12    Open
    10    C     8  Closed
    11    C    12  Closed
    12    D    15    Open
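A lambda-free variant of the same idea counts, within each shop, how many `Open` rows come strictly before the current row, and keeps the rows where that count is zero. This is a sketch that swaps the answer's `shift` + `cummax` for a grouped `cumsum`; it assumes the general test data above:

```python
import pandas as pd

# General test data from the edit above
df = pd.DataFrame({
    'Shop':   ['A'] * 6 + ['B'] * 4 + ['C'] * 2 + ['D'] * 2,
    'Rank':   [1, 5, 6, 9, 10, 1, 3, 8, 12, 15, 8, 12, 15, 15],
    'Status': ['Closed', 'Closed', 'Open', 'Closed', 'Open', 'Closed',
               'Closed', 'Closed', 'Open', 'Closed',
               'Closed', 'Closed',
               'Open', 'Closed'],
})

# 1 where the row is 'Open', else 0
is_open = df['Status'].eq('Open').astype(int)
# Number of 'Open' rows strictly before the current row within the same shop
opens_before = is_open.groupby(df['Shop']).cumsum() - is_open
# Keep rows with no earlier 'Open' in their shop (the first 'Open' itself is kept)
result = df[opens_before.eq(0)]
print(result)
```

Subtracting `is_open` turns the inclusive running count into an exclusive one, which plays the same role as the shift in the answer; shops without any `Open` (like `C`) keep all their rows.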
    

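    【Discussion】:

    • It works perfectly for me, thank you very much. However, I don't understand what you mean by "if every group always has an Open".
    • @Adept - Sure, give me some time, I'll add it to the answer.
    • I see. In my case a group may contain no Open at all, and the first solution works well.
    • @Adept - One question: if there is no Open, should group C be removed?
    • No, I want to keep all rows of C, which your first solution does correctly.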