【Question title】: Python Pandas Multi-index: keeping same length of level=1 with all level=0 indexes
【Posted】: 2018-06-25 23:10:50
【Question】:

I have a DataFrame, df_ver1, with a MultiIndex. I want to drop every row whose level-0 key does not have exactly 2 entries at level 1. Here is my DataFrame.

In [13]: df_ver1
Out[13]: 
key  nm         0         1         2         3
bar one -0.424972  0.567020  0.276232 -1.087401
    two -0.673690  0.113648 -1.478427  0.524988
baz one  0.404705  0.577046 -1.715002 -1.039268
    two -0.370647 -1.157892 -1.344312  0.844885
foo one  1.075770 -0.109050  1.643563 -1.469388
qux one -1.294524  0.413738  0.276662 -0.472035
    two -0.013960 -0.362543 -0.006154 -0.923061
oof two  1.340309 -1.187678 -2.211372  0.380396

My desired output is:

In [13]: df_ver1_fixed
Out[13]: 
key  nm         0         1         2         3
bar one -0.424972  0.567020  0.276232 -1.087401
    two -0.673690  0.113648 -1.478427  0.524988
baz one  0.404705  0.577046 -1.715002 -1.039268
    two -0.370647 -1.157892 -1.344312  0.844885
qux one -1.294524  0.413738  0.276662 -0.472035
    two -0.013960 -0.362543 -0.006154 -0.923061

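As you can see, I want to drop the rows whose key has only one level-1 index value. In other words, every key must have both 'one' and 'two' at the second level. Is there a Pythonic way to do this? Thanks!

【Question comments】:

    Tags: python pandas indexing multi-index multi-level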


    【Solution 1】:

    This works too. You can group by the MultiIndex level key and filter out the groups whose length is not 2.

    df.groupby(by='key').filter(lambda x: len(x) == 2) # keep groups with len 2
    

    As @Zero suggested, we can be more specific and require that each group's set of nm values equals set(['one', 'two']):

    df.groupby(by='key').filter(
                  lambda x: set(x.index.get_level_values('nm')) == set(['one', 'two']))
    
    key  nm         0         1         2         3
    bar one -0.424972  0.567020  0.276232 -1.087401
        two -0.673690  0.113648 -1.478427  0.524988
    baz one  0.404705  0.577046 -1.715002 -1.039268
        two -0.370647 -1.157892 -1.344312  0.844885
    qux one -1.294524  0.413738  0.276662 -0.472035
        two -0.013960 -0.362543 -0.006154 -0.923061
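
For anyone who wants to run the set-based filter end to end, here is a minimal self-contained sketch. The index labels are copied from the question; the numeric values are placeholders (the originals are random), and the names `required` and `kept_keys` are only for illustration. It groups with `level="key"`, which I assume behaves the same as `by='key'` here since `key` is an index level.

```python
import numpy as np
import pandas as pd

# Index labels copied from the question; the values are placeholders.
index = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "two"),
     ("baz", "one"), ("baz", "two"),
     ("foo", "one"),
     ("qux", "one"), ("qux", "two"),
     ("oof", "two")],
    names=["key", "nm"],
)
df = pd.DataFrame(np.arange(32, dtype=float).reshape(8, 4), index=index)

# Keep only the keys whose second-level labels are exactly {'one', 'two'}.
required = {"one", "two"}
result = df.groupby(level="key").filter(
    lambda g: set(g.index.get_level_values("nm")) == required
)

# First-level keys that survived the filter, in their original order.
kept_keys = result.index.get_level_values("key").unique().tolist()
print(kept_keys)
```

The keys `foo` and `oof` each have a single second-level label, so their rows are dropped.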
    

    Another approach: MultiIndex selection

    sz = df.groupby("key").size()
    indexes = sz[sz == 2].index.tolist()  # first-level indexes that we want.
    df.loc[indexes] # use loc for selection
    
    key  nm         0         1         2         3
    bar one -0.424972  0.567020  0.276232 -1.087401
        two -0.673690  0.113648 -1.478427  0.524988
    baz one  0.404705  0.577046 -1.715002 -1.039268
        two -0.370647 -1.157892 -1.344312  0.844885
    qux one -1.294524  0.413738  0.276662 -0.472035
        two -0.013960 -0.362543 -0.006154 -0.923061
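
One caveat worth sketching: the size check and the set check agree on the question's data, but they can differ if a key repeats a label. The frame below is hypothetical (the `dup` key and the `value` column are not in the question) and exists only to show that difference.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "two"),
     ("dup", "one"), ("dup", "one"),  # hypothetical key with a repeated label
     ("foo", "one")],
    names=["key", "nm"],
)
df = pd.DataFrame({"value": range(5)}, index=index)

# Size-based check: any key with exactly two rows passes.
sz = df.groupby(level="key").size()
keys_by_size = sz[sz == 2].index.tolist()

# Set-based check: the key must carry exactly the labels 'one' and 'two'.
required = {"one", "two"}
by_set = df.groupby(level="key").filter(
    lambda g: set(g.index.get_level_values("nm")) == required
)
keys_by_set = by_set.index.get_level_values("key").unique().tolist()

print(keys_by_size)
print(keys_by_set)
```

Here `dup` passes the size check because it has two rows, but it fails the set check because it never has `two`. If duplicate labels cannot occur in your data, the two checks should select the same keys.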
    

    【Comments】:

    • More specifically, df.groupby(by='key').filter(lambda x: set(x.index.get_level_values('nm')) == set(['one', 'two']))
    【Solution 2】:

    I think you need:

    #filter only one and two values by second level
    df = df.loc[pd.IndexSlice[:, ['one','two']], :]
    #filter by length
    df = df[df.groupby(level=0)[df.columns[0]].transform('size') == 2]
    print (df)
                    0         1         2         3
    key nm                                         
    bar one -0.424972  0.567020  0.276232 -1.087401
        two -0.673690  0.113648 -1.478427  0.524988
    baz one  0.404705  0.577046 -1.715002 -1.039268
        two -0.370647 -1.157892 -1.344312  0.844885
    qux one -1.294524  0.413738  0.276662 -0.472035
        two -0.013960 -0.362543 -0.006154 -0.923061
    

    Another solution is to compare sets:

    # the chained calls must be wrapped in parentheses to span several lines
    mask = (df.reset_index()
              .groupby('key')['nm']
              .transform(lambda x: set(x) == set(['one','two']))
              .values)
    df = df[mask]
    print (df)
                    0         1         2         3
    key nm                                         
    bar one -0.424972  0.567020  0.276232 -1.087401
        two -0.673690  0.113648 -1.478427  0.524988
    baz one  0.404705  0.577046 -1.715002 -1.039268
        two -0.370647 -1.157892 -1.344312  0.844885
    qux one -1.294524  0.413738  0.276662 -0.472035
        two -0.013960 -0.362543 -0.006154 -0.923061
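
The two approaches in this answer give the same result on the question's data, but they are not equivalent in general. The sketch below uses a hypothetical frame (the `ext` key with a third label `three`, and the `value` column, are made up for illustration). For the first step it uses `isin` on the level values instead of `pd.IndexSlice`; that is a swapped-in technique which selects the same rows here. It also adds `.astype(bool)` to the mask as a precaution.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "two"),
     ("ext", "one"), ("ext", "two"), ("ext", "three"),  # hypothetical key
     ("foo", "one")],
    names=["key", "nm"],
)
df = pd.DataFrame({"value": range(6)}, index=index)

# Approach 1: keep only 'one'/'two' rows, then keep keys left with 2 rows.
step1 = df[df.index.get_level_values("nm").isin(["one", "two"])]
two_step = step1[step1.groupby(level="key")["value"].transform("size") == 2]

# Approach 2: keep keys whose full label set is exactly {'one', 'two'}.
mask = (
    df.reset_index()
      .groupby("key")["nm"]
      .transform(lambda x: set(x) == {"one", "two"})
      .astype(bool)
      .values
)
by_set = df[mask]

two_step_keys = two_step.index.get_level_values("key").unique().tolist()
by_set_keys = by_set.index.get_level_values("key").unique().tolist()
print(two_step_keys)
print(by_set_keys)
```

The first approach keeps `ext` (minus its `three` row), while the set comparison drops `ext` entirely. Which behavior is right depends on whether extra second-level labels should disqualify a key.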
    
