【Question title】: Pandas Set Top Row as MultiIndex Level 1
【Posted】: 2016-09-18 09:57:56
【Question】:

Given the following DataFrame:

import pandas as pd

d2 = pd.DataFrame({'Item': ['items', 'y', 'z', 'x'],
                   'other': ['others', 'bb', 'cc', 'dd']})
d2
    Item    other
0   items   others
1     y     bb
2     z     cc
3     x     dd

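I would like to create a MultiIndex header in which the current header becomes level 0 and the current top row becomes level 1.

Thanks in advance!

【Question comments】:

    Tags: python-3.x pandas multi-index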


    【Solution 1】:

    Another solution is to build the columns with MultiIndex.from_tuples:

    cols = list(zip(d2.columns, d2.iloc[0,:]))
    c1 = pd.MultiIndex.from_tuples(cols, names=[None, 0])
    
    print (pd.DataFrame(data=d2[1:].values, columns=c1, index=d2.index[1:]))
    
       Item  other
    0 items others
    1     y     bb
    2     z     cc
    3     x     dd
    

    Or, if the names of the column levels are not important:

    cols = list(zip(d2.columns, d2.iloc[0,:]))
    d2.columns = pd.MultiIndex.from_tuples(cols)
    
    print (d2[1:])
       Item  other
      items others
    1     y     bb
    2     z     cc
    3     x     dd
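The second variant above assigns to d2.columns in place and leaves the promoted row in the frame until it is sliced off. As a minimal sketch of the same idea that leaves the input untouched, the steps can be wrapped in a small helper. The name promote_first_row is hypothetical (it is not part of pandas or of the original answer), and MultiIndex.from_arrays is used here in place of from_tuples purely for illustration.

```python
import pandas as pd


def promote_first_row(df):
    """Return a new frame whose first row has become column level 1.

    Hypothetical helper for illustration; the input frame is not modified.
    """
    # Level 0: the existing column labels. Level 1: the values of the first row.
    # Plain lists are passed so that no level names are inferred.
    cols = pd.MultiIndex.from_arrays([df.columns.tolist(), df.iloc[0].tolist()])
    out = df.iloc[1:].copy()  # drop the row that was promoted to the header
    out.columns = cols
    return out


d2 = pd.DataFrame({'Item': ['items', 'y', 'z', 'x'],
                   'other': ['others', 'bb', 'cc', 'dd']})

result = promote_first_row(d2)
print(result)
print(result.columns.tolist())
```

A column is then selected with a tuple, for example result[('Item', 'items')].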
    

    Timings

    len(df)=400k:

    In [63]: %timeit jez(d22)
    100 loops, best of 3: 6.22 ms per loop
    
    In [64]: %timeit piR(d2)
    10 loops, best of 3: 84.9 ms per loop
    

    len(df)=40:

    In [70]: %timeit jez(d22)
    The slowest run took 4.61 times longer than the fastest. This could mean that an intermediate result is being cached.
    1000 loops, best of 3: 941 µs per loop
    
    In [71]: %timeit piR(d2)
    The slowest run took 4.44 times longer than the fastest. This could mean that an intermediate result is being cached.
    1000 loops, best of 3: 1.36 ms per loop
    

    Code

    import pandas as pd
    
    d2=pd.DataFrame({'Item':['items','y','z','x'],
                    'other':['others','bb','cc','dd']})
    
    print (d2) 
    d2 = pd.concat([d2]*100000).reset_index(drop=True) 
    #d2 = pd.concat([d2]*10).reset_index(drop=True)   
    d22 = d2.copy()
    
    def piR(d2):
        return d2.T.set_index(0, append=True).T
    
    
    def jez(d2):
        cols = list(zip(d2.columns, d2.iloc[0,:]))
        c1 = pd.MultiIndex.from_tuples(cols, names=[None, 0])
    
        return pd.DataFrame(data=d2[1:].values, columns=c1, index=d2.index[1:])  
    
    print (piR(d2))
    print (jez(d22))
    
    print ((piR(d2) == jez(d22)).all())
    Item   items     True
    other  others    True
    dtype: bool
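The %timeit lines above only work inside IPython. As a rough sketch of how the same comparison could be run from a plain Python script, the standard-library timeit module can be used instead. The repeat factor of 1000 (4,000 rows) and the call counts below are arbitrary assumptions chosen to keep the run short, so the numbers will not match the timings quoted above.

```python
import timeit

import pandas as pd

base = pd.DataFrame({'Item': ['items', 'y', 'z', 'x'],
                     'other': ['others', 'bb', 'cc', 'dd']})
# 4 rows * 1000 repeats = 4,000 rows; raise the factor to approach the 400k case.
big = pd.concat([base] * 1000).reset_index(drop=True)


def piR(df):
    # Transpose, append the old top row as a second index level, transpose back.
    return df.T.set_index(0, append=True).T


def jez(df):
    # Build the two-level header directly and reuse the remaining values.
    cols = list(zip(df.columns, df.iloc[0, :]))
    c1 = pd.MultiIndex.from_tuples(cols, names=[None, 0])
    return pd.DataFrame(data=df.iloc[1:].values, columns=c1, index=df.index[1:])


for func in (jez, piR):
    # Best of 3 repeats of 10 calls each, reported per call.
    best = min(timeit.repeat(lambda: func(big), number=10, repeat=3)) / 10
    print(f"{func.__name__}: {best * 1e3:.2f} ms per call")
```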
    

    【Comments】:

      【Solution 2】:

      Transpose the DataFrame, call set_index on the first column with append=True, and then transpose back.

      d2.T.set_index(0, append=True).T
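To make the one-liner easier to follow, here is a sketch that splits it into its three steps on the frame from the question. The intermediate variable names are only for illustration.

```python
import pandas as pd

d2 = pd.DataFrame({'Item': ['items', 'y', 'z', 'x'],
                   'other': ['others', 'bb', 'cc', 'dd']})

# Step 1: transpose. The old column labels become the row index,
# and the old row labels 0..3 become the columns.
t = d2.T

# Step 2: column 0 holds the old top row; append it to the row index
# as a second level instead of replacing the existing level.
t = t.set_index(0, append=True)

# Step 3: transpose back. The two-level row index becomes a two-level header.
result = t.T

print(result)
print(result.columns.tolist())
print(list(result.columns.names))
```

One thing to keep in mind: transposing a frame with mixed column dtypes upcasts the values to object, so on data that is not all strings the result may need its dtypes restored afterwards.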
      

      【Comments】:

      • jezrael, the second case in your answer is what I needed, as is piRSquared's answer. I'm fortunate to benefit regularly from the insights of you both.