【Question Title】: Slicing specific records from a pandas df core panel
【Posted】: 2017-04-03 13:26:58
【Question Description】:

I have a dict of pandas Panel objects (data_r3000) that holds stock data for several industry sectors...

{'capital_goods': <class 'pandas.core.panel.Panel'>
 Dimensions: 6 (items) x 13820 (major_axis) x 423 (minor_axis)
 Items axis: OPEN to ADJ_CLOSE
 Major_axis axis: 1962-01-02 00:00:00 to 2016-11-18 00:00:00
 Minor_axis axis: A to ZEUS, 'consumer': <class 'pandas.core.panel.Panel'>
 Dimensions: 6 (items) x 11832 (major_axis) x 94 (minor_axis)
 Items axis: OPEN to ADJ_CLOSE
 Major_axis axis: 1970-01-02 00:00:00 to 2016-11-18 00:00:00
 Minor_axis axis: ABG to WSO, 'consumer_non_durables': <class 'pandas.core.panel.Panel'>
 Dimensions: 6 (items) x 13819 (major_axis) x 138 (minor_axis)

and so on. I isolate one of the sectors, and I want to make some modifications to some of the values in the df.

x = data_r3000['capital_goods'].to_frame().unstack(level=1)

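This produces the following df:

I have little experience with multi-indexes in pandas, and I'm having trouble isolating the "CLOSE" and "ADJ_CLOSE" records for "AA". How can I isolate these records so that I get an AA_df containing only the OPEN and ADJ_CLOSE time series?

I have tried x.xs(['CLOSE','ADJ_CLOSE'], axis=1), which correctly isolates the two fields I'm looking for, but I still can't figure out how to isolate "AA" alone. Thanks.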

【Question Discussion】:

    Tags: python pandas indexing multiple-columns multi-index


    【Solution 1】:

    I think you can use slicers:

    idx = pd.IndexSlice
    print (df.loc[:, idx[['CLOSE','ADJ_CLOSE'], 'AA']])
    

    Or:

    print (df.loc[:, (['CLOSE','ADJ_CLOSE'],'AA')])
    

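    Sample:

    import numpy as np
    import pandas as pd

    # Two-level columns: field (outer level) x ticker (inner level)
    cols = pd.MultiIndex.from_product((['ADJ','ADJ_CLOSE', 'CLOSE'],
                                       ['A','AA','AEPI']))
    df = pd.DataFrame(np.arange(27).reshape(3,9),columns=cols)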
    
    print (df)
      ADJ          ADJ_CLOSE          CLOSE         
        A  AA AEPI         A  AA AEPI     A  AA AEPI
    0   0   1    2         3   4    5     6   7    8
    1   9  10   11        12  13   14    15  16   17
    2  18  19   20        21  22   23    24  25   26
    
    idx = pd.IndexSlice
    print (df.loc[:, idx[['CLOSE','ADJ_CLOSE'], 'AA']])
      ADJ_CLOSE CLOSE
             AA    AA
    0         4     7
    1        13    16
    2        22    25
    
    print (df.loc[:, (['CLOSE','ADJ_CLOSE'],'AA')])
      ADJ_CLOSE CLOSE
             AA    AA
    0         4     7
    1        13    16
    2        22    25
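
To get from this selection to a flat per-ticker frame such as the AA_df the question asks for, one option is to take a cross-section on the ticker level first and then pick the fields. This is a minimal sketch that reuses the toy columns from the sample above; the names aa_all and aa_df are illustrative only, and it assumes the ticker sits on the second column level.

```python
import numpy as np
import pandas as pd

cols = pd.MultiIndex.from_product((['ADJ', 'ADJ_CLOSE', 'CLOSE'],
                                   ['A', 'AA', 'AEPI']))
df = pd.DataFrame(np.arange(27).reshape(3, 9), columns=cols)

# Cross-section on the second column level (ticker); xs drops that level,
# leaving one flat column per field for ticker 'AA'.
aa_all = df.xs('AA', axis=1, level=1)

# Keep only the fields of interest, in the order requested.
aa_df = aa_all[['CLOSE', 'ADJ_CLOSE']]
print(aa_df)
```

Unlike the slicer forms, this drops the ticker level, so aa_df has plain column labels (CLOSE, ADJ_CLOSE) rather than a MultiIndex.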
    

    Solution for Panel:

    np.random.seed(1234)
    rng = pd.date_range('1/1/2013',periods=10,freq='D')
    
    data = np.random.randn(10, 4)
    
    cols = ['A','AA','AAON','ABAX']
    
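    # Parentheses are needed for a tuple that spans several lines
    df1, df2, df3 = (pd.DataFrame(data, rng, cols),
                     pd.DataFrame(data, rng, cols),
                     pd.DataFrame(data, rng, cols))
    
    # pd.Panel was removed in pandas 0.25; this runs only on older versions
    pf = pd.Panel({'OPEN': df1, 'ADJ': df2, 'ADJ_CLOSE': df3})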
    print (pf)
    <class 'pandas.core.panel.Panel'>
    Dimensions: 3 (items) x 10 (major_axis) x 4 (minor_axis)
    Items axis: ADJ to OPEN
    Major_axis axis: 2013-01-01 00:00:00 to 2013-01-10 00:00:00
    Minor_axis axis: A to ABAX
    
    print (pf.loc[['OPEN', 'ADJ_CLOSE'], :,'AA'])
                    OPEN  ADJ_CLOSE
    2013-01-01 -1.190976  -1.190976
    2013-01-02  0.887163   0.887163
    2013-01-03 -2.242685  -2.242685
    2013-01-04 -2.021255  -2.021255
    2013-01-05  0.289092   0.289092
    2013-01-06 -0.655969  -0.655969
    2013-01-07 -0.469305  -0.469305
    2013-01-08  1.058969   1.058969
    2013-01-09  1.045938   1.045938
    2013-01-10 -0.322795  -0.322795
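
Since pd.Panel was deprecated in pandas 0.20 and removed in 0.25, the Panel snippet above will not run on current versions. One possible substitute, sketched here under the assumption that the per-field data is available as separate DataFrames, is to concatenate them into a single frame with MultiIndex columns. The names frames, wide and aa_df are hypothetical.

```python
import numpy as np
import pandas as pd

np.random.seed(1234)
rng = pd.date_range('1/1/2013', periods=10, freq='D')
data = np.random.randn(10, 4)
cols = ['A', 'AA', 'AAON', 'ABAX']

# One DataFrame per field, mirroring the items of the Panel example.
frames = {name: pd.DataFrame(data, rng, cols)
          for name in ['OPEN', 'ADJ', 'ADJ_CLOSE']}

# Stack the per-field frames side by side; the dict keys become the
# outer column level and the tickers the inner level.
wide = pd.concat(frames, axis=1)

# Same selection as pf.loc[['OPEN', 'ADJ_CLOSE'], :, 'AA'] on the Panel.
aa_df = wide.xs('AA', axis=1, level=1)[['OPEN', 'ADJ_CLOSE']]
print(aa_df.head())
```

The slicer form from earlier in the answer, wide.loc[:, pd.IndexSlice[['OPEN', 'ADJ_CLOSE'], 'AA']], should select the same data while keeping both column levels.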
    

    【Discussion】:
