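【Question title】: Unable to delete first row in matrix
【Posted】: 2016-09-18 22:20:20
【Question】:

I'm stuck trying to delete the first row from the log_returns matrix. Essentially, I want to get rid of the first row because it contains NaN values. I tried isnan() with no joy, and eventually found numpy.delete(), which sounded the most promising but still doesn't do the job.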

import pandas as pd
from pandas_datareader import data as web
import numpy as np

symbols = ['XOM', 'CVX', 'SLB', 'PXD', 'EOG', 'OXY', 'HAL', 'KMI', 'SE', 'PSX', 'VLO','COP','APC','TSO','WMB','BHI','APA','COG','DVN','MPC','NBL','CXO','NOV','HES','MRO','EQT','XEC','FTI','RRC','OKE','SWN','NFX','HP','MUR','CHK','RIG','DO']

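try:
    # Reuse the cached prices if the 'norm' table is available.
    h9 = pd.HDFStore('port.h9')
    data = h9['norm']
    h9.close()
except Exception:
    # Otherwise download the adjusted close for each symbol and cache it.
    data = pd.DataFrame()
    for sym in symbols:
        data[sym] = web.DataReader(sym, data_source='yahoo',
                                   start='1/1/2010')['Adj Close']
    data = data.dropna()
    h9 = pd.HDFStore('port.h9')
    h9['norm'] = data
    h9.close()

data.info()
# Log returns; the first row is all NaN because shift(1) has no prior row.
log_returns = np.log(data / data.shift(1))
log_returns.head()
# This is the call that raises the ValueError shown below.
np.delete(log_returns, 0, 0)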

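The last line above (the delete) raises the following exception, which doesn't make sense to me, since row = 0, location = 0 is certainly not out of bounds for the log_returns matrix of shape (1116, 37).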

ValueError: Shape of passed values is (37, 1115), indices imply (37, 1116)

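【Comments】:

  • How about: log_returns = log_returns.iloc[1:]?
  • The second argument of np.delete() might not be what you think it is. If you only need to throw away the first row, @MaxU's suggestion is the way to go. Also, np.nan != np.nan will make things harder for np.delete.
  • MaxU, the iloc approach works great! Thanks a lot. Thanks also to Andras for the reply.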
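
The points raised in the comments can be sketched on a small synthetic frame. The prices below are made up and stand in for the downloaded 'Adj Close' columns, so none of the numbers come from the question. np.delete is designed for plain NumPy arrays, so this sketch hands it the underlying array rather than the DataFrame itself.

```python
import numpy as np
import pandas as pd

# Hypothetical prices standing in for the downloaded 'Adj Close' columns.
data = pd.DataFrame(
    {"XOM": [57.2, 57.4, 57.9, 57.7], "CVX": [60.1, 60.5, 60.6, 60.3]},
    index=pd.date_range("2010-01-04", periods=4, freq="B"),
)
log_returns = np.log(data / data.shift(1))

# NaN never compares equal to itself, so equality tests cannot locate it;
# isna() is the reliable check.
nan_equals_itself = np.nan == np.nan
first_row_is_nan = bool(log_returns.iloc[0].isna().all())

# Positional slicing drops the first row and keeps index and column labels.
trimmed = log_returns.iloc[1:]

# np.delete(arr, obj, axis): obj is the index to remove, axis=0 means rows.
as_array = np.delete(log_returns.to_numpy(), 0, axis=0)
```

Here `trimmed` is still a DataFrame, while `as_array` is a bare ndarray that has lost the dates and ticker names, which is one reason to prefer `iloc` when the labels matter.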

Tags: python pandas numpy dataframe


【Solution 1】:

Demo:

In [202]: from pandas_datareader import data as web

In [218]: df = web.DataReader('XOM', 'yahoo', start='1/1/2010')['Adj Close']

In [219]: pd.options.display.max_rows = 10

In [220]: df
Out[220]:
Date
2010-01-04    57.203028
2010-01-05    57.426378
2010-01-06    57.922715
2010-01-07    57.740730
2010-01-08    57.509100
                ...
2016-09-12    87.290001
2016-09-13    85.209999
2016-09-14    84.599998
2016-09-15    85.080002
2016-09-16    84.029999
Name: Adj Close, dtype: float64

In [221]: np.log(df.head(10).pct_change() + 1)
Out[221]:
Date
2010-01-04         NaN
2010-01-05    0.003897
2010-01-06    0.008606
2010-01-07   -0.003147
2010-01-08   -0.004020
2010-01-11    0.011157
2010-01-12   -0.004991
2010-01-13   -0.004011
2010-01-14    0.000144
2010-01-15   -0.008214
Name: Adj Close, dtype: float64
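
The demo computes log returns as np.log(pct_change() + 1), whereas the question uses np.log(data / data.shift(1)). Since pct_change() is the ratio to the previous value minus one, the two should agree. A quick check on made-up prices (the values are illustrative, not the real XOM series):

```python
import numpy as np
import pandas as pd

# Made-up prices, used only to compare the two formulas.
prices = pd.Series([57.2, 57.4, 57.9, 57.7, 57.5], name="Adj Close")

via_shift = np.log(prices / prices.shift(1))      # form used in the question
via_pct_change = np.log(prices.pct_change() + 1)  # form used in this answer

# Both leave NaN in the first position, because there is no previous price.
both_start_with_nan = bool(np.isnan(via_shift.iloc[0]) and np.isnan(via_pct_change.iloc[0]))

# The remaining values match up to floating-point rounding.
same_returns = bool(np.allclose(via_shift.iloc[1:], via_pct_change.iloc[1:]))
```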

Solution:

In [224]: np.log(df.pct_change() + 1).dropna()
Out[224]:
Date
2010-01-05    0.003897
2010-01-06    0.008606
2010-01-07   -0.003147
2010-01-08   -0.004020
2010-01-11    0.011157
                ...
2016-09-12    0.005169
2016-09-13   -0.024117
2016-09-14   -0.007185
2016-09-15    0.005658
2016-09-16   -0.012418
Name: Adj Close, dtype: float64

Or:

In [225]: np.log(df.pct_change() + 1).iloc[1:]
Out[225]:
Date
2010-01-05    0.003897
2010-01-06    0.008606
2010-01-07   -0.003147
2010-01-08   -0.004020
2010-01-11    0.011157
                ...
2016-09-12    0.005169
2016-09-13   -0.024117
2016-09-14   -0.007185
2016-09-15    0.005658
2016-09-16   -0.012418
Name: Adj Close, dtype: float64

Or:

In [227]: np.log(df.pct_change() + 1).drop(df.index[0])
Out[227]:
Date
2010-01-05    0.003897
2010-01-06    0.008606
2010-01-07   -0.003147
2010-01-08   -0.004020
2010-01-11    0.011157
                ...
2016-09-12    0.005169
2016-09-13   -0.024117
2016-09-14   -0.007185
2016-09-15    0.005658
2016-09-16   -0.012418
Name: Adj Close, dtype: float64
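
The three variants give the same output here because the only NaN is in the first row. They are not interchangeable in general, as the sketch below suggests. It uses a hand-written returns series with an extra NaN in the middle; the values and dates are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical returns with a NaN in the first row and another in the middle.
returns = pd.Series(
    [np.nan, 0.01, np.nan, -0.02],
    index=pd.date_range("2010-01-04", periods=4, freq="B"),
    name="Adj Close",
)

by_dropna = returns.dropna()                 # removes every NaN entry
by_position = returns.iloc[1:]               # removes only the first row
by_label = returns.drop(returns.index[0])    # removes the row with that label

lengths = (len(by_dropna), len(by_position), len(by_label))
```

With this input `dropna()` also discards the NaN in the middle, while `iloc[1:]` and `drop(...)` keep it, so the choice depends on whether later gaps should be removed as well.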
