KeyError when subsetting time series data using the index

【Question title】: Key Error while subsetting Timeseries data using index
【Posted】: 2020-08-06 08:54:56
【Question description】:

I have the following time series data.

price_per_year.head()
            price
      date  
2013-01-02  20.08
2013-01-03  19.78
2013-01-04  19.86
2013-01-07  19.40
2013-01-08  19.66

price_per_year.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 782 entries, 2013-01-02 to 2015-12-31
Data columns (total 1 columns):
price    756 non-null float64
dtypes: float64(1)
memory usage: 12.2 KB

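I am trying to extract 3 years of data using the code below. Why do I get KeyError: '2014' when the data clearly contains the year '2014', as shown below? Any input is appreciated.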

price_per_year['2014'].head()
            price
      date  
2014-01-01  NaN
2014-01-02  39.59
2014-01-03  40.12
2014-01-06  39.93
2014-01-07  40.92

prices = pd.DataFrame()
for year in ['2013', '2014', '2015']:
    price_per_year = price_per_year.loc[year, ['price']].reset_index(drop=True)
    price_per_year.rename(columns={'price': year}, inplace=True)
    prices = pd.concat([prices, price_per_year], axis=1)

KeyError: '2014'

The line price_per_year.loc['2014', ['price']] works fine when used on its own outside the for loop, whereas price_per_year['price'][year] does not work when used inside the for loop.

for year in ['2013', '2014', '2015']:
    price_per_year = price_per_year['price'][year].reset_index(drop=True)

KeyError: 'price'

The line price_per_year.loc[price_per_year.index.year == 2014, ['price']] used on its own outside the for loop, and price_per_year.loc[price_per_year.index.year == year, ['price']] used inside the for loop, both raise an error.

for year in ['2013', '2014', '2015']:
    price_per_year.loc[price_per_year.index.year == '2014', ['price']].reset_index(drop=True)

TypeError: Cannot convert input [False] of type <class 'bool'> to Timestamp

【Question discussion】:

Tags: python-3.x pandas time-series subset

【Solution 1】:

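Your first snippet uses partial string indexing through DataFrame.loc, which is fine in itself. The problem is that the loop overwrites price_per_year, so after the first iteration it no longer has the DatetimeIndex. Assign each yearly slice to a different variable (s below) instead: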

prices = pd.DataFrame()
for year in ['2013', '2014', '2015']:
    s = price_per_year['price'][year].reset_index(drop=True).rename(year)
    prices = pd.concat([prices, s], axis=1)
print (prices)
    2013   2014   2015
0  20.08  19.86  19.66
1  19.78  19.40  19.66
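
To make the cause of the KeyError concrete, here is a minimal sketch that reproduces it and then applies the fix. The six-row sample frame is an assumption standing in for the asker's data, and the names frame, failed_year and columns are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's price_per_year frame.
price_per_year = pd.DataFrame(
    {"price": [20.08, 19.78, 19.86, 19.40, 19.66, 19.66]},
    index=pd.DatetimeIndex(
        ["2013-01-02", "2013-01-03", "2014-01-02",
         "2014-01-03", "2015-01-02", "2015-01-03"],
        name="date",
    ),
)

# Buggy pattern: the loop rebinds the very name it slices from.
frame = price_per_year.copy()
failed_year = None
for year in ["2013", "2014", "2015"]:
    try:
        # After the first pass, `frame` holds only the 2013 rows and has a
        # plain integer index, so the label '2014' can no longer be found.
        frame = frame.loc[year, ["price"]].reset_index(drop=True)
    except KeyError:
        failed_year = year
        break

# Fixed pattern: keep the source frame intact and collect each slice separately.
columns = []
for year in ["2013", "2014", "2015"]:
    s = price_per_year.loc[year, "price"].reset_index(drop=True).rename(year)
    columns.append(s)
prices = pd.concat(columns, axis=1)

print(failed_year)
print(prices)
```

Collecting the slices in a list and calling pd.concat once is a design choice of this sketch; concatenating inside the loop, as in the answer, gives the same table.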

Another, better solution is to reshape:

print (df)
            price
date             
2013-01-02  20.08
2013-01-03  19.78
2014-01-02  19.86
2014-01-03  19.40
2015-01-02  19.66
2015-01-03  19.66

y = df.index.year
df = df.set_index([df.groupby(y).cumcount(), y])['price'].unstack()
print (df)
date   2013   2014   2015
0     20.08  19.86  19.66
1     19.78  19.40  19.66
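
The reshape can be easier to follow with the intermediate step spelled out. The sketch below assumes a small sample frame in which the two years have different numbers of rows, which is likely the case for real trading data; the names pos and wide are made up for illustration.

```python
import pandas as pd

# Hypothetical sample: three rows for 2013, two rows for 2014.
df = pd.DataFrame(
    {"price": [20.08, 19.78, 19.86, 39.59, 40.12]},
    index=pd.DatetimeIndex(
        ["2013-01-02", "2013-01-03", "2013-01-04",
         "2014-01-02", "2014-01-03"],
        name="date",
    ),
)

y = df.index.year                 # year of each row
pos = df.groupby(y).cumcount()    # 0, 1, 2, ... restarting within each year

# Index by (position within year, year), then move the year level to columns.
wide = df.set_index([pos, y])["price"].unstack()

print(pos.tolist())
print(wide)
```

With unequal year lengths, the shorter years are padded with NaN at the bottom, so the result always has as many rows as the longest year.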

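【Discussion】:

Thanks for your input. While the reshape solution you suggested does work, the option suggested for my code does run into an error. I have updated the original question with it. Please take a look. Thanks a lot.
@Srinivas - The problem is overwriting the variable price_per_year, so change it to s, which is also used for rename; answer edited.
Thanks a lot. Got it. It works. It even works with df.loc[year, ['price']].reset_index(drop=True).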
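
The boolean-mask attempt from the question can probably be made to work the same way. The sketch below assumes the remaining problem there is the type of the comparison: index.year holds integers, so the loop's string years are converted with int() before comparing. The sample frame and the name pieces are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's price_per_year frame.
price_per_year = pd.DataFrame(
    {"price": [20.08, 19.78, 19.86, 19.40, 19.66, 19.66]},
    index=pd.DatetimeIndex(
        ["2013-01-02", "2013-01-03", "2014-01-02",
         "2014-01-03", "2015-01-02", "2015-01-03"],
        name="date",
    ),
)

pieces = {}
for year in ["2013", "2014", "2015"]:
    # Compare integer years with an integer, not with the string '2014'.
    mask = price_per_year.index.year == int(year)
    # The result goes into a dict, so price_per_year itself is never overwritten.
    pieces[year] = price_per_year.loc[mask, "price"].reset_index(drop=True)

# The dict keys become the column names.
prices = pd.concat(pieces, axis=1)
print(prices)
```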