[Posted]: 2017-08-21 21:11:11
[Question]:
I have two different sets of dataframes.
One is a Panel whose items are stocks.
Here is the code to get the panel (for reproducibility):
import numpy as np
import pandas as pd
import pandas_datareader.data as web
import matplotlib.pyplot as plt
import datetime as dt
import re
startDate = '2010-01-01'
endDate = '2016-09-07'
stocks_query = ['AAPL','OPK']
stocks = web.DataReader(stocks_query, data_source='yahoo',
                        start=startDate, end=endDate)
stocks = stocks.swapaxes('items', 'minor_axis')
This produces the output:
Dimensions: 2 (items) x 1682 (major_axis) x 6 (minor_axis)
Items axis: AAPL to OPK
Major_axis axis: 2010-01-04 00:00:00 to 2016-09-07 00:00:00
Minor_axis axis: Open to Adj Close
A single dataframe of the panel looks like this:
stocks['OPK']
Open High Low Close Volume Adj Close log_return \
Date
2010-01-04 1.80 1.97 1.76 1.95 234500.0 1.95 NaN
2010-01-05 1.64 1.95 1.64 1.93 135800.0 1.93 -0.010309
2010-01-06 1.90 1.92 1.77 1.79 546600.0 1.79 -0.075304
2010-01-07 1.79 1.94 1.76 1.92 138700.0 1.92 0.070110
2010-01-08 1.92 1.94 1.86 1.89 62500.0 1.89 -0.015748
I then added a couple of custom columns with this code:
for i in stocks:
    stocks[i]['log_return'] = np.log(stocks[i]['Close'] / stocks[i]['Close'].shift(1))
    stocks[i]['30_Avg_Vol'] = stocks[i]['Volume'].rolling(min_periods=15, window=30).mean()
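As a minimal sketch of what these two columns contain, the same calculations can be run on a single synthetic dataframe. The prices and volumes below are made up for illustration and do not come from the question's data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for one stock's data (values are invented for illustration).
dates = pd.date_range("2010-01-04", periods=40, freq="B")
df = pd.DataFrame(
    {
        "Close": np.linspace(1.0, 2.0, 40),
        "Volume": np.arange(1, 41) * 1000.0,
    },
    index=dates,
)

# Log return: ln(Close_t / Close_{t-1}); the first row has no predecessor, so it is NaN.
df["log_return"] = np.log(df["Close"] / df["Close"].shift(1))

# Rolling mean of volume over the last 30 rows, produced once at least 15 rows exist.
df["30_Avg_Vol"] = df["Volume"].rolling(window=30, min_periods=15).mean()

print(df[["Close", "log_return", "Volume", "30_Avg_Vol"]].head(16))
```

Note that the window counts rows (trading days here), not calendar days, and that the first 14 rows of `30_Avg_Vol` are NaN because of `min_periods=15`.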
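Then, to slice out only the rows with high volume, I created a dictionary of dataframes (each key is a stock ticker, each value is the sliced dataframe) with this code:
High_volume = {}
for i in stocks.items:  # stocks is a panel, the items are the stock tickers
    print(i)
    High_volume[i] = stocks[i][stocks[i].Volume > 1.5 * stocks[i]['30_Avg_Vol']]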
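The same filter can be sketched without a Panel (which is no longer available in recent pandas releases) by keeping one dataframe per ticker in a plain dictionary. The names `stock_frames`, `make_frame`, and `high_volume`, and all the numbers, are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2010-01-04", periods=6, freq="B")


def make_frame(volumes, averages):
    """Build a tiny frame with only the two columns the filter needs."""
    return pd.DataFrame({"Volume": volumes, "30_Avg_Vol": averages}, index=dates)


# Hypothetical per-ticker frames with invented values.
stock_frames = {
    "AAPL": make_frame([100.0, 400.0, 120.0, 90.0, 500.0, 110.0], [100.0] * 6),
    "OPK": make_frame(
        [10.0, 11.0, 40.0, 9.0, 12.0, 10.0],
        [10.0, 10.0, 10.0, 10.0, 10.0, np.nan],
    ),
}

# Keep only the rows whose volume exceeds 1.5 times the rolling average.
# Rows where the average is NaN compare as False and are dropped.
high_volume = {
    ticker: frame[frame["Volume"] > 1.5 * frame["30_Avg_Vol"]]
    for ticker, frame in stock_frames.items()
}

for ticker, frame in high_volume.items():
    print(ticker, list(frame.index.date))
```

The boolean mask is built per ticker, so each value in `high_volume` keeps the original datetime index of its rows.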
So I have a dictionary of sliced dataframes, and I can access each dataframe by its stock ticker.
High_volume['OPK']
High_volume['AAPL']
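Now, for each date in each row of each High_volume dataframe (the index is a datetime object), I want to create a bunch of mini dataframes.
So for all the dates in High_volume['AAPL'], I want to create one mini dataframe per date. For all the dates in High_volume['OPK'], I want to do the same. So in this case I want to create two dictionaries containing mini dataframes.
High_volume['OPK'] looks something like this; for each date I want to create a mini dataframe: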
Open High Low Close Volume Adj Close \
Date
2010-02-11 1.710000 2.200000 1.710000 1.940000 2212300.0 1.940000
2010-02-12 1.940000 2.100000 1.940000 2.030000 739500.0 2.030000
2010-03-19 2.030000 2.050000 1.950000 2.030000 611800.0 2.030000
2010-04-12 2.060000 2.210000 2.040000 2.160000 647100.0 2.160000
2010-04-13 2.210000 2.450000 2.160000 2.320000 823200.0 2.320000
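Each mini dataframe will contain about X days of information. The start date is the date of the sliced row, and the end date is about X days later. To get the data for the other X days, I slice the original panel (stocks), which contains all the stock data.
However, since I will be dealing with many stocks, I will have to create many dictionaries in one iteration (two in this example, OPK and AAPL), so I need to name the dictionaries dynamically.
So the function that does this looks something like this: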
def slicing(stock, sliced_data, num_of_days):
    # stock = list of stock tickers I'm interested in exploring
    # sliced_data = the High_volume dict I created
    # num_of_days = this represents the X days (the size of each mini-dataframe)
    time_delta = dt.timedelta(days=num_of_days)
    for i in stock:  # stock name
        vars()['mini_dfs' + i] = {}  # dynamically creating a dictionary for that stock
        print(vars()['mini_dfs' + i])  # to make sure the dictionary was created
        for date in sliced_data[i].index:  # taking each date of the High_volume df
            start_date = date
            end_date = date + time_delta
            # filling the empty dictionary with dataframes (dates are keys, values are dataframes)
            vars()['mini_dfs' + i][date] = stocks[i].loc[start_date:end_date]
        return vars()['mini_dfs' + i]  # returning the dictionary before creating the new dictionary
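The function seems to execute correctly, since I get a bunch of mini dataframes as output for both stocks. However, they are not saved into two variables; everything is saved into one variable. Keep in mind that in this case I am dealing with two stocks, so I want two dictionaries to be created.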
x=slicing(['AAPL','OPK'], High_volume , 1) # This works
However,
x,y =slicing(['AAPL','OPK'], High_volume , 1)
ValueError: too many values to unpack (expected 2)
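A likely reading of this error is that the function hands back a single dictionary, and tuple unpacking then iterates over that dictionary's keys (the dates) rather than over per-stock dictionaries. The small sketch below reproduces the behavior with a hypothetical three-key dictionary standing in for the returned value.

```python
# Hypothetical stand-in for the returned dictionary: dates as keys, placeholders as values.
mini_dfs = {"2010-02-11": "df1", "2010-02-12": "df2", "2010-03-19": "df3"}

x = mini_dfs  # fine: one name is bound to the whole dictionary

try:
    # Unpacking iterates over the keys: three keys cannot fill two targets.
    x, y = mini_dfs
except ValueError as err:
    message = str(err)
    print(message)
```

Under this reading, unpacking into two names would only succeed if the returned dictionary happened to hold exactly two dates, and even then `x` and `y` would be dates, not dictionaries.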
In this case, how can I get the function to output two dictionaries (or one dictionary per stock that I want it to analyze)?
Thanks.
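One possible approach, sketched here under stated assumptions rather than as a definitive answer, is to avoid dynamically named variables and return a single nested dictionary of the form {ticker: {date: mini dataframe}}. The function name `slice_windows`, the `frames` argument (a plain dict of dataframes standing in for the panel), and the synthetic data are all hypothetical.

```python
import datetime as dt

import numpy as np
import pandas as pd


def slice_windows(frames, sliced_data, num_of_days):
    """Return {ticker: {date: window DataFrame}} for every ticker in sliced_data."""
    time_delta = dt.timedelta(days=num_of_days)
    result = {}
    for ticker, high_volume_frame in sliced_data.items():
        full_frame = frames[ticker]
        # Label-based slicing on a sorted DatetimeIndex includes both endpoints.
        result[ticker] = {
            date: full_frame.loc[date:date + time_delta]
            for date in high_volume_frame.index
        }
    return result  # returned once, after every ticker has been processed


# Synthetic data standing in for the downloaded prices (values are invented).
dates = pd.date_range("2010-01-04", periods=10, freq="D")
frames = {
    "AAPL": pd.DataFrame({"Close": np.arange(10.0)}, index=dates),
    "OPK": pd.DataFrame({"Close": np.arange(10.0) * 2}, index=dates),
}
sliced = {
    "AAPL": frames["AAPL"].iloc[[1, 5]],
    "OPK": frames["OPK"].iloc[[3]],
}

mini_dfs = slice_windows(frames, sliced, num_of_days=1)

# One dictionary per stock, looked up by ticker instead of by variable name.
aapl_minis, opk_minis = mini_dfs["AAPL"], mini_dfs["OPK"]
print(len(aapl_minis), len(opk_minis))
```

With this shape the number of stocks no longer has to match the number of names on the left-hand side of the call, since each per-stock dictionary is retrieved by its ticker.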
[Discussion]:
Tags: pandas dictionary dataframe iteration naming-conventions