【发布时间】:2016-12-07 07:15:11
【问题描述】:
我正在尝试编写一个代码来构建由组合的协整对组成的 dataFrame(股票价格是协整的)。在这种情况下,投资组合中的股票选自标准普尔 500 指数,并且它们具有相同的权重。
此外,对于某些经济问题,投资组合必须包含相同的行业。
例如: 如果一个投资组合中的股票来自 [IT] 和 [Financial] 部门,则第二个投资组合必须从 [IT] 和 [Financial] 部门中选择股票。
投资组合中的股票数量不正确,因此我考虑为每只股票设置大约 10 到 20 只股票。但是,当谈到组合时,这是(500 选择 10),所以我有计算时间的问题。
以下是我的代码:
def adf(x, y, xName, yName, pvalue=0.01, beta_lower=0.5, beta_upper=1):
res=pd.DataFrame()
regress1, regress2 = pd.ols(x=x, y=y), pd.ols(x=y, y=x)
error1, error2 = regress1.resid, regress2.resid
test1, test2 = ts.adfuller(error1, 1), ts.adfuller(error2, 1)
if test1[1] < pvalue and test1[1] < test2[1] and\
regress1.beta["x"] > beta_lower and regress1.beta["x"] < beta_upper:
res[(tuple(xName), tuple(yName))] = pd.Series([regress1.beta["x"], test1[1]])
res = res.T
res.columns=["beta","pvalue"]
return res
elif test2[1] < pvalue and regress2.beta["x"] > beta_lower and\
regress2.beta["x"] < beta_upper:
res[(tuple(yName), tuple(xName))] = pd.Series([regress2.beta["x"], test2[1]])
res = res.T
res.columns=["beta","pvalue"]
return res
else:
pass
def coint(dataFrame, nstocks = 2, pvalue=0.01, beta_lower=0.5, beta_upper=1):
# dataFrame = pandas_dataFrame, in this case, data['Adj Close'], row=time, col = tickers
# pvalue = level of significance of adf test
# nstocks = number of stocks considered for adf test (equal weight)
# if nstocks > 2, coint return cointegration between portfolios
# beta_lower = lower bound for slope of linear regression
# beta_upper = upper bound for slope of linear regression
a=time.time()
tickers = dataFrame.columns
tcomb = itertools.combinations(dataFrame.columns, nstocks)
res = pd.DataFrame()
sec = pd.DataFrame()
for pair in tcomb:
xName, yName = list(pair[:int(nstocks/2)]), list(pair[int(nstocks/2):])
xind, yind = tickers.searchsorted(xName), tickers.searchsorted(yName)
xSector = list(SNP.ix[xind]["Sector"])
ySector = list(SNP.ix[yind]["Sector"])
if set(xSector) == set(ySector):
sector = [[(xSector, ySector)]]
x, y = dataFrame[list(xName)].sum(axis=1), dataFrame[list(yName)].sum(axis=1)
res1 = adf(x,y,xName,yName)
if res1 is None:
continue
elif res.size==0:
res=res1
sec = pd.DataFrame(sector, index = res.index, columns = ["sector"])
print("added : ", pair)
else:
res=res.append(res1)
sec = sec.append(pd.DataFrame(sector, index = [res.index[-1]], columns = ["sector"]))
print("added : ", pair)
res = pd.concat([res,sec],axis=1)
res=res.sort_values(by=["pvalue"],ascending=True)
b=time.time()
print("time taken : ", b-a, "sec")
return res
当 nstocks=2 时,这大约需要 263 秒,但随着 nstocks 的增加,循环需要很多时间(超过一天)
我使用 pandas_datareader.data 从 yahoo Finance 收集了“Adj Close”数据 并且索引是时间,列是不同的代码
任何建议或帮助将不胜感激
【问题讨论】:
标签: python python-3.x pandas dataframe