具有行业级约束的 SciPy 投资组合优化答案

【问题标题】：SciPy portfolio optimization with industry-level constraints具有行业级约束的 SciPy 投资组合优化
【发布时间】：2017-11-14 21:36:36
【问题描述】：

在这里尝试优化投资组合权重分配，通过限制风险最大化我的回报函数。通过所有权重之和等于 1 的简单约束，我可以毫无问题地找到产生我的回报函数的优化权重，并做出另一个约束，即我的总风险低于目标风险。

我的问题是，如何为每个组添加行业权重界限？

我的代码如下：

# -*- coding: utf-8 -*-
import pandas as pd
import numpy as np
import scipy.optimize as sco

dates = pd.date_range('1/1/2000', periods=8)
industry = ['industry', 'industry', 'utility', 'utility', 'consumer']
symbols = ['A', 'B', 'C', 'D', 'E']  
zipped = list(zip(industry, symbols))
index = pd.MultiIndex.from_tuples(zipped)

noa = len(symbols)

data = np.array([[10, 9, 10, 11, 12, 13, 14, 13],
                 [11, 11, 10, 11, 11, 12, 11, 10],
                 [10, 11, 10, 11, 12, 13, 14, 13],
                 [11, 11, 10, 11, 11, 12, 11, 11],
                 [10, 11, 10, 11, 12, 13, 14, 13]])

market_to_market_price = pd.DataFrame(data.T, index=dates, columns=index)

rets = market_to_market_price / market_to_market_price.shift(1) - 1.0
rets = rets.dropna(axis=0, how='all')

expo_factor = np.ones((5,5))
factor_covariance = market_to_market_price.cov()
delta = np.diagflat([0.088024, 0.082614, 0.084237, 0.074648,
                                 0.084237])
cov_matrix = np.dot(np.dot(expo_factor, factor_covariance),
                            expo_factor.T) + delta

def calculate_total_risk(weights, cov_matrix):
    port_var = np.dot(np.dot(weights.T, cov_matrix), weights)
    return port_var

def max_func_return(weights):
    return -np.sum(rets.mean() * weights)

# optimized return with given risk
tolerance_risk = 27
noa = market_to_market_price.shape[1]
cons = ({'type': 'eq', 'fun': lambda x:  np.sum(x) - 1},
         {'type': 'eq', 'fun': lambda x:  calculate_total_risk(x, cov_matrix) - tolerance_risk})
bnds = tuple((0, 1) for x in range(noa))
init_guess = noa * [1. / noa,]
opts_mean = sco.minimize(max_func_return, init_guess, method='SLSQP',
                       bounds=bnds, constraints=cons)


In [88]: rets
Out[88]: 
            industry             utility            consumer
                   A         B         C         D         E
2000-01-02 -0.100000  0.000000  0.100000  0.000000  0.100000
2000-01-03  0.111111 -0.090909 -0.090909 -0.090909 -0.090909
2000-01-04  0.100000  0.100000  0.100000  0.100000  0.100000
2000-01-05  0.090909  0.000000  0.090909  0.000000  0.090909
2000-01-06  0.083333  0.090909  0.083333  0.090909  0.083333
2000-01-07  0.076923 -0.083333  0.076923 -0.083333  0.076923
2000-01-08 -0.071429 -0.090909 -0.071429  0.000000 -0.071429

In[89]: opts_mean['x'].round(3)
Out[89]: array([ 0.233,  0.117,  0.243,  0.165,  0.243])

如何添加这样的组界限，使 5 个资产的总和落入界限以下？

model = pd.DataFrame(np.array([.08,.12,.05]), index= set(industry), columns = ['strategic'])
model['tactical'] = [(.05,.41), (.2,.66), (0,.16)]
In [85]: model
Out[85]: 
          strategic      tactical
industry       0.08  (0.05, 0.41)
consumer       0.12   (0.2, 0.66)
utility        0.05     (0, 0.16)

我已经阅读了类似的帖子SciPy optimization with grouped bounds，但仍然无法获得任何线索，任何人都可以帮忙吗？谢谢。

【问题讨论】：

标签： python pandas optimization scipy portfolio

【解决方案1】：

首先，考虑使用cvxopt，这是一个专门为凸优化设计的模块。我不太熟悉，但有效前沿的一个例子是here。

现在解决您的问题，这里有一个解决方法，专门适用于您发布的问题并使用minimize。（它可以被推广以在输入类型和用户友好性方面创造更大的灵活性，并且基于类的实现在这里也很有用。）

关于您的问题“如何添加组边界？”，简短的回答是您实际上需要通过 constraints 而不是 bounds 参数来执行此操作，因为

可选地，x 中每个元素的下限和上限 也可以使用 bounds 参数指定。 [强调]

此规范与您尝试执行的操作不匹配。相反，下面的示例所做的是为每个组的上限和下限分别添加一个不等式约束。函数mapto_constraints 返回添加到当前约束的字典列表。

首先，这里有一些示例数据：

import pandas as pd
import numpy as np
import numpy.random as npr
npr.seed(123)
from scipy.optimize import minimize

# Create a DataFrame of hypothetical returns for 5 stocks across 3 industries,
# at daily frequency over a year.  Note that these will be in decimal
# rather than numeral form. (i.e. 0.01 denotes a 1% return)

dates = pd.bdate_range(start='1/1/2000', end='12/31/2000')
industry = ['industry'] * 2 + ['utility'] * 2 + ['consumer']
symbols = list('ABCDE')
zipped = list(zip(industry, symbols))
cols = pd.MultiIndex.from_tuples(zipped)
returns = pd.DataFrame(npr.randn(len(dates), len(cols)), index=dates, columns=cols)
returns /= 100 + 3e-3 #drift term

returns.head()
Out[191]: 
           industry           utility          consumer
                  A        B        C        D        E
2000-01-03 -0.01484  0.00986 -0.00476  0.00235 -0.00630
2000-01-04  0.00518  0.00958 -0.01210 -0.00814 -0.01664
2000-01-05  0.00233 -0.01665 -0.00366  0.00520  0.02058
2000-01-06  0.00368  0.01253  0.00259  0.00309 -0.00211
2000-01-07 -0.00383  0.01174  0.00375  0.00336 -0.00608

您可以看到年化数字“有意义”：

(1 + returns.mean()) ** 252 - 1
Out[199]: 
industry  A   -0.05531
          B    0.32455
utility   C    0.10979
          D    0.14339
consumer  E   -0.12644

现在介绍一些将在优化中使用的函数。这些是根据 Yves Hilpisch 的 Python for Finance 第 11 章中的示例紧密建模的。

def logrels(rets):
    """Log of return relatives, ln(1+r), for a given DataFrame rets."""
    return np.log(rets + 1)

def statistics(weights, rets):
    """Compute expected portfolio statistics from individual asset returns.

    Parameters
    ==========
    rets : DataFrame
        Individual asset returns.  Use numeral rather than decimal form
    weights : array-like
        Individual asset weights, nx1 vector.

    Returns
    =======
    list of (pret, pvol, pstd); these are *per-period* figures (not annualized)
        pret : expected portfolio return
        pvol : expected portfolio variance
        pstd : expected portfolio standard deviation

    Note
    ====
    Note that Modern Portfolio Theory (MPT), being a single-period model,
    works with (optimizes using) continuously compounded returns and
    volatility, using log return relatives.  The difference between these and
    more commonly used geometric means will be negligible for small returns.
    """

    if isinstance(weights, (tuple, list)):
        weights = np.array(weights)
    pret = np.sum(logrels(rets).mean() * weights)
    pvol = np.dot(weights.T, np.dot(logrels(rets).cov(), weights))
    pstd = np.sqrt(pvol)
    return [pret, pvol, pstd]

# The below are a few convenience functions around statistics() above, needed
# because scipy minimize must optimize a function that returns a scalar

def port_ret(weights, rets):
    return -1 * statistics(weights=weights, rets=rets)[0]

def port_variance(weights, rets):
    return statistics(weights=weights, rets=rets)[1]

这是等权重投资组合的预期年化标准差。我只是在这里给出这个作为优化中的锚点（risk_tol 参数）。

statistics([0.2] * 5, returns)[2] * np.sqrt(252) # ew anlzd stdev
Out[192]: 0.06642120658640735

下一个函数采用一个看起来像 model DataFrame 的 DataFrame，并为每个组构建约束。请注意，这是非常不灵活的，因为您需要遵循返回的特定格式和您现在使用的 model DataFrames。

def mapto_constraints(rets, model):
    tactical = model['tactical'].to_dict() # values are tuple bounds
    industries = rets.columns.get_level_values(0)
    group_cons = list()
    for key in tactical:
        if isinstance(industries.get_loc('consumer'), int):
            pos = [industries.get_loc(key)]
        else:
            pos = np.where(industries.get_loc(key))[0].tolist()
        lb = tactical[key][0]
        ub = tactical[key][1] # upper and lower bounds
        lbdict = {'type': 'ineq', 
                  'fun': lambda x: np.sum(x[pos[0]:(pos[-1] + 1)]) - lb}
        ubdict = {'type': 'ineq', 
                  'fun': lambda x: ub - np.sum(x[pos[0]:(pos[-1] + 1)])}
        group_cons.append(lbdict); group_cons.append(ubdict)
    return group_cons

关于如何在上面构建约束的注释：

等式约束意味着约束函数的结果是零，而不等式意味着它是非负的。

最后，优化本身：

def opt(rets, risk_tol, model, round=3):    
    noa = len(rets.columns)
    guess = noa * [1. / noa,] # equal-weight; needed for initial guess
    bnds = tuple((0, 1) for x in range(noa))
    cons = [{'type': 'eq', 'fun': lambda x: np.sum(x) - 1.},
            {'type': 'ineq', 'fun': lambda x: risk_tol - port_variance(x, rets=rets)}
           ] + mapto_constraints(rets=rets, model=model)
    opt = minimize(port_ret, guess, args=(returns,), method='SLSQP', bounds=bnds, 
                   constraints=cons, tol=1e-10)
    return opt.x.round(round)

model = pd.DataFrame(np.array([.08,.12,.05]), 
                     index= set(industry), columns = ['strategic'])
model['tactical'] = [(.05,.41), (.2,.66), (0,.16)]

# Set variance threshold equal to the equal-weighted variance
# Note that I set variance as an inequality rather than equality (i.e.
# resulting variance should be less than threshold).

opt(returns, risk_tol=port_variance([0.2] * 5, returns), model=model)
Out[195]: array([ 0.188,  0.225,  0.229,  0.197,  0.16 ])

【讨论】：

感谢您的回复。 mapto_constraints 函数的一点变化： lbdict = {'type': 'ineq', 'fun': lambda x: np.sum(x[pos[0]:(pos[-1] + 1)]) - lb} ubdict = {'type': 'ineq', 'fun': lambda x: ub - np.sum(x[pos[0]:(pos[-1] + 1)])}
注意for key in tactical:中的lambda函数。 Lambda 函数不能声明和使用局部变量，这意味着 pos 变量只会获取最后分配的值。