5 Support Vector Regression

5.1 Problem Definition

Given training samples $D = \{(\boldsymbol x_1,y_1), (\boldsymbol x_2,y_2), \dots, (\boldsymbol x_m,y_m)\}$ with $y_i \in \mathbb R$, we want to learn a model of the form $f(\boldsymbol x) = \boldsymbol w^T \boldsymbol x + b$ such that $f(\boldsymbol x)$ is as close as possible to $y$, where $\boldsymbol w$ and $b$ are the parameters to be determined.

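Support vector regression (SVR) tolerates a deviation of at most $\epsilon$ between $f(\boldsymbol x)$ and $y$: a loss is incurred only when $|f(\boldsymbol x) - y| > \epsilon$. This is equivalent to building a band of width $2\epsilon$ centered on $f(\boldsymbol x) = \boldsymbol w^T \boldsymbol x + b$; a training sample that falls inside this band is treated as correctly predicted.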
The SVR objective is:

$$\min_{\boldsymbol w, b}\;\; \frac{1}{2}\|\boldsymbol w\|^2 + C\sum_{i=1}^{m} l_{\epsilon}\big(f(\boldsymbol x_i) - y_i\big)$$

where $C > 0$ is the regularization constant and $l_{\epsilon}$ is the $\epsilon$-insensitive loss function:

$$l_{\epsilon}(z) = \begin{cases} 0, & \text{if } |z| \leq \epsilon \\ |z| - \epsilon, & \text{otherwise} \end{cases}$$
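As a minimal sketch of the definition above, the following pure-Python snippet evaluates the $\epsilon$-insensitive loss and the regularized objective for a small linear model. The function names `eps_insensitive_loss` and `svr_objective` and the toy data are illustrative assumptions, not part of the original text.

```python
def eps_insensitive_loss(z, eps):
    """Return 0 inside the band |z| <= eps, otherwise the excess |z| - eps."""
    return max(0.0, abs(z) - eps)


def svr_objective(w, b, xs, ys, C, eps):
    """Evaluate 0.5 * ||w||^2 + C * sum_i l_eps(f(x_i) - y_i)."""
    reg = 0.5 * sum(wj * wj for wj in w)
    total_loss = 0.0
    for x, y in zip(xs, ys):
        f_x = sum(wj * xj for wj, xj in zip(w, x)) + b  # f(x) = w^T x + b
        total_loss += eps_insensitive_loss(f_x - y, eps)
    return reg + C * total_loss


# Hypothetical toy data: three 1-D samples and the model f(x) = 1.0 * x + 0.0
xs = [[0.0], [1.0], [2.0]]
ys = [0.25, 1.0, 3.0]
value = svr_objective([1.0], 0.0, xs, ys, C=1.0, eps=0.5)
print(value)
```

Only the third sample lies outside the band here, so it is the only one that contributes to the loss term.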

5.2 Dual Problem

Introduce slack variables $\xi_i^{\lor} \geq 0$ and $\xi_i^{\land} \geq 0$ (the slack may differ on the two sides of the band). The optimization problem becomes:

$$\begin{aligned} \min_{\boldsymbol w, b, \xi_i^{\lor}, \xi_i^{\land}} \; & \frac{1}{2}\|\boldsymbol w\|_2^2 + C\sum_{i=1}^{m}(\xi_i^{\lor} + \xi_i^{\land}) \\ \text{s.t.} \;\; & y_i - (\boldsymbol w^T \boldsymbol x_i + b) \leq \epsilon + \xi_i^{\land}, \\ & \boldsymbol w^T \boldsymbol x_i + b - y_i \leq \epsilon + \xi_i^{\lor}, \\ & \xi_i^{\lor} \geq 0, \;\; \xi_i^{\land} \geq 0 \quad (i = 1,2,\dots,m) \end{aligned}$$
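To make the role of the slack variables concrete, here is a small sketch. For fixed $\boldsymbol w$ and $b$, the smallest slacks that satisfy the constraints are $\xi_i^{\land} = \max(0,\, y_i - f(\boldsymbol x_i) - \epsilon)$ and $\xi_i^{\lor} = \max(0,\, f(\boldsymbol x_i) - y_i - \epsilon)$, so their sum equals the $\epsilon$-insensitive loss. The helper name `optimal_slacks` and the sample values are assumptions made for illustration.

```python
def optimal_slacks(f_x, y, eps):
    """Smallest (xi_lor, xi_land) satisfying both band constraints for one sample."""
    xi_land = max(0.0, y - f_x - eps)  # target lies above the band
    xi_lor = max(0.0, f_x - y - eps)   # target lies below the band
    return xi_lor, xi_land


eps = 0.5
# Hypothetical (prediction, target) pairs: inside, above, and below the band
pairs = [(1.0, 1.25), (2.0, 3.0), (2.0, 0.5)]
slacks = [optimal_slacks(f_x, y, eps) for f_x, y in pairs]

for (f_x, y), (xi_lor, xi_land) in zip(pairs, slacks):
    # At most one slack is non-zero, and the sum is the eps-insensitive loss
    print(f_x, y, xi_lor, xi_land, xi_lor + xi_land)
```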

Introduce Lagrange multipliers $\mu_i^{\lor} \geq 0, \mu_i^{\land} \geq 0, \alpha_i^{\lor} \geq 0, \alpha_i^{\land} \geq 0$. The Lagrangian is:

$$\begin{aligned} L(\boldsymbol w,b,\boldsymbol \alpha^{\lor}, \boldsymbol \alpha^{\land}, \boldsymbol \xi^{\lor}, \boldsymbol \xi^{\land}, \boldsymbol \mu^{\lor}, \boldsymbol \mu^{\land}) = {} & \frac{1}{2}\|\boldsymbol w\|_2^2 + C\sum_{i=1}^{m}(\xi_i^{\lor}+ \xi_i^{\land}) - \sum_{i=1}^{m}\mu_i^{\lor}\xi_i^{\lor} - \sum_{i=1}^{m}\mu_i^{\land}\xi_i^{\land} \\ & + \sum_{i=1}^{m}\alpha_i^{\lor}\big(f(\boldsymbol x_i)-y_i -\epsilon - \xi_i^{\lor}\big) + \sum_{i=1}^{m}\alpha_i^{\land}\big(y_i - f(\boldsymbol x_i) -\epsilon - \xi_i^{\land}\big) \end{aligned}$$

where $f(\boldsymbol x_i) = \boldsymbol w^T \boldsymbol x_i + b$.

The optimization problem can then be written as:

$$\min_{\boldsymbol w,b,\xi_i^{\lor}, \xi_i^{\land}}\; \max_{\mu_i^{\lor}, \mu_i^{\land}, \alpha_i^{\lor}, \alpha_i^{\land}}\; L(\boldsymbol w,b,\boldsymbol \alpha^{\lor}, \boldsymbol \alpha^{\land}, \boldsymbol \xi^{\lor}, \boldsymbol \xi^{\land}, \boldsymbol \mu^{\lor}, \boldsymbol \mu^{\land})$$

Since the problem satisfies the KKT conditions, we can solve the dual problem instead:

$$\max_{\mu_i^{\lor}, \mu_i^{\land}, \alpha_i^{\lor}, \alpha_i^{\land}}\; \min_{\boldsymbol w,b,\xi_i^{\lor}, \xi_i^{\land}}\; L(\boldsymbol w,b,\boldsymbol \alpha^{\lor}, \boldsymbol \alpha^{\land}, \boldsymbol \xi^{\lor}, \boldsymbol \xi^{\land}, \boldsymbol \mu^{\lor}, \boldsymbol \mu^{\land})$$

First compute the inner minimum by setting the partial derivatives with respect to $\boldsymbol w, b, \xi_i^{\lor}, \xi_i^{\land}$ to zero:

$$\begin{aligned} \frac{\partial L}{\partial \boldsymbol w} = 0 \;&\Rightarrow\; \boldsymbol w = \sum_{i=1}^{m}(\alpha_i^{\land} - \alpha_i^{\lor})\boldsymbol x_i \\ \frac{\partial L}{\partial b} = 0 \;&\Rightarrow\; \sum_{i=1}^{m}(\alpha_i^{\land} - \alpha_i^{\lor}) = 0 \\ \frac{\partial L}{\partial \xi_i^{\lor}} = 0 \;&\Rightarrow\; C = \alpha_i^{\lor} + \mu_i^{\lor} \\ \frac{\partial L}{\partial \xi_i^{\land}} = 0 \;&\Rightarrow\; C = \alpha_i^{\land} + \mu_i^{\land} \end{aligned}$$

Substituting these conditions back into $L(\boldsymbol w,b,\boldsymbol \alpha^{\lor}, \boldsymbol \alpha^{\land}, \boldsymbol \xi^{\lor}, \boldsymbol \xi^{\land}, \boldsymbol \mu^{\lor}, \boldsymbol \mu^{\land})$ eliminates $\boldsymbol w$, $b$, and the slack variables:

$$\begin{aligned} \min_{\boldsymbol w,b,\xi_i^{\lor}, \xi_i^{\land}}\; L(\boldsymbol w,b,\boldsymbol \alpha^{\lor}, \boldsymbol \alpha^{\land}, \boldsymbol \xi^{\lor}, \boldsymbol \xi^{\land}, \boldsymbol \mu^{\lor}, \boldsymbol \mu^{\land}) = {} & \sum_{i=1}^{m}\Big[y_i(\alpha_i^{\land}- \alpha_i^{\lor}) - \epsilon(\alpha_i^{\land} + \alpha_i^{\lor})\Big] - \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}(\alpha_i^{\land} - \alpha_i^{\lor})(\alpha_j^{\land} - \alpha_j^{\lor}) \boldsymbol x_i^T \boldsymbol x_j \\ \text{s.t.} \;\; & \sum_{i=1}^{m}(\alpha_i^{\land} - \alpha_i^{\lor}) = 0 \\ & 0 \leq \alpha_i^{\lor},\alpha_i^{\land} \leq C \quad (i = 1,2,\dots,m) \end{aligned}$$

The box constraint follows from $C = \alpha_i^{\lor} + \mu_i^{\lor}$ and $C = \alpha_i^{\land} + \mu_i^{\land}$ together with $\mu_i^{\lor}, \mu_i^{\land} \geq 0$.
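The substitution step is compressed in the text, so the intermediate algebra is sketched here using only the Lagrangian and the four stationarity conditions. First, the slack terms cancel because $C - \alpha_i^{\lor} - \mu_i^{\lor} = 0$ and $C - \alpha_i^{\land} - \mu_i^{\land} = 0$:

$$\sum_{i=1}^{m}(C - \alpha_i^{\lor} - \mu_i^{\lor})\,\xi_i^{\lor} + \sum_{i=1}^{m}(C - \alpha_i^{\land} - \mu_i^{\land})\,\xi_i^{\land} = 0$$

Next, the terms containing $f(\boldsymbol x_i) = \boldsymbol w^T \boldsymbol x_i + b$ simplify with $\boldsymbol w = \sum_i(\alpha_i^{\land} - \alpha_i^{\lor})\boldsymbol x_i$ and $\sum_i(\alpha_i^{\land} - \alpha_i^{\lor}) = 0$:

$$\sum_{i=1}^{m}(\alpha_i^{\lor} - \alpha_i^{\land})(\boldsymbol w^T \boldsymbol x_i + b) = -\boldsymbol w^T \sum_{i=1}^{m}(\alpha_i^{\land} - \alpha_i^{\lor})\boldsymbol x_i - b\sum_{i=1}^{m}(\alpha_i^{\land} - \alpha_i^{\lor}) = -\|\boldsymbol w\|_2^2$$

What remains is

$$L = \frac{1}{2}\|\boldsymbol w\|_2^2 - \|\boldsymbol w\|_2^2 + \sum_{i=1}^{m}\Big[y_i(\alpha_i^{\land} - \alpha_i^{\lor}) - \epsilon(\alpha_i^{\land} + \alpha_i^{\lor})\Big] = \sum_{i=1}^{m}\Big[y_i(\alpha_i^{\land} - \alpha_i^{\lor}) - \epsilon(\alpha_i^{\land} + \alpha_i^{\lor})\Big] - \frac{1}{2}\|\boldsymbol w\|_2^2$$

and expanding $\|\boldsymbol w\|_2^2 = \sum_{i}\sum_{j}(\alpha_i^{\land} - \alpha_i^{\lor})(\alpha_j^{\land} - \alpha_j^{\lor})\boldsymbol x_i^T \boldsymbol x_j$ gives the double-sum form.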

The original problem is thus converted into the following dual problem:

$$\begin{aligned} \max_{\boldsymbol \alpha^{\land},\boldsymbol \alpha^{\lor}}\;\; & \sum_{i=1}^{m}\Big[y_i(\alpha_i^{\land}- \alpha_i^{\lor}) - \epsilon(\alpha_i^{\land} + \alpha_i^{\lor})\Big] - \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}(\alpha_i^{\land} - \alpha_i^{\lor})(\alpha_j^{\land} - \alpha_j^{\lor}) \boldsymbol x_i^T \boldsymbol x_j \\ \text{s.t.} \;\; & \sum_{i=1}^{m}(\alpha_i^{\land} - \alpha_i^{\lor}) = 0 \\ & 0 \leq \alpha_i^{\lor},\alpha_i^{\land} \leq C \quad (i = 1,2,\dots,m) \end{aligned}$$
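As a sanity check on the dual form, the sketch below evaluates the dual objective and its constraints on a two-sample toy problem. The data, the multiplier values (worked out by hand for this toy case), and the names `dual_objective` and `is_feasible` are all assumptions for illustration. For these samples the primal optimum is $w = 0.5$, $b = 0.5$ with zero slack, so the primal value $\frac{1}{2}w^2 = 0.125$ should match the dual value.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def dual_objective(alpha_land, alpha_lor, xs, ys, eps):
    """Evaluate the SVR dual objective for given multipliers."""
    beta = [al - ao for al, ao in zip(alpha_land, alpha_lor)]  # alpha^land - alpha^lor
    linear = sum(y * bi for y, bi in zip(ys, beta))
    linear -= eps * sum(al + ao for al, ao in zip(alpha_land, alpha_lor))
    m = len(xs)
    quad = sum(beta[i] * beta[j] * dot(xs[i], xs[j])
               for i in range(m) for j in range(m))
    return linear - 0.5 * quad


def is_feasible(alpha_land, alpha_lor, C, tol=1e-12):
    """Check sum_i (alpha^land_i - alpha^lor_i) = 0 and 0 <= alpha <= C."""
    balance = sum(al - ao for al, ao in zip(alpha_land, alpha_lor))
    in_box = all(-tol <= a <= C + tol for a in alpha_land + alpha_lor)
    return abs(balance) <= tol and in_box


# Hypothetical toy problem: samples (0, 0) and (2, 2), eps = 0.5, C = 1
xs, ys = [[0.0], [2.0]], [0.0, 2.0]
eps, C = 0.5, 1.0
alpha_land = [0.0, 0.25]  # hand-derived for this toy problem
alpha_lor = [0.25, 0.0]

dual_value = dual_objective(alpha_land, alpha_lor, xs, ys, eps)
primal_value = 0.5 * 0.5 ** 2  # 0.5 * w^2 with w = 0.5 and zero slack
print(is_feasible(alpha_land, alpha_lor, C), dual_value, primal_value)
```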

The objective now depends only on $\boldsymbol \alpha^{\land}$ and $\boldsymbol \alpha^{\lor}$, so it can be solved with SMO (Sequential Minimal Optimization), after which $\boldsymbol w$ and $b$ are obtained from the solution.
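The text does not spell out how $\boldsymbol w$ and $b$ follow from the multipliers, so here is a hedged sketch. It uses $\boldsymbol w = \sum_i(\alpha_i^{\land} - \alpha_i^{\lor})\boldsymbol x_i$ from the stationarity conditions and the standard complementary-slackness argument: if $0 < \alpha_i^{\land} < C$ then $b = y_i - \epsilon - \boldsymbol w^T \boldsymbol x_i$, and if $0 < \alpha_i^{\lor} < C$ then $b = y_i + \epsilon - \boldsymbol w^T \boldsymbol x_i$. The function names, the averaging of candidate values, and the hand-derived multipliers for the toy data are assumptions; no SMO solver is implemented here.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def recover_w(alpha_land, alpha_lor, xs):
    """w = sum_i (alpha^land_i - alpha^lor_i) * x_i."""
    w = [0.0] * len(xs[0])
    for al, ao, x in zip(alpha_land, alpha_lor, xs):
        for j, xj in enumerate(x):
            w[j] += (al - ao) * xj
    return w


def recover_b(w, alpha_land, alpha_lor, xs, ys, C, eps, tol=1e-12):
    """Average b over samples whose multiplier lies strictly inside (0, C)."""
    candidates = []
    for al, ao, x, y in zip(alpha_land, alpha_lor, xs, ys):
        if tol < al < C - tol:   # sample sits on the upper edge of the band
            candidates.append(y - eps - dot(w, x))
        if tol < ao < C - tol:   # sample sits on the lower edge of the band
            candidates.append(y + eps - dot(w, x))
    if not candidates:
        raise ValueError("no multiplier strictly between 0 and C")
    return sum(candidates) / len(candidates)


# Hypothetical toy problem: samples (0, 0) and (2, 2), eps = 0.5, C = 1
xs, ys = [[0.0], [2.0]], [0.0, 2.0]
eps, C = 0.5, 1.0
alpha_land = [0.0, 0.25]  # hand-derived dual solution for this toy problem
alpha_lor = [0.25, 0.0]

w = recover_w(alpha_land, alpha_lor, xs)
b = recover_b(w, alpha_land, alpha_lor, xs, ys, C, eps)
prediction = dot(w, [1.0]) + b  # f(x) at x = 1
print(w, b, prediction)
```

Averaging over all eligible samples is one common way to reduce numerical noise in $b$; using any single eligible sample gives the same value when the multipliers are exact.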
