Goal

  • To build a system that can take a vector $\vec{x}\in\mathbb{R}^n$ as input and predict the value of a scalar $y\in\mathbb{R}$ as its output.

Definition

  • Output: $\hat{y}=\vec{w}^T\vec{x}$, where $\vec{w}\in\mathbb{R}^n$ is a vector of parameters.
  • Weights: $\vec{w}$ is a set of parameters that determine how each feature affects the prediction.
  • The definition of our task $T$: to predict $y$ from $\vec{x}$ by outputting $\hat{y}=\vec{w}^T\vec{x}$.
  • The definition of our performance measure $P$: $m$ example inputs serve as the test set, with the design matrix of inputs denoted $\mathbf{X}^{(\text{test})}$ and the vector of regression targets denoted $\vec{y}^{(\text{test})}$.
  • measuring the performance:
    • the mean squared error: $\text{MSE}_{\text{test}}=\frac{1}{m}\sum\limits_i\left(\hat{\vec{y}}^{(\text{test})}-\vec{y}^{(\text{test})}\right)_i^2$
    • equivalently, $\text{MSE}_{\text{test}}=\frac{1}{m}\left\|\hat{\vec{y}}^{(\text{test})}-\vec{y}^{(\text{test})}\right\|_2^2$
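The two forms of the MSE above are identical by the definition of the squared $L^2$ norm. A minimal sketch, using made-up predictions and targets (the array values are illustrative, not from the text):

```python
import numpy as np

# Hypothetical predictions and regression targets for m = 3 test examples.
y_hat_test = np.array([1.1, 1.9, 3.2])
y_test = np.array([1.0, 2.0, 3.0])
m = len(y_test)

# Elementwise form: (1/m) * sum_i (y_hat - y)_i^2
mse_sum = np.sum((y_hat_test - y_test) ** 2) / m

# Norm form: (1/m) * ||y_hat - y||_2^2 -- the same quantity by definition
mse_norm = np.linalg.norm(y_hat_test - y_test) ** 2 / m

assert np.isclose(mse_sum, mse_norm)
print(mse_sum)  # ~0.02 for these illustrative values
```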

Algorithm

  • Minimize $\text{MSE}_{\text{train}}$ by setting its gradient with respect to $\vec{w}$ to zero (the constant factor $\frac{1}{m}$ does not affect the minimizer and is dropped):

$$
\begin{aligned}
&\nabla_{\vec{w}}\text{MSE}_{\text{train}}=0\\
\Longrightarrow\ &\nabla_{\vec{w}}\frac{1}{m}\left\|\hat{\vec{y}}^{(\text{train})}-\vec{y}^{(\text{train})}\right\|_2^2=0\\
\Longrightarrow\ &\nabla_{\vec{w}}\left\|\mathbf{X}^{(\text{train})}\vec{w}-\vec{y}^{(\text{train})}\right\|_2^2=0\\
\Longrightarrow\ &\nabla_{\vec{w}}\left(\mathbf{X}^{(\text{train})}\vec{w}-\vec{y}^{(\text{train})}\right)^T\left(\mathbf{X}^{(\text{train})}\vec{w}-\vec{y}^{(\text{train})}\right)=0\\
\Longrightarrow\ &\nabla_{\vec{w}}\left(\vec{w}^T\mathbf{X}^{(\text{train})T}\mathbf{X}^{(\text{train})}\vec{w}-2\vec{w}^T\mathbf{X}^{(\text{train})T}\vec{y}^{(\text{train})}+\vec{y}^{(\text{train})T}\vec{y}^{(\text{train})}\right)=0\\
\Longrightarrow\ &2\mathbf{X}^{(\text{train})T}\mathbf{X}^{(\text{train})}\vec{w}-2\mathbf{X}^{(\text{train})T}\vec{y}^{(\text{train})}=0\\
\Longrightarrow\ &\vec{w}=\left(\mathbf{X}^{(\text{train})T}\mathbf{X}^{(\text{train})}\right)^{-1}\mathbf{X}^{(\text{train})T}\vec{y}^{(\text{train})}
\end{aligned}
$$
  • normal equations: $\vec{w}=\left(\mathbf{X}^{(\text{train})T}\mathbf{X}^{(\text{train})}\right)^{-1}\mathbf{X}^{(\text{train})T}\vec{y}^{(\text{train})}$
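The normal equations can be sketched on synthetic data (the data and the true weight vector below are illustrative assumptions). Note that in practice one solves the linear system rather than forming the explicit inverse, which is numerically more stable but mathematically equivalent:

```python
import numpy as np

# Synthetic training set: m = 100 examples, n = 3 features (illustrative).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 3))
true_w = np.array([2.0, -1.0, 0.5])
y_train = X_train @ true_w  # noise-free targets, so true_w is recovered exactly

# w = (X^T X)^{-1} X^T y, computed by solving (X^T X) w = X^T y
w = np.linalg.solve(X_train.T @ X_train, X_train.T @ y_train)

print(w)  # recovers true_w up to floating-point error
```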
  • a slightly more sophisticated model: $\hat{y}=\vec{w}^T\vec{x}+b$, where $b$ is called the bias parameter; the mapping from features to predictions is now an affine transformation rather than a purely linear one.
    C5eg1 Linear Regression
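The bias $b$ can be absorbed into $\vec{w}$ by appending a constant feature of $1$ to every input, so the affine model reuses the same normal equations. A sketch under that assumption (data and true parameters are made up):

```python
import numpy as np

# Illustrative data generated from an affine model y = w^T x + b.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
y = X @ np.array([1.5, -2.0]) + 3.0  # assumed true weights and bias b = 3

# Append a column of 1s: its learned coefficient plays the role of b.
X_aug = np.hstack([X, np.ones((len(X), 1))])
w_aug = np.linalg.solve(X_aug.T @ X_aug, X_aug.T @ y)
w, b = w_aug[:-1], w_aug[-1]

print(w, b)  # recovers the weights and the bias up to floating-point error
```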
