Supervised Learning:
We give the algorithm a data set in which the "right answers" are given, and the task of the algorithm is to produce more of these right answers. Predicting a continuous-valued output is called a regression problem; predicting a discrete-valued output is called a classification problem.
examples: continuous-value prediction (regression), discrete-value prediction (classification).
Unsupervised Learning:
We're just told: here is a data set, can you find some structure in the data? An algorithm that does this is called a clustering algorithm.
examples: grouping news stories, social network analysis, market segmentation.
Supervised Learning
linear regression
m = Number of training examples.
x's = "input" variable / features
y's = "output" variable / "target" variable
(x, y) = one training example
(x^(i), y^(i)) = the i-th training example
hypothesis example: h_θ(x) = θ_0 + θ_1·x
For linear regression, the most commonly used cost is the mean squared error.
cost function: J(θ_0, θ_1) = (1/2m) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))^2
goal: minimize J(θ_0, θ_1) over (θ_0, θ_1)
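The hypothesis and cost function above can be sketched in a few lines of Python (the names `predict` and `cost` are just illustrative, not from the course):

```python
def predict(theta0, theta1, x):
    # hypothesis: h_theta(x) = theta0 + theta1 * x
    return theta0 + theta1 * x

def cost(theta0, theta1, xs, ys):
    # J(theta0, theta1) = (1/2m) * sum over i of (h(x_i) - y_i)^2
    m = len(xs)
    return sum((predict(theta0, theta1, x) - y) ** 2
               for x, y in zip(xs, ys)) / (2 * m)
```

For data lying exactly on a line, the cost at the true parameters is zero; any other parameter pair gives a strictly positive cost.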
3-D surface plot:
contour figures (right):
Each of these ovals is a set of points that take on the same value of J(θ_0, θ_1).
a point on the contour plot --> a pair (θ_0, θ_1) --> a line h_θ(x) through the data
Gradient descent
a more general algorithm
repeat until convergence {
    θ_j := θ_j − α · ∂/∂θ_j J(θ_0, θ_1)    (simultaneously for j = 0 and j = 1)
}
α : learning rate
!! update at the same time !!
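A minimal sketch of what "update at the same time" means: both partial derivatives are evaluated at the old parameter values before either parameter is overwritten (the helper name `gd_step` and its arguments are hypothetical):

```python
def gd_step(theta0, theta1, alpha, grad0, grad1):
    # Correct, simultaneous update: grad0 and grad1 both see the
    # OLD (theta0, theta1). Updating theta0 first and then using it
    # inside grad1 would be the wrong, sequential version.
    temp0 = theta0 - alpha * grad0(theta0, theta1)
    temp1 = theta1 - alpha * grad1(theta0, theta1)
    return temp0, temp1
```

If the gradient for θ_1 depends on θ_0, the sequential version would produce a different (incorrect) result.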
Why each update moves closer to the minimum:
the derivative ∂/∂θ_j J is positive where the cost slopes upward (so the update decreases θ_j) and negative where it slopes downward (so the update increases θ_j); either way θ_j moves toward the minimum.
α too small or too large:
if α is too small, gradient descent can be slow.
if α is too large, gradient descent can overshoot the minimum; it may fail to converge, or even diverge.
As we approach a local minimum, gradient descent automatically takes smaller steps, because the derivative term shrinks toward zero. So there is no need to decrease α over time.
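These learning-rate effects can be demonstrated on the one-parameter cost J(θ) = θ², whose derivative is 2θ (the `descend` helper is an assumed name, not from the course):

```python
def descend(theta, alpha, steps):
    # Minimize J(theta) = theta^2 with gradient descent:
    # dJ/dtheta = 2*theta, so theta := theta - alpha * 2 * theta.
    # Returns the distance from the minimum at theta = 0.
    for _ in range(steps):
        theta = theta - alpha * 2 * theta
    return abs(theta)
```

Starting from θ = 1: a small α makes slow progress, a moderate α converges quickly, and an α above 1 (for this cost) overshoots more each step and diverges.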
Gradient descent algorithm (applied to linear regression):
repeat until convergence {
    θ_0 := θ_0 − α · (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))
    θ_1 := θ_1 − α · (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) · x^(i)
}
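Putting the pieces together, a runnable sketch of batch gradient descent for one-variable linear regression (the function name `gradient_descent` and the default α and iteration count are illustrative choices):

```python
def gradient_descent(xs, ys, alpha=0.1, iters=1000):
    """Batch gradient descent for h(x) = theta0 + theta1 * x."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iters):
        # errors h(x_i) - y_i at the current parameters
        errs = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errs) / m
        grad1 = sum(e * x for e, x in zip(errs, xs)) / m
        # simultaneous update of both parameters
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1
```

On data generated from y = 2x + 1 this converges to θ_0 ≈ 1 and θ_1 ≈ 2.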
Installing Octave: