Machine Learning Foundations (机器学习基石): Mathematical Foundations
Hsuan-Tien Lin (林轩田), Associate Professor, Computer Science and Information Engineering
Learning to Answer Yes/No
learning: the algorithm $\mathcal{A}$ takes the data $\mathcal{D}$ and the hypothesis set $\mathcal{H}$ to get a final hypothesis $g$
Perceptron Hypothesis Set
Combine the features into a single weighted score
- $\mathbf{x} = (x_1, x_2, \ldots, x_d)$: features of the customer
- multiply each feature (dimension) by its weight, then add them up
- approve credit if $\sum_{i=1}^{d} w_i x_i > \text{threshold}$
- deny credit if $\sum_{i=1}^{d} w_i x_i < \text{threshold}$
- $y \in \{+1(\text{good}), -1(\text{bad})\}$ (an exact-0 score is ignored)
- the linear formula $h \in \mathcal{H}$ is $h(\mathbf{x}) = \operatorname{sign}\big(\big(\sum_{i=1}^{d} w_i x_i\big) - \text{threshold}\big)$
- to simplify, let $w_0 = -\text{threshold}$ and $x_0 = +1$:
- $h(\mathbf{x}) = \operatorname{sign}\big(\sum_{i=0}^{d} w_i x_i\big) = \operatorname{sign}(\mathbf{w}^{\mathsf{T}}\mathbf{x})$ (a vector inner product)
- each $\mathbf{w}$ represents a hypothesis $h$; different parameters correspond to different functions
- Perceptron in $\mathbb{R}^2$: $h(\mathbf{x}) = \operatorname{sign}(w_0 + w_1 x_1 + w_2 x_2)$
- a two-dimensional perceptron
- different parameter choices give different classifiers with different effects
- setting $w_0 + w_1 x_1 + w_2 x_2 = 0$ gives a line, so perceptrons are linear (binary) classifiers
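The hypothesis $h(\mathbf{x}) = \operatorname{sign}(\mathbf{w}^{\mathsf{T}}\mathbf{x})$ can be sketched in a few lines of plain Python. This is a minimal sketch, not the course's code; the weight values are made up for illustration, and the exact-0 score (ignored in the lecture) is mapped to $-1$ here.

```python
# Minimal sketch of a perceptron hypothesis h(x) = sign(w . x),
# using the x_0 = +1 trick to fold the threshold into w[0].
def sign(s):
    # the exact-0 case is ignored in the lecture; we map it to -1 here
    return 1 if s > 0 else -1

def perceptron(w, x):
    # x is the raw feature vector (x_1, ..., x_d); prepend x_0 = +1
    x = [1.0] + list(x)
    return sign(sum(wi * xi for wi, xi in zip(w, x)))

# example weights: w_0 = -threshold = -2, so approve iff x_1 + x_2 > 2
w = [-2.0, 1.0, 1.0]
print(perceptron(w, [3.0, 1.0]))   # score 2 > 0  -> +1 (approve)
print(perceptron(w, [0.5, 0.5]))   # score -1 < 0 -> -1 (deny)
```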
Perceptron Learning Algorithm (PLA)
$\mathcal{H}$ includes all possible perceptrons (infinitely many), so how do we select $g$?
- want, necessary, difficult, idea
- what we want: $g \approx f$ (hard when $f$ is unknown)
- what is feasible: on the known data $\mathcal{D}$, ideally require $g(\mathbf{x}_n) = f(\mathbf{x}_n) = y_n$
- start from some line $g_0$ (its weight vector $\mathbf{w}_0$), then gradually correct its mistakes
Steps (for $t = 0, 1, \ldots$)
- find a mistake of $\mathbf{w}_t$: a point $(\mathbf{x}_{n(t)}, y_{n(t)})$ with $\operatorname{sign}(\mathbf{w}_t^{\mathsf{T}}\mathbf{x}_{n(t)}) \neq y_{n(t)}$
- correct the mistake: $\mathbf{w}_{t+1} \leftarrow \mathbf{w}_t + y_{n(t)}\mathbf{x}_{n(t)}$
- whether an inner product is positive or negative can be judged from the angle between the two vectors
- the correction rotates the weight vector, changing that angle in the right direction
- A fault confessed is half redressed.
Cyclic PLA
- 'correct' mistakes on $\mathcal{D}$ until a full cycle encounters no mistakes
- find the next mistake: follow the naive cycle $(1, \ldots, N)$ or a precomputed random cycle
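The cyclic procedure above can be sketched as follows. This is a minimal illustration with a made-up toy dataset (labels follow $\operatorname{sign}(x_1 - x_2)$); each sample already includes $x_0 = +1$.

```python
# Sketch of cyclic PLA: sweep the data, correct the first mistake found,
# and halt after a full cycle with no mistakes.
def sign(s):
    return 1 if s > 0 else -1

def pla(X, y):
    w = [0.0] * len(X[0])              # start from the zero vector
    halted = False
    while not halted:
        halted = True                  # full cycle with no mistake => halt
        for xn, yn in zip(X, y):
            if sign(sum(wi * xi for wi, xi in zip(w, xn))) != yn:
                # correct the mistake: w <- w + y_n * x_n
                w = [wi + yn * xi for wi, xi in zip(w, xn)]
                halted = False
    return w

# toy linearly separable data (x_0 = 1 prepended); labels = sign(x_1 - x_2)
X = [[1.0, 2.0, 0.0], [1.0, 0.0, 2.0], [1.0, 3.0, 1.0], [1.0, 1.0, 3.0]]
y = [1, -1, 1, -1]
w = pla(X, y)
# the returned line classifies every training point correctly
assert all(sign(sum(wi * xi for wi, xi in zip(w, xn))) == yn
           for xn, yn in zip(X, y))
```

Since the toy data is linearly separable, the convergence guarantee in the next section ensures this loop halts.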
Open questions
- will the cycle necessarily halt?
- is the $g$ we obtain really close to the intended target $f$?
- how does $g$ perform on data outside $\mathcal{D}$?
Quiz question
- pay attention to the second option
Guarantee of PLA
- if PLA halts (no more mistakes can be found)
- (necessary condition) $\mathcal{D}$ allows some $\mathbf{w}$ to make no mistake
- call such a $\mathcal{D}$ linearly separable
linearly separable $\mathcal{D}$ ⇔ there exists a perfect $\mathbf{w}_f$ such that $y_n = \operatorname{sign}(\mathbf{w}_f^{\mathsf{T}}\mathbf{x}_n)$ for every $n$
- Proof 1
- the vector inner product is just a matrix product, e.g. $\mathbf{w}_f^{\mathsf{T}}\mathbf{w}_t$
- $\mathbf{w}_f^{\mathsf{T}}\mathbf{w}_{t+1} = \mathbf{w}_f^{\mathsf{T}}(\mathbf{w}_t + y_{n(t)}\mathbf{x}_{n(t)}) \geq \mathbf{w}_f^{\mathsf{T}}\mathbf{w}_t + \min_n y_n\mathbf{w}_f^{\mathsf{T}}\mathbf{x}_n > \mathbf{w}_f^{\mathsf{T}}\mathbf{w}_t$
- so $\mathbf{w}_t$ gets more aligned with $\mathbf{w}_f$ (the inner product keeps growing)
- known fact: $\mathbf{w}_t$ is changed only when it makes a mistake, i.e. only when $y_{n(t)}\mathbf{w}_t^{\mathsf{T}}\mathbf{x}_{n(t)} \leq 0$
- Proof 2
- $\|\mathbf{w}_t\|$ does not grow too fast (the length increase per update is bounded):
- $\|\mathbf{w}_{t+1}\|^2 = \|\mathbf{w}_t\|^2 + 2y_{n(t)}\mathbf{w}_t^{\mathsf{T}}\mathbf{x}_{n(t)} + \|y_{n(t)}\mathbf{x}_{n(t)}\|^2 \leq \|\mathbf{w}_t\|^2 + 0 + \max_n \|\mathbf{x}_n\|^2$
- therefore the angle between $\mathbf{w}_t$ and $\mathbf{w}_f$ keeps shrinking, bounded below by 0 degrees
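Putting the two proofs together yields the standard convergence bound. A sketch, using the shorthands $R^2 = \max_n \|\mathbf{x}_n\|^2$ and $\rho = \min_n y_n \mathbf{w}_f^{\mathsf{T}}\mathbf{x}_n / \|\mathbf{w}_f\|$ (these symbols are not named in the notes above):

```latex
% Start from w_0 = 0. After T mistake corrections:
%   Proof 1 (applied T times):  w_f^T w_T >= T * ||w_f|| * rho
%   Proof 2 (applied T times):  ||w_T||^2 <= T * R^2
\frac{\mathbf{w}_f^{\mathsf{T}}\mathbf{w}_T}{\|\mathbf{w}_f\|\,\|\mathbf{w}_T\|}
\;\ge\; \frac{T\,\rho}{\sqrt{T}\,R} \;=\; \sqrt{T}\,\frac{\rho}{R}
\quad\Longrightarrow\quad
T \;\le\; \frac{R^2}{\rho^2}
% since the left-hand side is a cosine and therefore at most 1.
```

This is why the angle has a lower bound of 0 degrees: the cosine cannot exceed 1, so the number of corrections $T$ is finite on linearly separable data.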
- Quiz question
Non-Separable Data
linear separable: the inner product of $\mathbf{w}_f$ and $\mathbf{w}_t$ grows fast (the two vectors get closer and closer)
correct by mistake: the length of $\mathbf{w}_t$ grows slowly
PLA 'lines' are more and more aligned with $\mathbf{w}_f$ ⇒ PLA halts
Pros: simple to implement, fast, works in any dimension $d$
- Cons
- 'assumes' linearly separable $\mathcal{D}$ to halt (linear separability is only an assumption)
- not fully sure how long halting takes (the stopping time is unknown in advance)
Learning with Noisy Data
- find the line that makes the fewest mistakes:
- $\mathbf{w}_g = \operatorname{argmin}_{\mathbf{w}} \sum_{n=1}^{N} \big[\!\big[\, y_n \neq \operatorname{sign}(\mathbf{w}^{\mathsf{T}}\mathbf{x}_n) \,\big]\!\big]$
- the double brackets denote a boolean evaluation: 1 if the condition is true, 0 otherwise
- $\operatorname{argmin} f(x)$ denotes the set of arguments $x$ at which $f(x)$ attains its minimum; this combinatorial problem is NP-hard to solve
- Pocket Algorithm
- modify the PLA algorithm by keeping the best weights seen so far in the 'pocket'
- algorithm: initialize the pocket weights $\hat{\mathbf{w}}$; for $t = 0, 1, \ldots$: find a (random) mistake of $\mathbf{w}_t$, correct it with the PLA rule, and if $\mathbf{w}_{t+1}$ makes fewer mistakes on $\mathcal{D}$ than $\hat{\mathbf{w}}$, put $\mathbf{w}_{t+1}$ in the pocket; after enough iterations, return $\hat{\mathbf{w}}$ as $g$
- a simple modification of PLA to find (somewhat) 'best' weights
- on a linearly separable dataset Pocket also finds a perfect line, but it runs slower than PLA (each update must count mistakes over the whole dataset)
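The pocket procedure can be sketched like this. A minimal illustration with a made-up noisy toy dataset (one label deliberately flipped, so no perfect line exists); the update budget and seed are arbitrary choices, not part of the lecture.

```python
import random

# Sketch of the pocket algorithm: PLA-style random corrections,
# but keep the best-so-far weights "in the pocket".
def sign(s):
    return 1 if s > 0 else -1

def mistakes(w, X, y):
    # number of points the line w misclassifies on the whole dataset
    return sum(sign(sum(wi * xi for wi, xi in zip(w, xn))) != yn
               for xn, yn in zip(X, y))

def pocket(X, y, updates=100, seed=0):
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    best_w, best_err = w, mistakes(w, X, y)
    for _ in range(updates):
        # pick a random mistake of the current w (if any remain)
        wrong = [(xn, yn) for xn, yn in zip(X, y)
                 if sign(sum(wi * xi for wi, xi in zip(w, xn))) != yn]
        if not wrong:
            break
        xn, yn = rng.choice(wrong)
        w = [wi + yn * xi for wi, xi in zip(w, xn)]   # PLA correction
        err = mistakes(w, X, y)
        if err < best_err:                            # pocket the better line
            best_w, best_err = w, err
    return best_w

# noisy toy data (x_0 = 1 prepended); the last label is flipped noise
X = [[1.0, 2.0, 0.0], [1.0, 0.0, 2.0], [1.0, 3.0, 1.0], [1.0, 1.0, 3.0],
     [1.0, 4.0, 0.0]]
y = [1, -1, 1, -1, -1]
w = pocket(X, y)
# the pocketed line is never worse than the initial w = 0 (2 mistakes here)
assert mistakes(w, X, y) <= 2
```

Note the design choice the notes point out: every update pays a full pass over $\mathcal{D}$ in `mistakes`, which is exactly why Pocket is slower than plain PLA on separable data.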