Lecture 6: Theory of Generalization


Restriction of Break Point




\begin{aligned} & m_{\mathcal{H}}(N) \\ \leq & \text{ maximum possible } m_{\mathcal{H}}(N) \text{ given } k \\ \leq & \operatorname{poly}(N) \end{aligned}

Fun Time

When minimum break point k = 1, what is the maximum possible $m_{\mathcal{H}}(N)$ when $N = 3$?
1.  1 \checkmark             2. 2          3. 3           4. 4


Explanation
Since k = 1, not even a single point can be shattered, so only one dichotomy is possible: $m_{\mathcal{H}}(N) = 1$.

Bounding Function: Basic Cases


Bounding Function

bounding function $B(N, k)$:
  maximum possible $m_{\mathcal{H}}(N)$ when break point = k
$B(N, k) \leq \operatorname{poly}(N)$

In other words, $B(N, k)$ is an upper bound on $m_{\mathcal{H}}(N)$.


Table of Bounding Function

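The basic cases of the table can be sketched in code (the function name and layout below are my own, not from the lecture; cells for N > k are filled by the inductive case in the next section):

```python
def B_basic(N, k):
    """Basic cases of the bounding function B(N, k)."""
    if k == 1:
        return 1            # break point 1: only one dichotomy survives
    if N < k:
        return 2 ** N       # too few points for the break point to bite
    if N == k:
        return 2 ** N - 1   # every labeling but one (else k points shattered)
    return None             # inductive case, derived in the next section

# Upper-left corner of the table for N, k = 1..4
for N in range(1, 5):
    print([B_basic(N, k) for k in range(1, 5)])
```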

Fun Time

For 2D perceptrons, which of the following claims is true?
1 minimum break point k = 2
2 $m_{\mathcal{H}}(4) = 15$
3 $m_{\mathcal{H}}(N) < B(N, k)$ when N = k = minimum break point  \checkmark
4 $m_{\mathcal{H}}(N) > B(N, k)$ when N = k = minimum break point


Explanation
minimum break point k = 4
$m_{\mathcal{H}}(4) = 14$
$B(N, k)$ is an upper bound on $m_{\mathcal{H}}(N)$; at $N = k = 4$, $m_{\mathcal{H}}(4) = 14 < B(4, 4) = 15$.
If you don't remember the 2D perceptron, review Effective Number of Hypotheses in Lecture 5: Training versus Testing.
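$m_{\mathcal{H}}(4) = 14$ can be checked by brute force. A minimal sketch, where the point placement (corners of the unit square) and the grid of candidate lines are my own choices:

```python
from itertools import product

# Four points in general position: corners of the unit square.
pts = [(0, 0), (1, 0), (0, 1), (1, 1)]

dichotomies = set()
# Sweep a small grid of lines w1*x + w2*y + b = 0; half-integer b keeps
# every point strictly off the boundary, so signs are well defined.
for w1, w2 in product(range(-2, 3), repeat=2):
    for twice_b in range(-5, 6, 2):
        b = twice_b / 2
        dichotomies.add(tuple(1 if w1 * x + w2 * y + b > 0 else -1
                              for x, y in pts))

print(len(dichotomies))  # 14, not 2^4 = 16: the two XOR labelings are unreachable
```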

Bounding Function: Inductive Cases


B(4, 3) = 11 = 2\alpha + \beta


\begin{aligned} B(N, k) &= 2\alpha + \beta \\ \alpha + \beta &\leq B(N-1, k) \\ \alpha &\leq B(N-1, k-1) \\ \Rightarrow B(N, k) &\leq B(N-1, k) + B(N-1, k-1) \end{aligned} \\ B(N, k) \leq \sum_{i=0}^{k-1} \binom{N}{i}

The $\leq$ above actually holds with equality $=$:

B(N, k) = B(N-1, k) + B(N-1, k-1) \\ B(N, k) = \sum_{i=0}^{k-1} \binom{N}{i} = C_N^0 + C_N^1 + \cdots + C_N^{k-1}
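Both the recurrence and the closed form can be checked numerically; a minimal sketch (the memoized helper is my own):

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def B(N, k):
    """Bounding function via B(N, k) = B(N-1, k) + B(N-1, k-1)."""
    if k == 1:
        return 1                 # break point 1: a single dichotomy
    if N < k:
        return 2 ** N            # too few points to be restricted
    if N == k:
        return 2 ** N - 1        # everything but one labeling
    return B(N - 1, k) + B(N - 1, k - 1)

# The recurrence meets the binomial sum with equality.
for N in range(1, 16):
    for k in range(1, N + 1):
        assert B(N, k) == sum(comb(N, i) for i in range(k))

print(B(4, 3))  # 11, matching 2*alpha + beta above
```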


2D perceptrons have break point k = 4, so $m_{\mathcal{H}}(N) \leq B(N, 4) = \frac{1}{6}N^{3} + \frac{5}{6}N + 1 = O(N^3)$.
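The k = 4 case of the closed form can be checked against the cubic directly (a quick sanity check of my own, not from the lecture):

```python
from math import comb

def B4(N):
    """B(N, 4) as the binomial sum."""
    return sum(comb(N, i) for i in range(4))

# (N^3 + 5N + 6) / 6 is exactly N^3/6 + 5N/6 + 1 and is always an integer.
for N in range(1, 51):
    assert B4(N) == (N ** 3 + 5 * N + 6) // 6

print(B4(4))  # 15: four points cannot all be shattered (that would need 16)
```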

Fun Time

For 1D perceptrons (positive and negative rays), we know that $m_{\mathcal{H}}(N) = 2N$. Let k be the minimum break point. Which of the following is not true?
1 k = 3
2 for some integers $N > 0$, $m_{\mathcal{H}}(N) = \sum_{i=0}^{k-1} \binom{N}{i}$
3 for all integers $N > 0$, $m_{\mathcal{H}}(N) = \sum_{i=0}^{k-1} \binom{N}{i}$  \checkmark
4 for all integers $N > 2$, $m_{\mathcal{H}}(N) < \sum_{i=0}^{k-1} \binom{N}{i}$


Explanation
minimum break point k = 3
$B(N, k) = \sum_{i=0}^{k-1} \binom{N}{i}$
$B(N, k)$ is an upper bound on $m_{\mathcal{H}}(N)$: when $N \geq k$, $m_{\mathcal{H}}(N) < B(N, k)$; when $N < k$, $m_{\mathcal{H}}(N) = B(N, k)$.


Extension: recall the Fun Time on Effective Number of Hypotheses in Lecture 5: Training versus Testing, asking for the effective number of dichotomies of 5 points under 2D perceptrons (k = 4, N = 5, $m_{\mathcal{H}}(N) = ? \leq \frac{1}{6}N^{3} + \frac{5}{6}N + 1$; since N > k, equality is not attained).
The correct answer there was 22, and indeed $22 < \frac{125}{6} + \frac{25}{6} + 1 = 26$, so the bound checks out. Revisiting that question is quite fun.


A Pictorial Proof


$E_{\text{in}}'$ (finite) replaces $E_{\text{out}}$ (infinite). The inequality and the factor $\frac{1}{2}$ come from symmetrization: draw a second "ghost" sample of size N with in-sample error $E_{\text{in}}'$; if some h has $|E_{\text{in}}(h) - E_{\text{out}}(h)| > \epsilon$, then with probability at least $\frac{1}{2}$ the ghost error lands within $\frac{\epsilon}{2}$ of $E_{\text{out}}(h)$, which forces $|E_{\text{in}} - E_{\text{in}}'| > \frac{\epsilon}{2}$; hence the extra factor of 2 in front and the halved tolerance.


The upper bound is then expressed in terms of $m_{\mathcal{H}}(2N)$: on the combined 2N points, the infinitely many hypotheses collapse into at most $m_{\mathcal{H}}(2N)$ effective ones.


Using Hoeffding without replacement gives a similar result, only with $\nu = E_{\text{in}}$ and $\mu = \frac{E_{\text{in}} + E_{\text{in}}'}{2}$.

Vapnik-Chervonenkis (VC) bound

\begin{aligned} & \mathbb{P}\left[\exists h \in \mathcal{H} \text{ s.t. } \left|E_{\text{in}}(h) - E_{\text{out}}(h)\right| > \epsilon\right] \\ & \leq 4 m_{\mathcal{H}}(2N) \exp\left(-\frac{1}{8}\epsilon^{2} N\right) \end{aligned}
  $m_{\mathcal{H}}(N)$ can replace M with a few changes

Fun Time

For positive rays, $m_{\mathcal{H}}(N) = N + 1$. Plug it into the VC bound for $\epsilon = 0.1$ and N = 10000. What is the VC bound of BAD events?
\mathbb{P}\left[\exists h \in \mathcal{H} \text{ s.t. } \left|E_{\text{in}}(h) - E_{\text{out}}(h)\right| > \epsilon\right] \leq 4 m_{\mathcal{H}}(2N) \exp\left(-\frac{1}{8}\epsilon^{2} N\right)
1 $2.77 \times 10^{-87}$
2 $5.54 \times 10^{-83}$
3 $2.98 \times 10^{-1}$  \checkmark
4 $2.29 \times 10^{-2}$


Explanation
Just plug into the formula: $4(2N+1)\exp(-\frac{1}{8}\epsilon^{2}N) = 4 \times 20001 \times e^{-12.5} \approx 0.2981471603789822$.
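The number follows directly from the bound; a one-line sketch:

```python
import math

N, eps = 10000, 0.1
mH_2N = 2 * N + 1                        # positive rays: m_H(2N) = 2N + 1
bound = 4 * mH_2N * math.exp(-eps ** 2 * N / 8)
print(bound)  # 0.2981471603789822
```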

Summary

This lecture covered the bounding function $B(N, k)$ and the meaning and derivation of the VC Bound.


Lecture Summary


If $m_{\mathcal{H}}(N)$ has a break point and N is large enough, then $E_{\text{out}} \approx E_{\text{in}}$.


Restriction of Break Point
  break point ‘breaks’ consequent points

Bounding Function: Basic Cases
  $B(N, k)$ bounds $m_{\mathcal{H}}(N)$ with break point k

Bounding Function: Inductive Cases
  $B(N, k)$ is $\operatorname{poly}(N)$

A Pictorial Proof
  $m_{\mathcal{H}}(N)$ can replace M with a few changes

References

《Machine Learning Foundations》(机器学习基石)—— Hsuan-Tien Lin (林轩田)
