I went through Tianqi Chen's Boosted Tree slides in some depth and took brief notes; this is essentially a condensed version of the deck:
the framework is here, plus screenshots of the important figures and formulas.
Brief as it is, it is enough to learn how a master thinks through a problem.
Review of key concepts of supervised learning
- Elements in Supervised Learning
- Model
- Parameters
- Objective function
- Putting known knowledge into context
- Objective and Bias Variance Trade-off
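The key equation of this first part, as I noted it from the slides: the objective is training loss plus regularization, and this split is exactly where the bias-variance trade-off lives (the loss term drives bias down, the regularization term keeps variance in check):

```latex
% Objective = training loss + regularization (notation from the slides)
\mathrm{Obj}(\Theta) \;=\; L(\Theta) + \Omega(\Theta)
                     \;=\; \sum_{i=1}^{n} l\!\left(y_i, \hat{y}_i\right) + \Omega(\Theta)
```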
Regression Tree and Ensemble (What are we Learning)
- Regression Tree (CART)
- Regression Tree Ensemble
- Tree Ensemble methods (some advantages of tree-based ensemble methods)
- Put into context: Model and Parameters (model: an additive model; parameters: the trees/functions themselves)
- Learning a tree on single variable (example: probability of liking romantic music as a function of time)
- Learning a step function (step function)
- Learning step function (visually)
- Coming back: Objective for Tree Ensemble
- Objective vs Heuristic
- Regression Tree is not just for regression!
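To make the "Model and Parameters" point concrete, here is a minimal sketch of the additive model: the ensemble's prediction is the sum of every tree's leaf score, ŷ = Σ_k f_k(x). The two toy trees and the feature names below are hypothetical stand-ins for learned CART trees, loosely modeled on the deck's age/computer-usage example:

```python
# Minimal sketch of the additive tree-ensemble model y_hat = sum_k f_k(x).
# The two toy "trees" are hypothetical stand-ins for learned CART trees.

def tree1(x):
    # split on age < 15 (toy split in the spirit of the slides)
    return 2.0 if x["age"] < 15 else 0.5

def tree2(x):
    # split on whether the person uses a computer daily
    return 0.9 if x["uses_computer_daily"] else -0.9

trees = [tree1, tree2]

def predict(x):
    # the ensemble's prediction is the sum of every tree's leaf score
    return sum(f(x) for f in trees)

print(predict({"age": 10, "uses_computer_daily": True}))  # prints 2.9
```

Note that a "tree" here is just a function of the features; this is why the slides say the parameters of the model are the functions (trees) themselves, not coefficient vectors.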
Gradient Boosting (How do we Learn)
- Take Home Message for this section (actually a summary of the second part)
- So How do we Learn? (SGD doesn't apply; additive training is proposed instead)
- Additive Training (derivation of the formulas; the residual appears)
- Taylor Expansion Approximation of Loss (second-order Taylor expansion; gi and hi appear)
- Our New Goal (the simplified objective function)
- Refine the definition of tree (mathematical representation of a tree)
- Define Complexity of a Tree (cont’) (defining the complexity of a tree)
- Revisit the Objectives (combining the previous two slides to re-examine the objective function)
- The Structure Score
- The Structure Score Calculation
- Searching Algorithm for Single Tree
- Greedy Learning of the Tree
- Efficient Finding of the Best Split
- An Algorithm for Split Finding
- What about Categorical Variables?
- Pruning and Regularization
- Recap: Boosted Tree Algorithm (summary of the third part)
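As a summary of the whole derivation, a minimal sketch of the exact greedy split search from this part: each candidate split is scored by the structure-score gain ½[G_L²/(H_L+λ) + G_R²/(H_R+λ) − (G_L+G_R)²/(H_L+H_R+λ)] − γ, where G and H are sums of the gi and hi from the Taylor expansion. The function names and toy data below are mine, not from the deck:

```python
def split_gain(GL, HL, GR, HR, lam, gamma):
    # Gain of one candidate split: improvement in structure score over the
    # unsplit node, minus the complexity cost gamma for the extra leaf.
    def score(G, H):
        return G * G / (H + lam)
    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma

def best_split(xs, g, h, lam=1.0, gamma=0.0):
    # Exact greedy search on one feature: sort instances by feature value and
    # sweep left to right, maintaining the prefix sums G_L and H_L.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    G, H = sum(g), sum(h)
    GL = HL = 0.0
    best_gain, best_thr = float("-inf"), None
    for j in order[:-1]:          # splitting after the last instance is empty
        GL += g[j]
        HL += h[j]
        gain = split_gain(GL, HL, G - GL, H - HL, lam, gamma)
        if gain > best_gain:
            best_gain, best_thr = gain, xs[j]
    return best_gain, best_thr

# Toy data assuming squared loss l = (y - y_hat)^2 / 2 with all predictions
# starting at 0, so g_i = y_hat_i - y_i = -y_i and h_i = 1.
gain, thr = best_split([1, 2, 3, 4], [0.0, 0.0, -1.0, -1.0], [1.0] * 4,
                       lam=0.0, gamma=0.0)
print(gain, thr)  # best split falls between x=2 and x=3, gain 0.5
```

This one sweep is where the efficiency of the algorithm comes from: after a single sort, every candidate threshold on a feature is scored in O(n) using only the running sums of gi and hi.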