Evaluating a Learning Algorithm

Evaluating a Hypothesis

We can troubleshoot errors in our predictions by:

  • Getting more training examples: fixes high variance
  • Trying smaller sets of features: fixes high variance
  • Trying additional features: fixes high bias
  • Trying polynomial features: fixes high bias
  • Increasing λ: fixes high variance
  • Decreasing λ: fixes high bias

A hypothesis may achieve very low error on the training set and still be inaccurate, because it has overfit. To evaluate a hypothesis, we therefore split the data into two sets: a training set (70%) and a test set (30%).
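The 70/30 split above can be sketched in plain Python (a minimal version of what libraries like scikit-learn's `train_test_split` provide; shuffling first avoids ordering bias):

```python
import random

def train_test_split(data, test_ratio=0.3, seed=0):
    """Shuffle the examples, then hold out the last test_ratio fraction as the test set."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

examples = list(range(10))
train, test = train_test_split(examples)
print(len(train), len(test))  # 7 3
```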

(Figure: Coursera Andrew Ng Machine Learning week 3 notes)

Model Selection and Train/Validation/Test Sets

One way to break down our dataset into the three sets is:

  • Training set: 60%
  • Cross validation set: 20%
  • Test set: 20%

We can now calculate three separate error values for the three different sets using the following method:

  1. Optimize the parameters in Θ using the training set for each polynomial degree.
  2. Find the polynomial degree d with the least error using the cross validation set.
  3. Estimate the generalization error using the test set with J_test(Θ^(d)), where d is the degree of the polynomial with the lowest cross validation error.

This way, the degree of the polynomial d has not been trained using the test set.
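The three steps can be sketched as below. The per-degree errors here are hypothetical stand-ins; in practice J_cv and J_test come from evaluating the Θ fitted on the training set for each degree d:

```python
# Hypothetical errors per polynomial degree d (in practice: fit Θ on the
# training set for each d, then evaluate on the CV and test sets).
j_cv   = {1: 0.62, 2: 0.31, 3: 0.12, 4: 0.13, 5: 0.19}   # cross validation error
j_test = {1: 0.60, 2: 0.33, 3: 0.14, 4: 0.18, 5: 0.25}   # test error

# Step 2: pick the degree with the lowest cross validation error.
best_d = min(j_cv, key=j_cv.get)

# Step 3: report generalization error on the test set for that degree only,
# so the test set never influences the choice of d.
generalization_error = j_test[best_d]
print(best_d, generalization_error)  # 3 0.14
```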

Bias vs Variance

Diagnosing Bias vs Variance
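The diagnostic from the lecture: with high bias, both J_train and J_cv are high and close together; with high variance, J_train is low but J_cv is much higher. A minimal sketch (the threshold values are illustrative assumptions; in practice you compare curves, not single numbers):

```python
def diagnose(j_train, j_cv, desired=0.05, gap=0.05):
    """Classify a (J_train, J_cv) pair; `desired` and `gap` are illustrative thresholds."""
    high_bias = j_train > desired              # training error itself is too high
    high_variance = (j_cv - j_train) > gap     # CV error much worse than training error
    if high_bias and not high_variance:
        return "high bias (underfitting)"
    if high_variance and not high_bias:
        return "high variance (overfitting)"
    if high_bias and high_variance:
        return "both high bias and high variance"
    return "looks ok"

print(diagnose(0.30, 0.32))  # high bias (underfitting)
print(diagnose(0.01, 0.25))  # high variance (overfitting)
```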


Regularization and Bias/Variance
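From the lecture, λ is chosen the same way as the polynomial degree: train Θ for each candidate λ, then pick the λ whose (unregularized) cross validation error is lowest. A sketch with hypothetical CV errors over the lecture's λ grid:

```python
# Hypothetical CV errors measured after training Θ at each candidate λ
# (λ = 0, 0.01, 0.02, 0.04, ... as in the lecture).
j_cv = {0: 0.41, 0.01: 0.35, 0.02: 0.30, 0.04: 0.22, 0.08: 0.15,
        0.16: 0.11, 0.32: 0.09, 0.64: 0.12, 1.28: 0.18, 2.56: 0.27,
        5.12: 0.38, 10.24: 0.52}

# Too-small λ overfits (high variance); too-large λ underfits (high bias);
# the CV error is lowest somewhere in between.
best_lambda = min(j_cv, key=j_cv.get)
print(best_lambda)  # 0.32
```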


Learning Curve
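A learning curve plots J_train and J_cv against the training-set size m: train on the first m examples, measure training error on those m examples and CV error on the full validation set. A minimal sketch, using a trivial mean predictor as a stand-in for any learner:

```python
def mse(predict, xs, ys):
    """Mean squared error of a prediction function on a dataset."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit_mean(ys):
    # Trivial "learner" that always predicts the training mean;
    # a stand-in for the hypothesis-fitting step.
    mean = sum(ys) / len(ys)
    return lambda x: mean

x_train = list(range(10)); y_train = [2 * x for x in x_train]
x_cv, y_cv = [10, 11, 12], [20, 22, 24]

train_curve, cv_curve = [], []
for m in range(1, len(x_train) + 1):
    model = fit_mean(y_train[:m])                             # train on first m examples
    train_curve.append(mse(model, x_train[:m], y_train[:m]))  # J_train on those m
    cv_curve.append(mse(model, x_cv, y_cv))                   # J_cv on the full CV set

# For this underpowered model, J_train rises and J_cv falls as m grows,
# with both plateauing at a high error: the high-bias signature.
```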


Diagnosing Neural Networks

  • A neural network with fewer parameters is prone to underfitting. It is also computationally cheaper.
  • A large neural network with more parameters is prone to overfitting. It is also computationally expensive. In this case you can use regularization (increase λ) to address the overfitting.

Building a Spam Classifier

Prioritizing What to Work On

  • Collect lots of data (for example, the “honeypot” project, though collecting more data doesn’t always help)
  • Develop sophisticated features (for example: using email header data in spam emails)
  • Develop algorithms to process your input in different ways (recognizing misspellings in spam).


Error Analysis

  • Start with a simple algorithm, implement it quickly, and test it early on your cross validation data.
  • Plot learning curves to decide if more data, more features, etc. are likely to help.
  • Manually examine the errors on examples in the cross validation set and try to spot a trend where most of the errors were made.

Handling Skewed Data


With skewed classes, accuracy alone is misleading, so we measure precision P = TP / (TP + FP) and recall R = TP / (TP + FN), and combine them into the F1 Score: 2·P·R / (P + R).
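The F1 score can be computed directly from confusion-matrix counts (the counts below are hypothetical, just for illustration):

```python
def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp)   # of predicted positives, how many are real
    recall = tp / (tp + fn)      # of real positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical confusion-matrix counts for a skewed-class classifier.
p, r, f1 = precision_recall_f1(tp=85, fp=15, fn=30)
print(p, r, f1)
```

A high F1 requires both precision and recall to be high, which is why it is preferred over accuracy when one class is rare.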
