Course 4
Average Error on testing data
Two main sources of error:
- Error due to bias
- Error due to variance
Estimator
Only God knows the best function, $\hat{f}$.
From the training data we can only find a function $f^*$, and we say that $f^*$ is an estimator of $\hat{f}$.
The difference between $f^*$ and $\hat{f}$ comes from bias and variance.
Bias and Variance of Estimator
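Suppose the mean of a variable x is $\mu$ and its variance is $\sigma^2$. Draw N samples $x^1, \dots, x^N$ and compute
$m = \frac{1}{N}\sum_{n=1}^{N} x^n, \qquad s^2 = \frac{1}{N}\sum_{n=1}^{N} (x^n - m)^2$
m is an unbiased estimator of $\mu$, because $E[m] = \mu$, while $s^2$ is a biased estimator of $\sigma^2$, because $E[s^2] = \frac{N-1}{N}\sigma^2$.
The variance $\mathrm{Var}[m] = \frac{\sigma^2}{N}$ shows how much m deviates from $\mu$, and it depends on the number of samples: the more samples, the closer m tends to be to $\mu$.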
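In the same way, the bias of any estimator is how far the average of its estimates lies from the target, and its variance is how widely the estimates scatter around that average.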
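These properties of m and $s^2$ can be checked with a small simulation. This is a sketch assuming x is Gaussian with hypothetical values $\mu = 5$ and $\sigma = 2$; the function name `simulate_estimators` is illustrative. It repeats the sampling many times and averages the estimates, so the printed values should land near $E[m] = \mu$, $\mathrm{Var}[m] = \sigma^2/N$ and $E[s^2] = \frac{N-1}{N}\sigma^2$.

```python
import random
import statistics


def simulate_estimators(mu=5.0, sigma=2.0, n=10, trials=10000, seed=0):
    """Draw n samples from N(mu, sigma^2) many times; summarize m and s^2 over trials."""
    rng = random.Random(seed)
    means, s2s = [], []
    for _ in range(trials):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        m = sum(xs) / n
        s2 = sum((x - m) ** 2 for x in xs) / n  # divides by N, as in the definition of s^2
        means.append(m)
        s2s.append(s2)
    return {
        "E[m]": statistics.fmean(means),        # close to mu: m is unbiased
        "Var[m]": statistics.pvariance(means),  # close to sigma^2 / n
        "E[s2]": statistics.fmean(s2s),         # close to (n - 1) / n * sigma^2: biased
    }


for n in (10, 40):
    print(n, simulate_estimators(n=n))
```

Going from 10 to 40 samples should shrink `Var[m]` by roughly a factor of four, while `E[s2]` moves closer to $\sigma^2 = 4$ without ever reaching it on average.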
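A simple model has small variance and a complicated model has large variance, because a simpler model is less likely to be influenced by the particular data that happened to be sampled. Conversely, a simple model tends to have large bias, because its set of candidate functions may not contain the best function, while a complicated model covers a larger set of functions and therefore has smaller bias.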
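The effect of model complexity can be seen by fitting the same kind of model on many independently sampled training sets and measuring how the predictions spread. This sketch makes several assumptions that are not from the notes: the hypothetical true function is $\sin(2\pi x)$, each training set has 15 noisy points, and a degree-1 polynomial stands in for the simple model while a degree-5 polynomial stands in for the complicated one. The helpers `fit_poly`, `predict` and `bias_variance` are illustrative; the fit solves the least-squares normal equations in pure Python only to stay dependency-free.

```python
import math
import random
import statistics


def fit_poly(xs, ys, degree):
    """Least-squares polynomial fit by solving the normal equations X^T X w = X^T y."""
    k = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    for col in range(k):  # Gaussian elimination with partial pivoting
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = a[r][col] / a[col][col]
            for c in range(col, k):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    w = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        w[i] = (b[i] - sum(a[i][j] * w[j] for j in range(i + 1, k))) / a[i][i]
    return w


def predict(w, x):
    return sum(c * x ** i for i, c in enumerate(w))


def true_f(x):
    # Hypothetical "best function", known here only because we simulate the data
    return math.sin(2 * math.pi * x)


def bias_variance(degree, n_sets=200, n_points=15, noise=1.0, seed=0):
    """Fit `degree` on many training sets; return (bias^2, variance) averaged over a grid."""
    rng = random.Random(seed)
    grid = [i / 20 for i in range(1, 20)]
    preds = [[] for _ in grid]
    for _ in range(n_sets):
        xs = [rng.uniform(0, 1) for _ in range(n_points)]
        ys = [true_f(x) + rng.gauss(0, noise) for x in xs]
        w = fit_poly(xs, ys, degree)
        for p, x0 in zip(preds, grid):
            p.append(predict(w, x0))
    bias_sq = statistics.fmean((statistics.fmean(p) - true_f(x0)) ** 2 for p, x0 in zip(preds, grid))
    variance = statistics.fmean(statistics.pvariance(p) for p in preds)
    return bias_sq, variance


for d in (1, 5):
    b2, var = bias_variance(d)
    print(f"degree {d}: bias^2 = {b2:.3f}, variance = {var:.3f}")
```

Because both degrees reuse the same seed, they are fitted on identical training sets, so the difference in the printed numbers comes from the model alone.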
Diagnosis
- If your model cannot even fit the training data, it has large bias: this is underfitting.
- If your model fits the training data but has large error on the testing data, it probably has large variance: this is overfitting.
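The two diagnosis rules can be written as a small decision function. This is a sketch: the function `diagnose` and its `target_error` threshold (the error level you would accept) are hypothetical, and in practice deciding what counts as "large" is a judgment call rather than a fixed number.

```python
def diagnose(train_error: float, test_error: float, target_error: float) -> str:
    """Apply the bias/variance diagnosis rules; target_error is a hypothetical threshold."""
    if train_error > target_error:
        # The model cannot even fit the training data: large bias
        return "underfitting (large bias): add features or use a more complex model"
    if test_error > target_error:
        # Training data is fit, but testing error is large: large variance
        return "overfitting (large variance): get more data or add regularization"
    return "no obvious bias or variance problem"


print(diagnose(train_error=0.05, test_error=0.60, target_error=0.10))
```

The training error is checked first on purpose: if the model already fails on the training data, collecting more data or regularizing will not fix it.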
For large bias, redesign your model:
- Add more features as input
- A more complex model may be needed
For large variance:
- More data (very effective, but not always practical)
- Regularization
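A sketch of how regularization reduces variance, assuming ridge (L2) regularization: the fit minimizes the squared error plus $\lambda \lVert w \rVert^2$, with the intercept left unpenalized. The data setup (hypothetical true function $\sin(2\pi x)$, 15 noisy points per training set, a degree-5 polynomial) and the helper names `fit_poly` and `prediction_variance` are illustrative assumptions, not part of the notes.

```python
import math
import random
import statistics


def fit_poly(xs, ys, degree, lam=0.0):
    """Solve (X^T X + lam * P) w = X^T y, where P is the identity except P[0][0] = 0
    (the intercept is not penalized). lam = 0 gives plain least squares."""
    k = degree + 1
    a = [[sum(x ** (i + j) for x in xs) + (lam if i == j and i > 0 else 0.0)
          for j in range(k)] for i in range(k)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    for col in range(k):  # Gaussian elimination with partial pivoting
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = a[r][col] / a[col][col]
            for c in range(col, k):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    w = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        w[i] = (b[i] - sum(a[i][j] * w[j] for j in range(i + 1, k))) / a[i][i]
    return w


def predict(w, x):
    return sum(c * x ** i for i, c in enumerate(w))


def prediction_variance(lam, degree=5, n_sets=200, n_points=15, noise=1.0, seed=0):
    """Average variance of predictions across independently sampled training sets."""
    rng = random.Random(seed)
    grid = [i / 20 for i in range(1, 20)]
    preds = [[] for _ in grid]
    for _ in range(n_sets):
        xs = [rng.uniform(0, 1) for _ in range(n_points)]
        ys = [math.sin(2 * math.pi * x) + rng.gauss(0, noise) for x in xs]
        w = fit_poly(xs, ys, degree, lam)
        for p, x0 in zip(preds, grid):
            p.append(predict(w, x0))
    return statistics.fmean(statistics.pvariance(p) for p in preds)


for lam in (0.0, 0.1, 1.0):
    print(f"lam = {lam}: variance = {prediction_variance(lam):.3f}")
```

A larger `lam` pushes the fitted curve toward a smoother one, which lowers the variance but can raise the bias, so `lam` itself is a model-selection choice.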
Model Selection
There is usually a trade-off between bias and variance
Select a model that balances the two kinds of error to minimize the total error
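For squared error the trade-off can be stated exactly. Assuming $y = \hat{f}(x) + \varepsilon$, where $\varepsilon$ is zero-mean noise with variance $\sigma_\varepsilon^2$ that is independent of the training data, the expected error of $f^*$ at a point x splits into three terms, with the expectation taken over training sets (which make $f^*$ random) and over the noise:

$$E\big[(y - f^*(x))^2\big] = \big(E[f^*(x)] - \hat{f}(x)\big)^2 + \mathrm{Var}\big[f^*(x)\big] + \sigma_\varepsilon^2$$

The first term is the squared bias, the second is the variance, and $\sigma_\varepsilon^2$ is noise that no model can remove. Making the model more complex usually shrinks the first term and grows the second.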
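Cross validation is one way to strike this balance: split the training data into a training set and a validation set, train each candidate model on the training set, and select the model with the smallest error on the validation set.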
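A sketch of hold-out validation: split the data once into a training part and a validation part, fit each candidate model on the first, and pick the one with the smallest error on the second. The candidates here are polynomials of degree 0 to 5, and the data (hypothetical true function $\sin(2\pi x)$ plus Gaussian noise), the 30% validation fraction, and the helper names `make_data` and `holdout_select` are all illustrative assumptions.

```python
import math
import random


def fit_poly(xs, ys, degree):
    """Least-squares polynomial fit by solving the normal equations X^T X w = X^T y."""
    k = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    for col in range(k):  # Gaussian elimination with partial pivoting
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = a[r][col] / a[col][col]
            for c in range(col, k):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    w = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        w[i] = (b[i] - sum(a[i][j] * w[j] for j in range(i + 1, k))) / a[i][i]
    return w


def predict(w, x):
    return sum(c * x ** i for i, c in enumerate(w))


def mse(w, xs, ys):
    return sum((predict(w, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def make_data(n, noise=0.2, seed=0):
    """Hypothetical data: y = sin(2*pi*x) + Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 1) for _ in range(n)]
    ys = [math.sin(2 * math.pi * x) + rng.gauss(0, noise) for x in xs]
    return xs, ys


def holdout_select(xs, ys, degrees, val_fraction=0.3, seed=0):
    """Train each degree on one part of the data and score it on the held-out part."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_val = int(len(idx) * val_fraction)
    val, train = idx[:n_val], idx[n_val:]
    tx, ty = [xs[i] for i in train], [ys[i] for i in train]
    vx, vy = [xs[i] for i in val], [ys[i] for i in val]
    errors = {d: mse(fit_poly(tx, ty, d), vx, vy) for d in degrees}
    best = min(errors, key=errors.get)
    return best, errors


xs, ys = make_data(60)
best, errors = holdout_select(xs, ys, degrees=range(6))
print("validation MSE:", {d: round(e, 3) for d, e in errors.items()})
print("selected degree:", best)
```

The validation set plays the role of unseen testing data, so the selection is not fooled by the falling training error of ever more complex models.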
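A more robust variant is N-fold cross validation: split the training data into N parts, use each part once as the validation set while training on the other N-1 parts, and select the model with the smallest validation error averaged over the N runs.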
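A sketch of N-fold cross validation: every example is used for validation exactly once, and each candidate model is scored by its validation error averaged over the folds. As before, the polynomial candidates, the hypothetical $\sin(2\pi x)$ data, N = 5, and the helper name `n_fold_error` are illustrative assumptions.

```python
import math
import random


def fit_poly(xs, ys, degree):
    """Least-squares polynomial fit by solving the normal equations X^T X w = X^T y."""
    k = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    for col in range(k):  # Gaussian elimination with partial pivoting
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = a[r][col] / a[col][col]
            for c in range(col, k):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    w = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        w[i] = (b[i] - sum(a[i][j] * w[j] for j in range(i + 1, k))) / a[i][i]
    return w


def predict(w, x):
    return sum(c * x ** i for i, c in enumerate(w))


def mse(w, xs, ys):
    return sum((predict(w, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def make_data(n, noise=0.2, seed=0):
    """Hypothetical data: y = sin(2*pi*x) + Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 1) for _ in range(n)]
    ys = [math.sin(2 * math.pi * x) + rng.gauss(0, noise) for x in xs]
    return xs, ys


def n_fold_error(xs, ys, degree, n_folds=5, seed=0):
    """Average validation MSE over N folds; each fold is the validation set once."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    total = 0.0
    for val in folds:
        held_out = set(val)
        train = [i for i in idx if i not in held_out]
        w = fit_poly([xs[i] for i in train], [ys[i] for i in train], degree)
        total += mse(w, [xs[i] for i in val], [ys[i] for i in val])
    return total / n_folds


xs, ys = make_data(60)
cv_errors = {d: n_fold_error(xs, ys, d) for d in range(6)}
best = min(cv_errors, key=cv_errors.get)
print("average validation MSE:", {d: round(e, 3) for d, e in cv_errors.items()})
print("selected degree:", best)
```

Averaging over folds makes the choice less dependent on one lucky or unlucky split, at the cost of training each candidate N times.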