a loss function tells us how good our current classifier is
You tell your algorithm which kinds of errors you care about and which kinds you are willing to trade off against each other
Multi-class SVM loss
-- j indexes over the classes of our dataset (all j ≠ y_i, the true class)
-s_{y_i}: the score of the true class / s: the vector of predicted class scores coming out of the classifier
-if the true class's score is not greater than each of the other scores by the margin, we incur some loss
-why 1 here? we only care about the relative differences between the scores; the 1 doesn't actually matter, because if you rescale W this free parameter washes out, canceled by the overall scale of W
-hinge loss (named for the shape of its graph)
ex. (the loss includes a term for every incorrect class; a sketch follows)
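A minimal NumPy sketch of this hinge loss for one example; the score values in the usage comment are assumed purely for illustration:

```python
import numpy as np

def svm_loss_single(scores, y):
    # L_i = sum over j != y_i of max(0, s_j - s_{y_i} + 1)
    margins = np.maximum(0, scores - scores[y] + 1)
    margins[y] = 0  # the true class contributes no loss
    return margins.sum()

# e.g. 3 classes with scores [3.2, 5.1, -1.7] and true class 0:
# max(0, 5.1 - 3.2 + 1) + max(0, -1.7 - 3.2 + 1) = 2.9 + 0 = 2.9
print(svm_loss_single(np.array([3.2, 5.1, -1.7]), 0))  # 2.9
```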
Q: at initialization W is small so all s ≈ 0, what is the loss?
A: number of classes minus one — each of the C − 1 incorrect classes contributes max(0, 0 − 0 + 1) = 1 (useful for debugging)
what if the sum were over all classes, including j = y_i?
A: the loss just increases by a constant 1, since the j = y_i term is max(0, s_{y_i} − s_{y_i} + 1) = 1; we omit it so the minimum loss is 0
Regularization
lambda: the regularization hyper-parameter, something we need to tune when training
-penalize the complexity of the model / how complexity is counted depends on your choice of regularizer (L1 counts nonzero entries, i.e. cares about zeros)
L2 will prefer w1 because it has a smaller L2 norm — its weights are spread across all the values
for L1, w1 and w2 have equal norm here / in general L1 prefers sparse solutions, driving many elements to 0 (see the sketch below)
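A minimal sketch of that comparison, with assumed values (w1 spread out, w2 sparse, matching the note above):

```python
import numpy as np

x  = np.array([1.0, 1.0, 1.0, 1.0])
w1 = np.array([0.25, 0.25, 0.25, 0.25])  # spread across all values
w2 = np.array([1.0, 0.0, 0.0, 0.0])      # sparse

print(w1 @ x, w2 @ x)                          # 1.0 1.0  -> same scores; the data loss can't tell them apart
print(np.sum(w1 ** 2), np.sum(w2 ** 2))        # 0.25 1.0 -> L2 prefers w1 (smaller norm)
print(np.sum(np.abs(w1)), np.sum(np.abs(w2)))  # 1.0 1.0  -> L1 ties here; in general it favors exact zeros
```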
softmax classifier (multinomial logistic regression)
-why log? we want the probability of the true class to go to 1, and log is a monotonic function that is easier to optimize
-our loss is the negative log of the probability of the true class
Q: at initialization W is small so all s ≈ 0, what is the loss?
A: log(C) — every class gets probability 1/C, so the loss is −log(1/C) = log(C) (useful for debugging; verified in the sketch below)
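A minimal NumPy sketch of the softmax loss for one example, which also verifies the log(C) sanity check above:

```python
import numpy as np

def softmax_loss_single(scores, y):
    # L_i = -log( e^{s_{y_i}} / sum_j e^{s_j} )
    scores = scores - scores.max()  # shift scores for numerical stability
    probs = np.exp(scores) / np.sum(np.exp(scores))
    return -np.log(probs[y])

# with W small all scores are ~0, so every class gets probability 1/C
# and the loss is log(C); for C = 3:
print(softmax_loss_single(np.zeros(3), 0))  # log(3) ≈ 1.0986
```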
Optimization
how do we find the bottom of the valley?
bad idea: random search, only 15% accuracy
follow the slope: the derivative of the function (for a scalar input)
-in multiple dimensions, the gradient is the vector of partial derivatives
the slope in any direction is the dot product of the direction with the gradient (the direction of steepest descent is the negative gradient)
-numerical gradient: slow, easy to write, approximate
-analytic gradient: exact, fast, error-prone
in practice: calculate dW with the analytic gradient, but check it against the numerical one
gradient check: a debugging tool for verifying your analytic gradient (see the sketch below)
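A minimal sketch of a centered-difference numerical gradient for such a check; f is any scalar-valued loss function of the weights:

```python
import numpy as np

def numerical_gradient(f, w, h=1e-5):
    # df/dw_i ≈ (f(w + h*e_i) - f(w - h*e_i)) / (2h), one coordinate at a time
    grad = np.zeros_like(w)
    for i in range(w.size):
        old = w.flat[i]
        w.flat[i] = old + h
        f_plus = f(w)
        w.flat[i] = old - h
        f_minus = f(w)
        w.flat[i] = old  # restore the original value
        grad.flat[i] = (f_plus - f_minus) / (2 * h)
    return grad

# compare element-wise against your analytic dW; relative errors around
# 1e-7 or smaller usually mean the analytic gradient is correct
```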
step_size = learning rate (the first hyper-parameter you try to set)
minibatch: to update W, sample some random minibatch of the data rather than the full set (a sketch of the update loop follows)
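A sketch of the vanilla minibatch update loop; sample_training_data and evaluate_gradient are hypothetical helpers standing in for your data sampler and gradient code:

```python
# vanilla minibatch gradient descent (sketch)
while True:
    data_batch = sample_training_data(data, 256)  # e.g. 256 examples per batch
    weights_grad = evaluate_gradient(loss_fun, data_batch, weights)
    weights += -step_size * weights_grad  # step opposite the gradient
```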
image features
-take your image and compute various feature representations, then concatenate these different feature vectors to get one feature representation of the image, and feed that into a linear classifier
-motivation: the right feature transform can make data that is not linearly separable in raw pixel space become linearly separable (e.g., converting to polar coordinates); a hypothetical pipeline sketch follows
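A hypothetical sketch of that pipeline; color_histogram and hog_features stand in for whatever feature extractors you choose:

```python
import numpy as np

def extract_features(image):
    # compute several feature representations and concatenate them
    # into one long feature vector for the image
    feats = [color_histogram(image), hog_features(image)]  # hypothetical extractors
    return np.concatenate(feats)

# X = np.stack([extract_features(img) for img in images])
# then train a linear classifier (the SVM / softmax above) on X
```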