• a loss function tells us how good our current classifier is
    You tell your algorithm what kinds of errors you care about and which kinds of errors trade off against one another

  • Multi-class SVM loss
    -02/17/2020 Stanford CS231 note: Loss functions and optimization
    -j indexes the classes in our dataset; the sum runs over the incorrect classes
    -s_{y_i} is the score of the true class; the s_j are the predicted scores that come out of the classifier
    -if the true class score is not greater than each of the other scores by the margin, we incur some loss
    -why 1 here? we only care about the relative differences between the scores; if you rescale W, the choice of 1 washes out, cancelled by the overall scale of W, so it is not a real hyperparameter
    -this is the hinge loss (named after the shape of its graph)
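The sub-bullets above can be sketched in NumPy (a minimal sketch; the function name and the explicit margin argument are my own choices):

```python
import numpy as np

def svm_loss_single(scores, y, margin=1.0):
    """Multi-class SVM (hinge) loss for one example.

    scores: 1-D array of class scores s_j
    y: index of the true class, so scores[y] is s_{y_i}
    """
    margins = np.maximum(0.0, scores - scores[y] + margin)
    margins[y] = 0.0  # the sum runs over j != y_i only
    return margins.sum()

# the lecture's cat example: scores for (cat, car, frog), true class = cat
loss = svm_loss_single(np.array([3.2, 5.1, -1.7]), y=0)
print(loss)  # ~2.9, i.e. max(0, 5.1-3.2+1) + max(0, -1.7-3.2+1)
```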

  • example: compute L_i for each training image (including the badly predicted ones) and average them to get the full loss

  • Q: at initialization W is small so all s ≈ 0, what is the loss?
    A: number of classes minus one: each of the C-1 incorrect classes contributes max(0, 0 - 0 + 1) = 1 (useful as a sanity check when debugging)
    what if the sum were over all classes, including j = y_i? the loss would just increase by the constant 1, so the minimum shifts from 0 to 1 and nothing about the solution changes
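A minimal check of that sanity value (the choice of C and of the true-class index are arbitrary and mine):

```python
import numpy as np

# with W ~ 0 all scores are ~0, so each of the C-1 incorrect classes
# contributes max(0, 0 - 0 + 1) = 1 to the hinge loss
C = 10                      # e.g. a 10-class problem like CIFAR-10
scores = np.zeros(C)
y = 3                       # any true class index
margins = np.maximum(0.0, scores - scores[y] + 1.0)
margins[y] = 0.0
loss = margins.sum()
print(loss)                 # C - 1 = 9.0
```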

  • ????????????
    02/17/2020 Stanford- CS231-note Loss functions and optimization
    lambda: regularization hyper-parameter is what we need to tune when training
    penalize the complexity of the model/ the complexity count on your decision (L1 cares about 0
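The structure of the full loss can be sketched as follows (a sketch under my own naming; the notes only give the loss = data loss + lambda * R(W) shape):

```python
import numpy as np

def l2_reg(W):
    return np.sum(W * W)        # L2: sum of squared weights

def l1_reg(W):
    return np.sum(np.abs(W))    # L1: sum of absolute weights

def full_loss(data_loss, W, lam, reg=l2_reg):
    """Total loss = data loss + lambda * R(W); lam is the hyper-parameter
    we tune when training."""
    return data_loss + lam * reg(W)
```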

  • Regularization
    L2 prefers the w with the smaller L2 norm, i.e., it likes the weight spread across all the values
    for L1 the two w's cost the same; L1 prefers sparse solutions, driving many of the elements to 0
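The contrast can be checked numerically (which vector the lecture labelled w1 vs w2 is not recoverable from the notes, so the names here are descriptive):

```python
import numpy as np

# both vectors give the same score against x = [1,1,1,1],
# but the two regularizers rank them differently
w_concentrated = np.array([1.0, 0.0, 0.0, 0.0])
w_spread       = np.array([0.25, 0.25, 0.25, 0.25])

l1 = lambda w: np.sum(np.abs(w))
l2 = lambda w: np.sum(w * w)

print(l1(w_concentrated), l1(w_spread))  # 1.0 1.0  -> L1 is indifferent here
print(l2(w_concentrated), l2(w_spread))  # 1.0 0.25 -> L2 prefers the spread w
```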

  • softmax classifier (multinomial logistic regression)
    -why log? we want the probability of the true class to be pushed toward 1, and log is monotonic, so maximizing the log-probability is equivalent and mathematically nicer
    -our loss is the negative log of the probability of the true class: L_i = -log(e^{s_{y_i}} / sum_j e^{s_j})
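The softmax loss for one example can be sketched as below (the max-shift for numerical stability is a standard trick, not something stated in these notes):

```python
import numpy as np

def softmax_loss_single(scores, y):
    """Cross-entropy loss for one example (names are mine).
    Shifting by max(scores) avoids overflow without changing the result,
    because the shift cancels in the normalized ratio."""
    shifted = scores - np.max(scores)
    probs = np.exp(shifted) / np.sum(np.exp(shifted))
    return -np.log(probs[y])
```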

  • Q: at initialization W is small so all s ≈ 0, what is the loss?
    A: log(C), since the probabilities are uniform (1/C each) and -log(1/C) = log C (another useful debugging sanity check)
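A minimal check of that value (C = 10 is my arbitrary choice):

```python
import numpy as np

# W ~ 0 -> all scores ~ 0 -> uniform probabilities 1/C,
# so the softmax loss at initialization should be -log(1/C) = log(C)
C = 10
scores = np.zeros(C)
probs = np.exp(scores) / np.sum(np.exp(scores))   # each entry is 1/C
loss = -np.log(probs[0])
print(np.isclose(loss, np.log(C)))  # True
```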

  • Optimization
    how to find the bottom of the valley, i.e., the W that minimizes the loss

  • bad idea: random search (try many random W's, keep the best), only ~15% accuracy
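The baseline could look like this (a sketch; the function name, trial count, and the toy loss in the test are all mine):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_search(loss_fn, shape, trials=100):
    """Try random W matrices and keep whichever gives the lowest loss.
    Simple, but it ignores all structure in the loss surface."""
    best_W, best_loss = None, float("inf")
    for _ in range(trials):
        W = rng.standard_normal(shape) * 0.001
        loss = loss_fn(W)
        if loss < best_loss:
            best_W, best_loss = W, loss
    return best_W, best_loss

# usage with a stand-in quadratic loss
best_W, best_loss = random_search(lambda W: float(np.sum(W * W)), (3, 4))
```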

  • follow the slope: the derivative of a function (for a scalar input)
    -in multiple dimensions, the gradient is the vector of partial derivatives
    the slope in any direction is the dot product of that direction with the gradient (the direction of steepest descent is the negative gradient)
    -numerical gradient: slow, easy to write, approximate
    -analytic gradient: exact, fast, error-prone
    in practice: derive the analytic gradient with calculus to calculate dW
    gradient check: verify the analytic gradient against the numerical one as a debugging tool
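The numerical gradient and the gradient check above can be sketched together (the centered-difference formula and the toy loss are my choices for the sketch):

```python
import numpy as np

def numerical_gradient(f, w, h=1e-5):
    """Slow but easy to write: centered finite differences,
    one coordinate at a time."""
    grad = np.zeros_like(w)
    for i in range(w.size):
        old = w.flat[i]
        w.flat[i] = old + h
        fp = f(w)
        w.flat[i] = old - h
        fm = f(w)
        w.flat[i] = old          # restore the coordinate
        grad.flat[i] = (fp - fm) / (2 * h)
    return grad

# gradient check on a toy loss f(w) = sum(w^2), analytic gradient = 2w
w = np.array([1.0, -2.0, 3.0])
num = numerical_gradient(lambda w: np.sum(w * w), w)
ana = 2 * w
print(np.allclose(num, ana, atol=1e-4))  # True
```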

  • step_size = learning rate (the first hyperparameter one tries to set)
    minibatch: rather than the full dataset, sample some random minibatch of data, compute the gradient on it, and update W
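The update loop can be sketched as vanilla gradient descent (the quadratic stand-in loss is mine, chosen so the example is self-contained; real training would evaluate the gradient on a sampled minibatch each step, as the comment notes):

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_quadratic(W):
    return 2 * W                      # analytic gradient of sum(W^2)

W = rng.standard_normal((3, 4)) * 0.01
step_size = 0.1                       # the learning rate, first thing to tune
for _ in range(100):
    # in real training: sample a random minibatch (e.g. 32/64/128 examples)
    # and compute the loss gradient on that minibatch only
    grad = grad_quadratic(W)
    W -= step_size * grad             # step in the negative gradient direction
print(np.abs(W).max() < 1e-6)         # True: W has converged to the minimum
```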

  • image features
    -take your image and compute various feature representations, then concatenate these different feature vectors to give one feature representation of the image, and feed that into a linear classifier
    -motivation: data that is not linearly separable in raw pixel space can become linearly separable after the right feature transform
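One concrete instance of that pipeline, sketched under my own assumptions (a per-channel color histogram, with the bin count and input range chosen arbitrarily):

```python
import numpy as np

def color_histogram_features(img, bins=8):
    """img: H x W x 3 array with values in [0, 1].
    Compute a histogram per color channel, then concatenate the
    per-channel histograms into one feature vector."""
    feats = []
    for c in range(img.shape[2]):
        hist, _ = np.histogram(img[:, :, c], bins=bins, range=(0.0, 1.0))
        feats.append(hist.astype(np.float64))
    return np.concatenate(feats)      # feed this vector to a linear classifier

img = np.random.default_rng(0).random((32, 32, 3))
f = color_histogram_features(img)
print(f.shape)  # (24,) = 3 channels x 8 bins
```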
