This series adds personal study notes and supplementary derivations on top of the original course material; corrections are welcome if you spot errors. After taking Andrew Ng's course, I organized it into text for easier review. Since I have been studying English, the series is primarily in English, and I suggest readers rely on the English with the Chinese as support, as preparation for reading academic papers in this field later on. - ZJ
Please credit the author and source when reposting: ZJ, WeChat official account "SelfImprovementLab"
知乎:https://zhuanlan.zhihu.com/c_147249273
CSDN:http://blog.csdn.net/junjun_zhao/article/details/79040512
4.7 Parameters vs. Hyperparameters
(Subtitle source: NetEase Cloud Classroom 网易云课堂)
Being effective in developing your deep neural network requires that you organize not only your parameters well, but also your hyperparameters. So what are hyperparameters? Let's take a look at the parameters of your model.
Key points:
Parameters: the information we want the model to learn during training, i.e. W^{[l]}, b^{[l]}.
Hyperparameters: settings that control how the parameters are learned; changing a hyperparameter changes the final learned parameters W^{[l]}, b^{[l]}.
Examples:
- Learning rate
- Number of iterations
- Number of hidden layers
- Number of hidden units in each layer
- Choice of activation function
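The parameter/hyperparameter distinction above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the course's code; the variable names and layer sizes are made up:

```python
import numpy as np

# Hyperparameters: chosen by us BEFORE training; they control how the
# parameters get learned. (Values here are arbitrary for illustration.)
learning_rate = 0.01       # alpha
num_iterations = 1000      # how many gradient-descent steps to run
layer_dims = [4, 5, 3, 1]  # number of layers and hidden units per layer

# Parameters: learned by the model during training. Changing any
# hyperparameter above changes the W[l], b[l] we end up with.
parameters = {}
for l in range(1, len(layer_dims)):
    parameters["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
    parameters["b" + str(l)] = np.zeros((layer_dims[l], 1))

print(parameters["W1"].shape)  # (5, 4)
```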
In fact, deep learning has a lot of different hyperparameters. Later in the course we'll see others as well, such as the momentum term, the mini-batch size, and various forms of regularization parameters. If none of these terms make sense yet, don't worry about it; we'll talk about them in the second course. Because deep learning has so many hyperparameters, in contrast to earlier eras of machine learning, I'm going to try to be very consistent in calling the learning rate alpha a hyperparameter rather than a parameter. In earlier eras of machine learning, when we didn't have so many hyperparameters, most of us were a bit sloppy here and just called alpha a parameter. Technically, alpha is a parameter, but it is a parameter that determines the real parameters, so try to be consistent in calling things like alpha and the number of iterations hyperparameters. When you're training a deep net for your own application, you'll find that there may be a lot of possible settings for the hyperparameters that you need to just try out.
So applied deep learning today is a very empirical process. Often you have an idea, for example an idea for the best value of the learning rate. You might say, maybe alpha equals 0.01, I want to try that; then you implement it, try it out, and see how it works, and based on that outcome you might say, I want to increase the learning rate to 0.05. If you're not sure what the best value of the learning rate is, you might try one value of alpha and see the cost function J go down; then try a larger value and see the cost function blow up and diverge; then try another version and see it go down really fast but converge to a higher value; and try yet another version and see the cost function J do something else. After trying a set of values, you might say, this value of alpha gives me pretty fast learning and lets me converge to a lower cost function J, so I'm going to use this value of alpha.
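The behavior described above (too small converges slowly, too large diverges) can be seen on a toy cost function J(w) = w², whose gradient is 2w. This is a minimal sketch for intuition, not the course's code:

```python
def run_gradient_descent(alpha, num_iters=50):
    """Minimize J(w) = w**2 by gradient descent, starting from w = 5."""
    w = 5.0
    costs = []
    for _ in range(num_iters):
        w = w - alpha * 2 * w  # gradient of J(w) = w**2 is 2*w
        costs.append(w ** 2)
    return costs

# A small alpha learns slowly, a moderate alpha converges fast,
# and a too-large alpha makes the cost blow up and diverge.
for alpha in [0.01, 0.1, 1.5]:
    print(f"alpha={alpha}: final cost J = {run_gradient_descent(alpha)[-1]:.4g}")
```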
You saw on the previous slide that there are a lot of different hyperparameters, and it turns out that when you're starting on a new application, it's very difficult to know in advance exactly what the best values of the hyperparameters are. So what often happens is that you just have to try out many different values and go around this cycle: try some values, say five hidden layers with this many hidden units, implement that, see if it works, and then iterate. The title of this slide is that applied deep learning is a very empirical process, and "empirical process" is maybe a fancy way of saying you just have to try a lot of things and see what works. Another effect I've seen is that deep learning today is applied to so many problems, ranging from computer vision to speech recognition to natural language processing to many structured-data applications, such as online advertising, web search, or product recommendations. What I've seen is researchers from one of these disciplines try to move to a different one, and sometimes the intuitions about hyperparameters carry over and sometimes they don't. So I often advise people, especially when starting on a new problem, to just try out a range of values and see what works.
In the next course we'll see some systematic ways of trying out a range of values. Second, even if you've been working on one application for a long time, say online advertising, as you make progress on the problem it is quite possible that the best values of the learning rate, the number of hidden units, and so on will change. So even if you tune your system to the best hyperparameter values today, you may find that the best values change a year from now, maybe because the computing infrastructure, be it the CPUs or the type of GPU you're running on, has changed. So one rule of thumb is: every now and then, maybe every few months if you're working on a problem for an extended period of time, try a few values for the hyperparameters and double-check whether there are better values.
As you do so, you slowly gain intuition about the hyperparameters that work best for your problems. I know this might seem like an unsatisfying part of deep learning, that you just have to try out all these hyperparameter values, but this is one area where deep learning research is still advancing, and over time we may be able to give better guidance on the best hyperparameters to use. It's also possible that, because CPUs, GPUs, networks, and datasets are all changing, the guidance won't converge for some time, and you'll just need to keep trying out different values, evaluate them on a hold-out cross-validation set or something similar, and pick the values that work for your problem. So that was a brief discussion of hyperparameters. In the second course we'll also give some suggestions for how to systematically explore the space of hyperparameters. By now you actually have pretty much all the tools you need to do the programming exercise. Before you do that, let me share one more set of ideas: I'm often asked what deep learning has to do with the human brain.
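One way to make "try values and evaluate them on a hold-out set" concrete is the sketch below: it fits a toy one-parameter model y ≈ w·x by gradient descent under several hyperparameter settings and keeps the setting with the lowest cost on held-out data. All data, candidate values, and function names here are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = 3x + noise, split into a training set and a hold-out set.
x = rng.uniform(-1, 1, 200)
y = 3 * x + 0.1 * rng.normal(size=200)
x_train, y_train = x[:150], y[:150]
x_val, y_val = x[150:], y[150:]

def train(alpha, num_iters):
    """Fit y ≈ w*x by gradient descent on the mean squared error."""
    w = 0.0
    for _ in range(num_iters):
        grad = -2 * np.mean((y_train - w * x_train) * x_train)
        w -= alpha * grad
    return w

def val_cost(setting):
    """Cost of a hyperparameter setting, measured on the hold-out set."""
    w = train(*setting)
    return np.mean((y_val - w * x_val) ** 2)

# Try a range of (alpha, num_iterations) settings; keep the best one.
candidates = [(alpha, iters) for alpha in (0.01, 0.1, 0.5) for iters in (10, 100)]
best = min(candidates, key=val_cost)
print("best (alpha, num_iterations):", best)
```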
4.8 What does this have to do with the brain?
So what does deep learning have to do with the brain? At the risk of giving away the punchline, I would say: not a whole lot. But let's take a quick look at why people keep making the analogy between deep learning and the human brain. When you implement a neural network, this is what you do: forward prop and back prop. I think because it's been difficult to convey intuitions about what these equations are really doing, namely gradient descent on a very complex function, the analogy that it's like the brain has become a really oversimplified explanation of what's going on. But its simplicity makes it seductive for people to say publicly, and for the media to report, and it has certainly caught the popular imagination. There is a very loose analogy between, say, a logistic regression unit with a sigmoid activation function and a cartoon of a single neuron in the brain. In this picture of a biological neuron, the neuron, which is a cell in your brain, receives electrical signals from other neurons x1, x2, x3, or maybe from other neurons a1, a2, a3; there's a simple thresholded computation, and then if this neuron fires, it sends a pulse of electricity down the axon, down this long wire, perhaps to other neurons. So there is a very simplistic analogy between a single logistic unit in a network and a biological neuron like the one shown on the right.
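The "very loose analogy" is to a unit like this: a weighted sum of incoming signals followed by a sigmoid squashing function. A minimal sketch, with arbitrary weights and inputs:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_unit(x, w, b):
    """A single logistic unit: weighted sum of inputs, then a sigmoid.
    This is the (very loose) analogy to a biological neuron that
    'fires' more strongly as its inputs cross a threshold."""
    return sigmoid(np.dot(w, x) + b)

x = np.array([1.0, 2.0, -1.0])   # signals from "other neurons" x1, x2, x3
w = np.array([0.5, -0.25, 0.1])  # connection strengths (weights)
print(logistic_unit(x, w, 0.0))  # a value between 0 and 1, here ~0.475
```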
But I think that today even neuroscientists have almost no idea what a single neuron is doing. A single neuron appears to be much more complex than we are able to characterize with neuroscience, and while some of what it does is a little bit like logistic regression, there's still a lot about what even a single neuron does that no human understands today. For example, exactly how neurons in the human brain learn is still a very mysterious process, and it's completely unclear today whether the human brain uses an algorithm anything like back propagation or gradient descent, or whether there's some fundamentally different learning principle at work. So when I think of deep learning, I think of it as being very good at learning very flexible, very complex functions, at learning x-to-y mappings, input-output mappings, in supervised learning. As for the "it's like the brain" analogy, maybe it was useful once, but I think the field has moved to the point where that analogy is breaking down, and I tend not to use it much anymore. So that's it for neural networks and the brain. I do think that the field of computer vision has taken a bit more inspiration from the human brain than other disciplines that also apply deep learning, but I personally use the analogy to the human brain less than I used to. So that's it for this video. You now know how to implement forward prop, back prop, and gradient descent, even for deep neural networks. Best of luck with the programming exercise, and I look forward to sharing more of these ideas with you in the second course.
References:
[1] 大树先生. Distilled notes on Andrew Ng's Coursera DeepLearning.ai course (1-4): Shallow Neural Networks.