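This series adds my personal study notes, along with supplementary derivations, to selected topics of the original course. If you find any mistakes, corrections are welcome. After studying Andrew Ng's course, I wrote the material up as text to make it easier to look up and review. Because I am also studying English, this series is mainly in English, and I recommend that readers likewise rely primarily on the English, with Chinese as an aid, to lay the groundwork for reading academic papers in the field later on. - ZJ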

Coursera course | deeplearning.ai | NetEase Cloud Classroom

When reposting, please credit the author and the source: ZJ, WeChat official account「SelfImprovementLab」

Zhihu: https://zhuanlan.zhihu.com/c_147249273

CSDN: http://blog.csdn.net/JUNJUN_ZHAO/article/details/78996315


3.5 Explanation for Vectorized Implementation

(Subtitle source: NetEase Cloud Classroom)

Coursera | Andrew Ng (01-week-3-3.5): Explanation for Vectorized Implementation

In the previous video, we saw how, with your training examples stacked up horizontally in the matrix $X$, you can derive a vectorized implementation of forward propagation through your network. Let's give a bit more justification for why the equations we wrote down are a correct implementation of vectorizing across multiple examples. So let's go through part of the forward propagation calculation for a few examples. For the first training example, you end up computing $z^{[1](1)} = W^{[1]}x^{(1)} + b^{[1]}$. For the second training example, you end up computing $z^{[1](2)} = W^{[1]}x^{(2)} + b^{[1]}$. And for the third training example, you end up computing $z^{[1](3)} = W^{[1]}x^{(3)} + b^{[1]}$. To simplify the explanation on this slide, I'm going to ignore $b^{[1]}$.

So let's just say, to simplify this justification a little bit, that $b^{[1]}$ is equal to 0. The argument we're going to lay out will work with just a little bit of a change even when $b^{[1]}$ is nonzero; setting it to zero just simplifies the description on this slide. Now, $W^{[1]}$ is going to be some matrix, with some number of rows. If you look at the calculation for $x^{(1)}$, what you have is that $W^{[1]}$ times $x^{(1)}$ gives you some column vector, which I'll represent with small dots like this. Similarly, if you look at the vector $x^{(2)}$, $W^{[1]}$ times $x^{(2)}$ gives you some other column vector.

And that gives you $z^{[1](2)}$. Finally, if you look at $x^{(3)}$, $W^{[1]}$ times $x^{(3)}$ gives you some third column vector, which is $z^{[1](3)}$. Now consider the training set, capital $X$, which we form by stacking together all of our training examples. The matrix $X$ is formed by taking the vector $x^{(1)}$ and stacking it horizontally with $x^{(2)}$ and then also $x^{(3)}$. This is the case where we have only three training examples; if you have more, they just keep stacking horizontally like that. If you now take this matrix $X$ and multiply it on the left by $W^{[1]}$, then, if you think about how matrix multiplication works, you end up with the first column being the same values that I had drawn up there in purple.

The second column will be those same four values, and the third column will be those orange values, whatever they turn out to be. But of course this is just equal to $z^{[1](1)}$ expressed as a column vector, followed by $z^{[1](2)}$ expressed as a column vector, followed by $z^{[1](3)}$ also expressed as a column vector. This is with three training examples; if you have more examples, there will be more columns. And so this is just our matrix capital $Z^{[1]}$. I hope this gives a justification for why, when we previously had $W^{[1]}x^{(i)} = z^{[1](i)}$ while looking at a single training example at a time, taking the different training examples and stacking them up in different columns gives the corresponding result: you end up with the $z$'s stacked as different columns. I won't show it here, but you can convince yourself, if you want, of what happens with Python broadcasting when you add back in these values of $b^{[1]}$.
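Written out for the three-example case with $b^{[1]} = 0$, the column-by-column argument above amounts to the following identity. This is only a restatement in the lecture's notation, not an additional result:

```latex
W^{[1]} X
= W^{[1]} \begin{bmatrix} x^{(1)} & x^{(2)} & x^{(3)} \end{bmatrix}
= \begin{bmatrix} W^{[1]}x^{(1)} & W^{[1]}x^{(2)} & W^{[1]}x^{(3)} \end{bmatrix}
= \begin{bmatrix} z^{[1](1)} & z^{[1](2)} & z^{[1](3)} \end{bmatrix}
= Z^{[1]}
```

The second equality is the column view of matrix multiplication: each column of the product is $W^{[1]}$ applied to the corresponding column of $X$.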

The values are still correct. What actually ends up happening is that, with Python broadcasting, you end up adding $b^{[1]}$ individually to each of the columns of this matrix. So on this slide, I've only justified that $Z^{[1]} = W^{[1]}X + b^{[1]}$ is a correct vectorization of the first of the four steps we had on the previous slide. But it turns out that a similar analysis allows you to show that the other steps also work, using very similar logic: if you stack the inputs in columns, then after the equation you get the corresponding outputs also stacked up in columns.
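The claim about broadcasting can be checked numerically. The following is a minimal NumPy sketch, not code from the course: the layer sizes, the random seed, and the variable names `W1`, `b1`, `X`, `Z1` are illustrative assumptions. It compares the vectorized $Z^{[1]} = W^{[1]}X + b^{[1]}$ with the per-example computation, column by column.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions): 3 input features, 4 hidden units, 3 examples.
n_x, n_h, m = 3, 4, 3
W1 = rng.standard_normal((n_h, n_x))
b1 = rng.standard_normal((n_h, 1))   # column vector: one bias per hidden unit
X = rng.standard_normal((n_x, m))    # each column is one training example x^(i)

# Vectorized: b1 has shape (n_h, 1) and is broadcast across the m columns of W1 @ X.
Z1 = W1 @ X + b1

# Per-example: compute z^[1](i) = W1 x^(i) + b1 for each column separately.
for i in range(m):
    x_i = X[:, i:i + 1]              # slicing keeps the shape (n_x, 1)
    z_i = W1 @ x_i + b1
    assert np.allclose(Z1[:, i:i + 1], z_i)

print(Z1.shape)  # (4, 3)
```

Keeping `b1` as an `(n_h, 1)` column rather than a flat array makes the broadcast unambiguous: it is added to every column, never along the rows.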

Finally, let's recap everything we talked about in this video. This is your neural network. We said that this is what you need to do if you were to implement forward propagation one training example at a time, going from $i = 1$ through $m$. Then we said, let's stack up the training examples in columns like so, and for each of these values $z^{[1]}$, $a^{[1]}$, $z^{[2]}$, $a^{[2]}$, let's stack up the corresponding columns as follows. This example is for $A^{[1]}$, but the same is true for $Z^{[1]}$, $A^{[1]}$, $Z^{[2]}$, and $A^{[2]}$. What we showed on the previous slide was that this line allows you to vectorize across all $m$ examples at the same time. And it turns out that, with similar reasoning, you can show that all of the other lines are correct vectorizations of all four of these lines of code.
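To make the recap concrete, here is a hedged sketch that implements the four steps twice for a two-layer network with sigmoid activations: once with a loop over $i = 1, \dots, m$ and once vectorized. The function names `forward_loop` and `forward_vectorized`, the layer sizes, and the random parameters are assumptions made for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_loop(X, W1, b1, W2, b2):
    """Forward propagation one training example at a time."""
    m = X.shape[1]
    A2 = np.zeros((W2.shape[0], m))
    for i in range(m):
        x_i = X[:, i:i + 1]          # column vector x^(i)
        z1 = W1 @ x_i + b1           # z^[1](i)
        a1 = sigmoid(z1)             # a^[1](i)
        z2 = W2 @ a1 + b2            # z^[2](i)
        A2[:, i:i + 1] = sigmoid(z2) # a^[2](i), stored as column i
    return A2

def forward_vectorized(X, W1, b1, W2, b2):
    """The same four steps applied to all m columns at once."""
    Z1 = W1 @ X + b1
    A1 = sigmoid(Z1)
    Z2 = W2 @ A1 + b2
    return sigmoid(Z2)

rng = np.random.default_rng(1)
n_x, n_h, n_y, m = 3, 4, 1, 5        # illustrative sizes (assumptions)
X = rng.standard_normal((n_x, m))
W1, b1 = rng.standard_normal((n_h, n_x)), rng.standard_normal((n_h, 1))
W2, b2 = rng.standard_normal((n_y, n_h)), rng.standard_normal((n_y, 1))

same = np.allclose(forward_loop(X, W1, b1, W2, b2),
                   forward_vectorized(X, W1, b1, W2, b2))
print(same)  # True
```

The two functions agree up to floating-point tolerance, which is the numerical counterpart of the statement that each vectorized line is a correct vectorization of the corresponding per-example line.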

And just as a reminder, $X$ is also equal to $A^{[0]}$, because, remember, the input feature vector $x$ is equal to $a^{[0]}$, so $x^{(i)} = a^{[0](i)}$. There's actually a certain symmetry to these equations: the first equation can also be written $Z^{[1]} = W^{[1]}A^{[0]} + b^{[1]}$. So you see that this pair of equations and this pair of equations actually look very similar, just with all of the indices advanced by one. This shows that the different layers of a neural network are roughly doing the same thing, just doing the same computation over and over. Here we have a two-layer neural network; when we go to much deeper neural networks in next week's videos, you'll see that even deeper neural networks are basically taking these two steps and doing them even more times than you're seeing here. So that's how you can vectorize your neural network across multiple training examples. So far, we've been using the sigmoid function throughout our neural networks; it turns out that's actually not the best choice. In the next video, let's delve a little further into how you can use different so-called activation functions, of which the sigmoid function is just one possible choice.
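The symmetry noted above, with $A^{[0]} = X$ and the same pair of steps repeated per layer, can be sketched as a loop over layers. This is an illustrative assumption rather than the course's code: the `forward` function and the `params` list of `(W, b)` pairs are hypothetical names, and sigmoid is used in every layer only because it is the activation used so far.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, params):
    """params is a hypothetical list of (W, b) pairs, one pair per layer."""
    A = X                            # A^[0] = X
    for W, b in params:
        Z = W @ A + b                # Z^[l] = W^[l] A^[l-1] + b^[l]
        A = sigmoid(Z)               # A^[l] = sigmoid(Z^[l])
    return A

rng = np.random.default_rng(2)
layer_sizes = [3, 4, 1]              # n_x, hidden units, output units (illustrative)
params = [(rng.standard_normal((n_out, n_in)), np.zeros((n_out, 1)))
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

X = rng.standard_normal((3, 6))      # 6 examples stacked as columns
A2 = forward(X, params)
print(A2.shape)  # (1, 6)
```

Extending `layer_sizes` to more entries repeats the same two steps more times, which is all a deeper network does in its forward pass.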


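Key takeaways:

Vectorized implementation

When computing the output of a neural network over m training examples, a vectorized implementation avoids an explicit for loop over the examples in the program and speeds up the computation.

The vectorized implementation can be explained as follows:

Coursera | Andrew Ng (01-week-3-3.5): Explanation for Vectorized Implementation

As the figure shows, the computation for each of the m training examples repeats the same procedure and produces an output of the same size and structure. Following the idea of vectorization, the individual examples are therefore merged into a single matrix of shape $(n_x, m)$, where $n_x$ is the number of input units of the network for each example (equivalently, the number of features per example) and $m$ is the number of training examples.

Vectorization makes the neural network computation quicker and more convenient to implement.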
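As a small illustration of the $(n_x, m)$ layout, the sketch below stacks three made-up examples with two features each into the matrix $X$. The example values are invented purely for demonstration.

```python
import numpy as np

# Three hypothetical examples, each with n_x = 2 features.
examples = [np.array([1.0, 2.0]),
            np.array([3.0, 4.0]),
            np.array([5.0, 6.0])]

# column_stack turns each 1-D example into one column of X.
X = np.column_stack(examples)

print(X.shape)  # (2, 3)
print(X[:, 1])  # [3. 4.]
```

Column `i` of `X` is the `i`-th example, which is the layout the vectorized equations in this section assume.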

References:

[1] 大树先生. Andrew Ng's Coursera Deep Learning Course, DeepLearning.ai Distilled Notes (1-3): Shallow Neural Networks.

