Convolutional Neural Networks

A CNN is generally built from three types of layers: convolutional layers, pooling layers, and fully connected layers.

Convolutional Layer

[cs231n] Convolutional Neural Networks and Backpropagation
Some concepts:

  • Receptive field: the spatial extent of the local region of the input that each neuron connects to. Its size equals the filter size ($F$); its depth is always equal to the depth of the input volume.
  • Stride: the number of pixels the filter moves at each step as it slides over the input.
  • Zero padding: to keep the output the same spatial size as the input (differing only in depth), the input is padded with zeros along its borders.

Suppose the input volume has size $W_1 \times H_1 \times D_1$ and the output volume is $W_2 \times H_2 \times D_2$. Define four hyperparameters: filter size $F$, number of filters $K$, stride $S$, and amount of zero padding $P$. Then:

  • $W_2 = (W_1 - F + 2P)/S + 1$, $H_2 = (H_1 - F + 2P)/S + 1$, $D_2 = K$
  • The layer has $(F \cdot F \cdot D_1 + 1) \cdot D_2$ parameters in total ($F \cdot F \cdot D_1$ weights plus one bias per filter); a quick numeric check follows below.
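
As a quick sanity check of these two formulas, the output size and parameter count can be computed directly. This is a minimal sketch; the concrete numbers (a 32×32×3 input, ten 5×5 filters) are illustrative assumptions, not from the text:

W1, H1, D1 = 32, 32, 3      # assumed input volume
F, K, S, P = 5, 10, 1, 2    # assumed filter size, filter count, stride, padding

W2 = (W1 - F + 2 * P) // S + 1   # (32 - 5 + 4)/1 + 1 = 32
H2 = (H1 - F + 2 * P) // S + 1   # 32
D2 = K                           # 10
params = (F * F * D1 + 1) * K    # (5*5*3 + 1) * 10 = 760
print(W2, H2, D2, params)        # 32 32 10 760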

In formula form: $u_{ij} = \sum_{p=1}^{f} \sum_{q=1}^{f} x_{i+p-1,\,j+q-1} \times w_{pq} + b$, where $p, q$ index the entries of the filter.

Pooling Layer

[cs231n] Convolutional Neural Networks and Backpropagation
The pooling layer reduces the spatial size of the data volume, which reduces the number of parameters and the amount of computation. Pooling changes only the spatial size of the volume, not its depth.
Suppose the input volume has size $W_1 \times H_1 \times D_1$ and the output volume is $W_2 \times H_2 \times D_2$. Define two hyperparameters: filter size $F$ and stride $S$. Then:

  • $W_2 = (W_1 - F)/S + 1$, $H_2 = (H_1 - F)/S + 1$, $D_2 = D_1$

Backpropagation

Convolutional Layer Backpropagation

Backpropagation must compute the derivatives of the loss with respect to the filter, $\frac{\partial L}{\partial W}$, the bias, $\frac{\partial L}{\partial b}$, and the input image, $\frac{\partial L}{\partial X}$.

The forward pass of the convolutional layer is $u_{ij} = \sum_{p=1}^{f} \sum_{q=1}^{f} x_{i+p-1,\,j+q-1} \times w_{pq} + b$. A concrete example:
$$\begin{bmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{bmatrix} = \begin{bmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ x_{31} & x_{32} & x_{33} \end{bmatrix} * \begin{bmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{bmatrix} + \begin{bmatrix} b & b \\ b & b \end{bmatrix} = \begin{bmatrix} x_{11}w_{11} + x_{12}w_{12} + x_{21}w_{21} + x_{22}w_{22} + b & x_{12}w_{11} + x_{13}w_{12} + x_{22}w_{21} + x_{23}w_{22} + b \\ x_{21}w_{11} + x_{22}w_{12} + x_{31}w_{21} + x_{32}w_{22} + b & x_{22}w_{11} + x_{23}w_{12} + x_{32}w_{21} + x_{33}w_{22} + b \end{bmatrix}$$
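
A small numpy sketch to verify the worked example above (the random values and seed are arbitrary assumptions):

import numpy as np

np.random.seed(0)
X = np.random.randn(3, 3)   # 3x3 input
W = np.random.randn(2, 2)   # 2x2 filter
b = 0.5

# u_ij = sum_pq x_{i+p-1, j+q-1} * w_pq + b (valid convolution, no flipping)
u = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        u[i, j] = np.sum(X[i:i+2, j:j+2] * W) + b

# Compare one entry against the expanded expression from the matrix above
expected = X[0, 0]*W[0, 0] + X[0, 1]*W[0, 1] + X[1, 0]*W[1, 0] + X[1, 1]*W[1, 1] + b
assert np.isclose(u[0, 0], expected)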

  • $\frac{\partial L}{\partial W}$
    Because the filter is applied repeatedly at different positions of the image, every element of $W$ participates in computing every element of the output $u$, so the derivative of the loss with respect to the filter is $\frac{\partial L}{\partial w_{pq}} = \sum_i \sum_j \left( \frac{\partial L}{\partial u_{ij}} \frac{\partial u_{ij}}{\partial w_{pq}} \right)$.
    Since $\frac{\partial u_{ij}}{\partial w_{pq}} = \frac{\partial}{\partial w_{pq}} \left( \sum_{p=1}^{f} \sum_{q=1}^{f} x_{i+p-1,\,j+q-1} \times w_{pq} + b \right) = x_{i+p-1,\,j+q-1}$, we get $\frac{\partial L}{\partial w_{pq}} = \sum_i \sum_j \left( \frac{\partial L}{\partial u_{ij}} \, x_{i+p-1,\,j+q-1} \right)$. That is, $\frac{\partial L}{\partial W}$ is exactly $\frac{\partial L}{\partial u}$ convolved over $X$:
    $$\begin{bmatrix} \frac{\partial L}{\partial w_{11}} & \frac{\partial L}{\partial w_{12}} \\ \frac{\partial L}{\partial w_{21}} & \frac{\partial L}{\partial w_{22}} \end{bmatrix} = \begin{bmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ x_{31} & x_{32} & x_{33} \end{bmatrix} * \begin{bmatrix} \frac{\partial L}{\partial u_{11}} & \frac{\partial L}{\partial u_{12}} \\ \frac{\partial L}{\partial u_{21}} & \frac{\partial L}{\partial u_{22}} \end{bmatrix} = \begin{bmatrix} x_{11}\frac{\partial L}{\partial u_{11}} + x_{12}\frac{\partial L}{\partial u_{12}} + x_{21}\frac{\partial L}{\partial u_{21}} + x_{22}\frac{\partial L}{\partial u_{22}} & x_{12}\frac{\partial L}{\partial u_{11}} + x_{13}\frac{\partial L}{\partial u_{12}} + x_{22}\frac{\partial L}{\partial u_{21}} + x_{23}\frac{\partial L}{\partial u_{22}} \\ x_{21}\frac{\partial L}{\partial u_{11}} + x_{22}\frac{\partial L}{\partial u_{12}} + x_{31}\frac{\partial L}{\partial u_{21}} + x_{32}\frac{\partial L}{\partial u_{22}} & x_{22}\frac{\partial L}{\partial u_{11}} + x_{23}\frac{\partial L}{\partial u_{12}} + x_{32}\frac{\partial L}{\partial u_{21}} + x_{33}\frac{\partial L}{\partial u_{22}} \end{bmatrix}$$
    In windowed form: $\frac{\partial L}{\partial W} = \sum_i \sum_j \left( \frac{\partial L}{\partial u_{ij}} \, X_{[i:i+f,\; j:j+f]} \right)$, where $X_{[i:i+f,\,j:j+f]}$ denotes the $f \times f$ window of $X$ with top-left corner at $(i, j)$.
  • $\frac{\partial L}{\partial X}$
    In the example above, the filter slides over 4 positions of the input and produces 4 outputs; each element of $u$ is the sum of products of a window of $X$ with all elements of the filter. For example, $u_{11} = x_{11}w_{11} + x_{12}w_{12} + x_{21}w_{21} + x_{22}w_{22} + b$, so the contribution of $u_{11}$ alone gives $\frac{\partial L}{\partial x_{11}} = \frac{\partial L}{\partial u_{11}} w_{11}$ and $\frac{\partial L}{\partial x_{12}} = \frac{\partial L}{\partial u_{11}} w_{12}$. Extending this to every element of $u$ and summing the contributions:
    $$\frac{\partial L}{\partial u_{11}} \begin{bmatrix} w_{11} & w_{12} & 0 \\ w_{21} & w_{22} & 0 \\ 0 & 0 & 0 \end{bmatrix} + \frac{\partial L}{\partial u_{12}} \begin{bmatrix} 0 & w_{11} & w_{12} \\ 0 & w_{21} & w_{22} \\ 0 & 0 & 0 \end{bmatrix} + \frac{\partial L}{\partial u_{21}} \begin{bmatrix} 0 & 0 & 0 \\ w_{11} & w_{12} & 0 \\ w_{21} & w_{22} & 0 \end{bmatrix} + \frac{\partial L}{\partial u_{22}} \begin{bmatrix} 0 & 0 & 0 \\ 0 & w_{11} & w_{12} \\ 0 & w_{21} & w_{22} \end{bmatrix}$$
    This sum is exactly $\frac{\partial L}{\partial X}$. In general, for each output position $(i, j)$, the term $\frac{\partial L}{\partial u_{ij}} \cdot W$ is accumulated into the window $X_{[i:i+f,\; j:j+f]}$ of the gradient.
    In fact, $\frac{\partial L}{\partial X}$ is also related to $\frac{\partial L}{\partial u}$ by a convolution:
    $$\begin{bmatrix} \frac{\partial L}{\partial x_{11}} & \frac{\partial L}{\partial x_{12}} & \frac{\partial L}{\partial x_{13}} \\ \frac{\partial L}{\partial x_{21}} & \frac{\partial L}{\partial x_{22}} & \frac{\partial L}{\partial x_{23}} \\ \frac{\partial L}{\partial x_{31}} & \frac{\partial L}{\partial x_{32}} & \frac{\partial L}{\partial x_{33}} \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & \frac{\partial L}{\partial u_{11}} & \frac{\partial L}{\partial u_{12}} & 0 \\ 0 & \frac{\partial L}{\partial u_{21}} & \frac{\partial L}{\partial u_{22}} & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} * \begin{bmatrix} w_{22} & w_{21} \\ w_{12} & w_{11} \end{bmatrix}$$
    That is: first zero-pad $\frac{\partial L}{\partial u}$ to adjust its size, then rotate $W$ by 180 degrees; the convolution of the two is $\frac{\partial L}{\partial X}$.
  • $\frac{\partial L}{\partial b}$
    $\frac{\partial L}{\partial b} = \sum_i \sum_j \frac{\partial L}{\partial u_{ij}}$, since the bias contributes to every output element. (All three formulas are checked numerically in the sketch below.)
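
The three gradient formulas can be verified on the 3×3 example. A minimal sketch, assuming the loss is simply $L = \sum_{ij} u_{ij}$ so that $\frac{\partial L}{\partial u}$ is all ones:

import numpy as np

np.random.seed(1)
X = np.random.randn(3, 3)
W = np.random.randn(2, 2)
b = 0.3

def forward(X, W, b):
    u = np.zeros((2, 2))
    for i in range(2):
        for j in range(2):
            u[i, j] = np.sum(X[i:i+2, j:j+2] * W) + b
    return u

dU = np.ones((2, 2))                 # dL/du for L = u.sum()

# dL/dW: convolve dU over X
dW = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        dW += dU[i, j] * X[i:i+2, j:j+2]

# dL/dX: zero-pad dU, rotate W by 180 degrees, then convolve
dU_pad = np.pad(dU, 1, mode='constant')
W_rot = np.rot90(W, 2)
dX = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        dX[i, j] = np.sum(dU_pad[i:i+2, j:j+2] * W_rot)

# dL/db: sum of upstream gradients
db = dU.sum()

# Finite-difference check of one entry of dX
eps = 1e-6
Xp = X.copy(); Xp[1, 1] += eps
num = (forward(Xp, W, b).sum() - forward(X, W, b).sum()) / eps
assert np.isclose(dX[1, 1], num, atol=1e-4)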

Code:

import numpy as np


def conv_forward_naive(x, w, b, conv_param):
    """
    Inputs:
    - x: Input data of shape (N, C, H, W)
    - w: Filter weights of shape (F, C, HH, WW)
    - b: Biases, of shape (F,)
    - conv_param: A dictionary with the following keys:
      - 'stride': The number of pixels between adjacent receptive fields in the
        horizontal and vertical directions.
      - 'pad': The number of pixels that will be used to zero-pad the input.
    Returns a tuple of:
    - out: Output data, of shape (N, F, H', W') where H' and W' are given by
      H' = 1 + (H + 2 * pad - HH) / stride
      W' = 1 + (W + 2 * pad - WW) / stride
    - cache: (x, w, b, conv_param)
    """
    stride, pad = conv_param['stride'], conv_param['pad']
    N, C, H, W = x.shape
    F, C, HH, WW = w.shape
    # Output spatial size: H' = 1 + (H + 2*pad - HH) / stride, likewise for W'
    HHH = 1 + (H + 2 * pad - HH) // stride
    WWW = 1 + (W + 2 * pad - WW) // stride
    out = np.zeros((N, F, HHH, WWW))
    # Zero-pad the spatial dimensions only
    xpad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode='constant', constant_values=0)
    for f in range(F):
        for j in range(HHH):
            for k in range(WWW):
                # Window of the padded input times filter f, summed over C, HH, WW
                out[:, f, j, k] = np.sum(xpad[:, :, j*stride:j*stride+HH, k*stride:k*stride+WW] * w[f], axis=(1, 2, 3)) + b[f]
    cache = (x, w, b, conv_param)
    return out, cache
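
A hypothetical usage example (shapes chosen arbitrarily):

np.random.seed(2)
x = np.random.randn(2, 3, 8, 8)   # N=2, C=3, 8x8 images
w = np.random.randn(4, 3, 3, 3)   # F=4 filters of size 3x3
b = np.random.randn(4)
out, cache = conv_forward_naive(x, w, b, {'stride': 1, 'pad': 1})
print(out.shape)                  # (2, 4, 8, 8), since H' = 1 + (8 + 2 - 3)/1 = 8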


def conv_backward_naive(dout, cache):
    """
    Inputs:
    - dout: Upstream derivatives.
    - cache: A tuple of (x, w, b, conv_param) as in conv_forward_naive
    Returns a tuple of:
    - dx: Gradient with respect to x
    - dw: Gradient with respect to w
    - db: Gradient with respect to b
    """
    x, w, b, conv_param = cache
    stride, pad = conv_param['stride'], conv_param['pad']
    N, C, H, W = x.shape
    F, C, HH, WW = w.shape
    HHH = 1 + (H + 2 * pad - HH) // stride
    WWW = 1 + (W + 2 * pad - WW) // stride
    xpad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode='constant', constant_values=0)
    dxpad = np.zeros_like(xpad)
    dw = np.zeros_like(w)
    db = np.zeros_like(b)
    for n in range(N):
        for f in range(F):
            for j in range(HHH):
                for k in range(WWW):
                    # dL/db: upstream gradients accumulate over all output positions
                    db[f] += dout[n, f, j, k]
                    # dL/dw: upstream gradient times the corresponding input window
                    dw[f] += xpad[n, :, j*stride:j*stride+HH, k*stride:k*stride+WW] * dout[n, f, j, k]
                    # dL/dx: upstream gradient times the filter, scattered back into the window
                    dxpad[n, :, j*stride:j*stride+HH, k*stride:k*stride+WW] += w[f] * dout[n, f, j, k]

    # Strip the zero padding to recover the gradient for the original input
    dx = dxpad[:, :, pad:pad+H, pad:pad+W]
    return dx, dw, db
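
A quick finite-difference check of the backward pass. This is a sketch; num_grad is a hypothetical helper written here for illustration, not a cs231n function:

def num_grad(f, a, dout, h=1e-6):
    # Central-difference gradient of L = sum(f() * dout) with respect to array a
    grad = np.zeros_like(a)
    it = np.nditer(a, flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        old = a[idx]
        a[idx] = old + h
        fp = np.sum(f() * dout)
        a[idx] = old - h
        fm = np.sum(f() * dout)
        a[idx] = old
        grad[idx] = (fp - fm) / (2 * h)
        it.iternext()
    return grad

np.random.seed(3)
x = np.random.randn(1, 2, 5, 5)
w = np.random.randn(3, 2, 3, 3)
b = np.random.randn(3)
conv_param = {'stride': 1, 'pad': 1}
out, cache = conv_forward_naive(x, w, b, conv_param)
dout = np.random.randn(*out.shape)
dx, dw, db = conv_backward_naive(dout, cache)

f = lambda: conv_forward_naive(x, w, b, conv_param)[0]
assert np.allclose(dx, num_grad(f, x, dout), atol=1e-5)
assert np.allclose(dw, num_grad(f, w, dout), atol=1e-5)
assert np.allclose(db, num_grad(f, b, dout), atol=1e-5)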

Pooling Layer Backpropagation

Suppose $\frac{\partial L}{\partial u} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$.
First, upsample it to the size of $X$: $\begin{bmatrix} 1 & 1 & 2 & 2 \\ 1 & 1 & 2 & 2 \\ 3 & 3 & 4 & 4 \\ 3 & 3 & 4 & 4 \end{bmatrix}$
For max pooling, the gradient is kept only at the position of the maximum element of $X$ in each window, and the other positions are set to 0: $\frac{\partial L}{\partial X} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 2 \\ 0 & 0 & 4 & 0 \\ 0 & 3 & 0 & 0 \end{bmatrix}$
For average pooling, each gradient is divided evenly over its window: $\frac{\partial L}{\partial X} = \begin{bmatrix} 0.25 & 0.25 & 0.5 & 0.5 \\ 0.25 & 0.25 & 0.5 & 0.5 \\ 0.75 & 0.75 & 1 & 1 \\ 0.75 & 0.75 & 1 & 1 \end{bmatrix}$
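
The routing can be sketched in numpy. The input matrix below is made up so that its per-window maxima sit where the max-pooling example assumes:

import numpy as np

dU = np.array([[1., 2.],
               [3., 4.]])
# Hypothetical 4x4 input; each 2x2 window's max is at the position the example uses
X = np.array([[9., 1., 2., 3.],
              [0., 2., 4., 8.],
              [5., 0., 7., 2.],
              [1., 6., 0., 1.]])

dX_max = np.zeros_like(X)
dX_avg = np.zeros_like(X)
for i in range(2):
    for j in range(2):
        win = X[2*i:2*i+2, 2*j:2*j+2]
        mask = (win == win.max())                       # 1 at the max position
        dX_max[2*i:2*i+2, 2*j:2*j+2] += mask * dU[i, j]
        dX_avg[2*i:2*i+2, 2*j:2*j+2] += dU[i, j] / 4.0  # spread evenly over the window

# dX_max matches the max-pooling matrix above; dX_avg matches the average-pooling one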

def max_pool_forward_naive(x, pool_param):
    """
    Inputs:
    - x: Input data, of shape (N, C, H, W)
    - pool_param: dictionary with the following keys:
      - 'pool_height': The height of each pooling region
      - 'pool_width': The width of each pooling region
      - 'stride': The distance between adjacent pooling regions
    Returns a tuple of:
    - out: Output data, of shape (N, C, H', W') where H' and W' are given by
      H' = 1 + (H - pool_height) / stride
      W' = 1 + (W - pool_width) / stride
    - cache: (x, pool_param)
    """
    N, C, H, W = x.shape
    HH, WW, stride = pool_param['pool_height'], pool_param['pool_width'], pool_param['stride']
    HHH = 1 + (H - HH) // stride
    WWW = 1 + (W - WW) // stride
    out = np.zeros((N, C, HHH, WWW))
    for j in range(HHH):
        for k in range(WWW):
            # Max over each pooling window, vectorized over N and C
            out[:, :, j, k] = np.max(x[:, :, j*stride:j*stride+HH, k*stride:k*stride+WW], axis=(2, 3))
    cache = (x, pool_param)
    return out, cache


def max_pool_backward_naive(dout, cache):
    """
    Inputs:
    - dout: Upstream derivatives
    - cache: A tuple of (x, pool_param) as in the forward pass.
    Returns:
    - dx: Gradient with respect to x
    """
    x, pool_param = cache
    N, C, H, W = x.shape
    HH, WW, stride = pool_param['pool_height'], pool_param['pool_width'], pool_param['stride']
    HHH = 1 + (H - HH) // stride
    WWW = 1 + (W - WW) // stride
    dx = np.zeros_like(x)
    for j in range(HHH):
        for k in range(WWW):
            window = x[:, :, j*stride:j*stride+HH, k*stride:k*stride+WW]
            # Route the upstream gradient to the max position of each window;
            # accumulate with += so that overlapping windows are handled correctly
            mask = (window == np.max(window, axis=(2, 3), keepdims=True))
            dx[:, :, j*stride:j*stride+HH, k*stride:k*stride+WW] += mask * dout[:, :, j, k][:, :, None, None]
    return dx
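
A hypothetical usage example; with distinct random values, each window routes its full gradient to exactly one position:

np.random.seed(4)
x = np.random.randn(1, 1, 4, 4)
pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}
out, cache = max_pool_forward_naive(x, pool_param)
dx = max_pool_backward_naive(np.ones_like(out), cache)
print(dx.sum())   # 4.0: one unit of gradient per 2x2 window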

References
https://www.jianshu.com/p/8ad58a170fd9
https://www.cnblogs.com/pinard/p/6494810.html
https://zhuanlan.zhihu.com/p/22038289?refer=intelligentunit
