Convolutional Neural Networks
A CNN is generally built from three types of layers: convolutional layers, pooling layers, and fully connected layers.
Convolutional Layer

Some concepts:
- Receptive field: the spatial extent of the local region of the input that each neuron connects to. Its size equals the kernel size (F), and its depth is always the same as the depth of the input volume.
- Stride: the number of pixels the kernel moves at each step as it slides over the input.
- Zero padding: padding the borders of the input with zeros, so that the output can keep the same spatial size as the input and differ only in depth.
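As a quick illustration of zero padding (the sizes here are made up), NumPy's `np.pad` adds a border of zeros around an input:

```python
import numpy as np

x = np.arange(9).reshape(3, 3)        # a 3x3 single-channel input
xpad = np.pad(x, 1, mode='constant')  # 1 pixel of zeros on every side
print(xpad.shape)                     # (5, 5)
```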
Suppose the input volume has size W1 × H1 × D1 and the output is W2 × H2 × D2, with four hyperparameters: kernel size F, number of kernels K, stride S, and amount of zero padding P. Then:
- W2 = (W1 − F + 2P)/S + 1, H2 = (H1 − F + 2P)/S + 1, D2 = K
- The convolutional layer has (F · F · D1 + 1) · K parameters in total (the +1 accounts for each kernel's bias).
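The output-size and parameter-count formulas can be checked with a small configuration (the sizes below are hypothetical):

```python
# Hypothetical conv layer: 32x32x3 input, ten 5x5 kernels, stride 1, pad 2
W1, H1, D1 = 32, 32, 3
F, K, S, P = 5, 10, 1, 2

W2 = (W1 - F + 2 * P) // S + 1   # spatial width of the output
H2 = (H1 - F + 2 * P) // S + 1   # spatial height of the output
D2 = K                           # one output slice per kernel
params = (F * F * D1 + 1) * K    # weights + one bias per kernel
print(W2, H2, D2, params)        # 32 32 10 760
```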
Written as a formula: $u_{ij} = \sum_{p=1}^{f} \sum_{q=1}^{f} x_{i+p-1,\,j+q-1} \cdot w_{pq} + b$, where $p, q$ index the entries of the kernel.
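The formula can be sketched directly in NumPy; the 3×3 input, 2×2 kernel, and bias below are made-up values:

```python
import numpy as np

x = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])
w = np.array([[1., 0.],
              [0., 1.]])
b = 1.0
f = 2  # kernel size

# u[i, j] = sum over the f x f window of x at (i, j), times w, plus b
u = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        u[i, j] = np.sum(x[i:i+f, j:j+f] * w) + b
print(u)  # [[ 7.  9.] [13. 15.]]
```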
Pooling Layer

The role of the pooling layer is to reduce the spatial size of the data volume, which in turn reduces the number of parameters and the amount of computation. Pooling only changes the spatial size of the volume, not its depth.
Suppose the input volume has size W1 × H1 × D1 and the output is W2 × H2 × D2, with two hyperparameters: filter size F and stride S. Then:
- W2 = (W1 − F)/S + 1, H2 = (H1 − F)/S + 1, D2 = D1
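A quick check of the pooling output-size formulas, with assumed sizes:

```python
# Hypothetical 2x2 max pooling with stride 2 over a 224x224x64 volume
W1, H1, D1 = 224, 224, 64
F, S = 2, 2
W2 = (W1 - F) // S + 1
H2 = (H1 - F) // S + 1
D2 = D1                   # pooling never changes the depth
print(W2, H2, D2)         # 112 112 64
```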
Backpropagation
Backpropagation through the convolutional layer
Backpropagation needs the derivatives of the loss with respect to the kernel $\frac{\partial L}{\partial W}$, the bias $\frac{\partial L}{\partial b}$, and the input image $\frac{\partial L}{\partial X}$.
The forward pass of the convolutional layer is $u_{ij} = \sum_{p=1}^{f} \sum_{q=1}^{f} x_{i+p-1,\,j+q-1} \cdot w_{pq} + b$. A concrete example:

$$
\begin{bmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{bmatrix}
=
\begin{bmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ x_{31} & x_{32} & x_{33} \end{bmatrix}
*
\begin{bmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{bmatrix}
+
\begin{bmatrix} b & b \\ b & b \end{bmatrix}
=
\begin{bmatrix}
x_{11}w_{11} + x_{12}w_{12} + x_{21}w_{21} + x_{22}w_{22} + b &
x_{12}w_{11} + x_{13}w_{12} + x_{22}w_{21} + x_{23}w_{22} + b \\
x_{21}w_{11} + x_{22}w_{12} + x_{31}w_{21} + x_{32}w_{22} + b &
x_{22}w_{11} + x_{23}w_{12} + x_{32}w_{21} + x_{33}w_{22} + b
\end{bmatrix}
$$
- $\frac{\partial L}{\partial W}$

Because the kernel is applied repeatedly at different positions of the image, every element of $W$ participates in the computation of every element of $u$. The derivative of the loss with respect to the kernel is therefore

$$\frac{\partial L}{\partial w_{pq}} = \sum_i \sum_j \left( \frac{\partial L}{\partial u_{ij}} \frac{\partial u_{ij}}{\partial w_{pq}} \right)$$

$$\frac{\partial u_{ij}}{\partial w_{pq}} = \frac{\partial}{\partial w_{pq}} \left( \sum_{p=1}^{f} \sum_{q=1}^{f} x_{i+p-1,\,j+q-1} \cdot w_{pq} + b \right) = x_{i+p-1,\,j+q-1}$$

$$\frac{\partial L}{\partial w_{pq}} = \sum_i \sum_j \left( \frac{\partial L}{\partial u_{ij}} \, x_{i+p-1,\,j+q-1} \right)$$

This shows that $\frac{\partial L}{\partial W}$ is exactly the convolution of $\frac{\partial L}{\partial u}$ over $X$:

$$
\begin{bmatrix} \frac{\partial L}{\partial w_{11}} & \frac{\partial L}{\partial w_{12}} \\ \frac{\partial L}{\partial w_{21}} & \frac{\partial L}{\partial w_{22}} \end{bmatrix}
=
\begin{bmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ x_{31} & x_{32} & x_{33} \end{bmatrix}
*
\begin{bmatrix} \frac{\partial L}{\partial u_{11}} & \frac{\partial L}{\partial u_{12}} \\ \frac{\partial L}{\partial u_{21}} & \frac{\partial L}{\partial u_{22}} \end{bmatrix}
=
\begin{bmatrix}
x_{11}\frac{\partial L}{\partial u_{11}} + x_{12}\frac{\partial L}{\partial u_{12}} + x_{21}\frac{\partial L}{\partial u_{21}} + x_{22}\frac{\partial L}{\partial u_{22}} &
x_{12}\frac{\partial L}{\partial u_{11}} + x_{13}\frac{\partial L}{\partial u_{12}} + x_{22}\frac{\partial L}{\partial u_{21}} + x_{23}\frac{\partial L}{\partial u_{22}} \\
x_{21}\frac{\partial L}{\partial u_{11}} + x_{22}\frac{\partial L}{\partial u_{12}} + x_{31}\frac{\partial L}{\partial u_{21}} + x_{32}\frac{\partial L}{\partial u_{22}} &
x_{22}\frac{\partial L}{\partial u_{11}} + x_{23}\frac{\partial L}{\partial u_{12}} + x_{32}\frac{\partial L}{\partial u_{21}} + x_{33}\frac{\partial L}{\partial u_{22}}
\end{bmatrix}
$$

In matrix form: $\frac{\partial L}{\partial W} = \sum_i \sum_j \left( \frac{\partial L}{\partial u_{ij}} \, X[i:i+f,\ j:j+f] \right)$, where $X[i:i+f,\ j:j+f]$ is the $f \times f$ window of $X$ starting at position $(i, j)$.
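This relation can be verified numerically on the 3×3 input / 2×2 kernel example; the input values and the upstream gradient dL/du below are made up:

```python
import numpy as np

x = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])
du = np.array([[1., 2.],    # hypothetical upstream gradient dL/du
               [3., 4.]])
f = 2

# dL/dW: slide dL/du over x with the same access pattern as the forward pass
dw = np.zeros((f, f))
for p in range(f):
    for q in range(f):
        dw[p, q] = np.sum(du * x[p:p+2, q:q+2])
print(dw)  # [[37. 47.] [67. 77.]]
```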
- $\frac{\partial L}{\partial X}$

Using the example above: the kernel slides over 4 positions of the image and produces 4 outputs, and each element of $u$ is the sum of products of a patch of $X$ with all the elements of the kernel. For instance, $u_{11} = x_{11}w_{11} + x_{12}w_{12} + x_{21}w_{21} + x_{22}w_{22} + b$, so $\frac{\partial L}{\partial x_{11}} = \frac{\partial L}{\partial u_{11}} w_{11}$ and, taking only the $u_{11}$ term, $\frac{\partial L}{\partial x_{12}} = \frac{\partial L}{\partial u_{11}} w_{12}$. Extending this to all elements of $u$:

$$
\frac{\partial L}{\partial u_{11}} \begin{bmatrix} w_{11} & w_{12} & 0 \\ w_{21} & w_{22} & 0 \\ 0 & 0 & 0 \end{bmatrix}
+ \frac{\partial L}{\partial u_{12}} \begin{bmatrix} 0 & w_{11} & w_{12} \\ 0 & w_{21} & w_{22} \\ 0 & 0 & 0 \end{bmatrix}
+ \frac{\partial L}{\partial u_{21}} \begin{bmatrix} 0 & 0 & 0 \\ w_{11} & w_{12} & 0 \\ w_{21} & w_{22} & 0 \end{bmatrix}
+ \frac{\partial L}{\partial u_{22}} \begin{bmatrix} 0 & 0 & 0 \\ 0 & w_{11} & w_{12} \\ 0 & w_{21} & w_{22} \end{bmatrix}
$$

The expression above is exactly $\frac{\partial L}{\partial X}$. In general, $\frac{\partial L}{\partial X} = \sum_i \sum_j \frac{\partial L}{\partial u_{ij}} \, W^{(i,j)}$, where $W^{(i,j)}$ is a zero matrix of the same size as $X$ with $W$ placed in the window $[i:i+f,\ j:j+f]$.
In fact, $\frac{\partial L}{\partial X}$ and $\frac{\partial L}{\partial u}$ are also related by a convolution:

$$
\begin{bmatrix} \frac{\partial L}{\partial x_{11}} & \frac{\partial L}{\partial x_{12}} & \frac{\partial L}{\partial x_{13}} \\ \frac{\partial L}{\partial x_{21}} & \frac{\partial L}{\partial x_{22}} & \frac{\partial L}{\partial x_{23}} \\ \frac{\partial L}{\partial x_{31}} & \frac{\partial L}{\partial x_{32}} & \frac{\partial L}{\partial x_{33}} \end{bmatrix}
=
\begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & \frac{\partial L}{\partial u_{11}} & \frac{\partial L}{\partial u_{12}} & 0 \\ 0 & \frac{\partial L}{\partial u_{21}} & \frac{\partial L}{\partial u_{22}} & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}
*
\begin{bmatrix} w_{22} & w_{21} \\ w_{12} & w_{11} \end{bmatrix}
$$

That is: first zero-pad $\frac{\partial L}{\partial u}$ to adjust its size, then rotate $W$ by 180 degrees, and finally convolve; the result is $\frac{\partial L}{\partial X}$.
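The pad-and-rotate recipe can be checked numerically; the kernel and upstream gradient values below are made up:

```python
import numpy as np

w = np.array([[1., 2.],
              [3., 4.]])
du = np.array([[1., 2.],   # hypothetical upstream gradient dL/du
               [3., 4.]])

# dL/dX: zero-pad dL/du by f-1 = 1 on each side, rotate W by 180 degrees,
# then slide the rotated kernel over the padded gradient
du_pad = np.pad(du, 1, mode='constant')
w_rot = np.rot90(w, 2)
dx = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        dx[i, j] = np.sum(du_pad[i:i+2, j:j+2] * w_rot)
print(dx)  # [[ 1.  4.  4.] [ 6. 20. 16.] [ 9. 24. 16.]]
```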
- $\frac{\partial L}{\partial b}$

$$\frac{\partial L}{\partial b} = \sum_i \sum_j \frac{\partial L}{\partial u_{ij}}$$
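Since $b$ is added to every output element, its gradients simply accumulate (the du values below are made up):

```python
import numpy as np

du = np.array([[1., 2.],
               [3., 4.]])   # hypothetical upstream gradient dL/du
db = np.sum(du)             # b contributes to every u_ij, so the terms add up
print(db)                   # 10.0
```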
Code:
import numpy as np

def conv_forward_naive(x, w, b, conv_param):
    """
    Inputs:
    - x: Input data of shape (N, C, H, W)
    - w: Filter weights of shape (F, C, HH, WW)
    - b: Biases, of shape (F,)
    - conv_param: A dictionary with the following keys:
      - 'stride': The number of pixels between adjacent receptive fields in the
        horizontal and vertical directions.
      - 'pad': The number of pixels that will be used to zero-pad the input.
    Returns a tuple of:
    - out: Output data, of shape (N, F, H', W') where H' and W' are given by
      H' = 1 + (H + 2 * pad - HH) / stride
      W' = 1 + (W + 2 * pad - WW) / stride
    - cache: (x, w, b, conv_param)
    """
    stride, pad = conv_param['stride'], conv_param['pad']
    N, C, H, W = x.shape
    F, C, HH, WW = w.shape
    HHH = 1 + (H + 2 * pad - HH) // stride
    WWW = 1 + (W + 2 * pad - WW) // stride
    out = np.zeros((N, F, HHH, WWW))
    # Zero-pad the input along the spatial dimensions only
    xpad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)),
                  mode='constant', constant_values=0)
    for f in range(F):
        for j in range(HHH):
            for k in range(WWW):
                # Multiply each receptive field by filter f and sum over C, HH, WW
                out[:, f, j, k] = np.sum(
                    xpad[:, :, j*stride:j*stride+HH, k*stride:k*stride+WW] * w[f],
                    axis=(1, 2, 3)) + b[f]
    cache = (x, w, b, conv_param)
    return out, cache
def conv_backward_naive(dout, cache):
    """
    Inputs:
    - dout: Upstream derivatives.
    - cache: A tuple of (x, w, b, conv_param) as in conv_forward_naive
    Returns a tuple of:
    - dx: Gradient with respect to x
    - dw: Gradient with respect to w
    - db: Gradient with respect to b
    """
    x, w, b, conv_param = cache
    stride, pad = conv_param['stride'], conv_param['pad']
    N, C, H, W = x.shape
    F, C, HH, WW = w.shape
    HHH = 1 + (H + 2 * pad - HH) // stride
    WWW = 1 + (W + 2 * pad - WW) // stride
    xpad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)),
                  mode='constant', constant_values=0)
    dxpad = np.zeros_like(xpad)
    dw = np.zeros_like(w)
    db = np.zeros_like(b)
    for n in range(N):
        for f in range(F):
            for j in range(HHH):
                for k in range(WWW):
                    # db: the bias contributes to every output element
                    db[f] += dout[n, f, j, k]
                    # dw: the receptive field scaled by the upstream gradient
                    dw[f] += xpad[n, :, j*stride:j*stride+HH,
                                  k*stride:k*stride+WW] * dout[n, f, j, k]
                    # dx: the kernel scaled by the upstream gradient,
                    # accumulated into the padded input's gradient
                    dxpad[n, :, j*stride:j*stride+HH,
                          k*stride:k*stride+WW] += w[f] * dout[n, f, j, k]
    # Strip the padding to recover the gradient w.r.t. the original input
    dx = dxpad[:, :, pad:pad+H, pad:pad+W]
    return dx, dw, db
Backpropagation through the pooling layer
Suppose $\frac{\partial L}{\partial u} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$, produced by 2×2 pooling with stride 2 over a 4×4 input $X$.
First, upsample it to the size of $X$: $\begin{bmatrix} 1 & 1 & 2 & 2 \\ 1 & 1 & 2 & 2 \\ 3 & 3 & 4 & 4 \\ 3 & 3 & 4 & 4 \end{bmatrix}$.
For max pooling, keep the gradient only at the position of the maximum element of $X$ in each pooling window (the positions below are illustrative) and set the rest to 0: $\frac{\partial L}{\partial X} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 2 \\ 0 & 0 & 4 & 0 \\ 0 & 3 & 0 & 0 \end{bmatrix}$
For average pooling, distribute the average (the upstream gradient divided by the window size) to every position in the window: $\frac{\partial L}{\partial X} = \begin{bmatrix} 0.25 & 0.25 & 0.5 & 0.5 \\ 0.25 & 0.25 & 0.5 & 0.5 \\ 0.75 & 0.75 & 1 & 1 \\ 0.75 & 0.75 & 1 & 1 \end{bmatrix}$
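The max-pooling case can be reproduced with a small NumPy sketch; the 4×4 input X below is made up so that its per-window maxima land at the positions shown above:

```python
import numpy as np

x = np.array([[9., 1., 2., 3.],    # hypothetical pooling input
              [1., 2., 3., 8.],
              [1., 2., 7., 3.],
              [2., 6., 3., 4.]])
du = np.array([[1., 2.],           # upstream gradient dL/du
               [3., 4.]])

# Route each upstream gradient to the max position of its 2x2 window
dx = np.zeros_like(x)
for j in range(2):
    for k in range(2):
        window = x[2*j:2*j+2, 2*k:2*k+2]
        mask = (window == window.max())
        dx[2*j:2*j+2, 2*k:2*k+2] = mask * du[j, k]
print(dx)
```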
def max_pool_forward_naive(x, pool_param):
    """
    Inputs:
    - x: Input data, of shape (N, C, H, W)
    - pool_param: dictionary with the following keys:
      - 'pool_height': The height of each pooling region
      - 'pool_width': The width of each pooling region
      - 'stride': The distance between adjacent pooling regions
    Returns a tuple of:
    - out: Output data, of shape (N, C, H', W') where H' and W' are given by
      H' = 1 + (H - pool_height) / stride
      W' = 1 + (W - pool_width) / stride
    - cache: (x, pool_param)
    """
    N, C, H, W = x.shape
    HH, WW, stride = (pool_param['pool_height'], pool_param['pool_width'],
                      pool_param['stride'])
    HHH = 1 + (H - HH) // stride
    WWW = 1 + (W - WW) // stride
    out = np.zeros((N, C, HHH, WWW))
    for j in range(HHH):
        for k in range(WWW):
            # Max over each pooling window, for all images and channels at once
            out[:, :, j, k] = np.max(
                x[:, :, j*stride:j*stride+HH, k*stride:k*stride+WW], axis=(2, 3))
    cache = (x, pool_param)
    return out, cache
def max_pool_backward_naive(dout, cache):
    """
    Inputs:
    - dout: Upstream derivatives
    - cache: A tuple of (x, pool_param) as in the forward pass.
    Returns:
    - dx: Gradient with respect to x
    """
    x, pool_param = cache
    N, C, H, W = x.shape
    HH, WW, stride = (pool_param['pool_height'], pool_param['pool_width'],
                      pool_param['stride'])
    HHH = 1 + (H - HH) // stride
    WWW = 1 + (W - WW) // stride
    dx = np.zeros_like(x)
    for j in range(HHH):
        for k in range(WWW):
            window = x[:, :, j*stride:j*stride+HH, k*stride:k*stride+WW]
            # Route the upstream gradient to the max position in each window;
            # accumulate with += so overlapping windows are handled correctly
            mask = (window == np.max(window, axis=(2, 3), keepdims=True))
            dx[:, :, j*stride:j*stride+HH, k*stride:k*stride+WW] += \
                mask * dout[:, :, j, k][:, :, None, None]
    return dx