Capsule Networks

author: Sargur Srihari

This is part of lecture slides on Deep Learning: http://www.cedar.buffalo.edu/~srihari/CSE676

Limitations of Convolutional Networks

Convolutional Neural Networks

Source: https://hackernoon.com/what-is-a-capsnet-or-capsule-network-2bfbe48769cc

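They minimize computation compared with a regular neural network
Convolution greatly simplifies the computation without losing the essence of the data
They are good at image classification
They use the same knowledge at all image locations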

Processing Steps and Training for ConvNets

Given an input image, a set of kernels or filters scans it and performs the convolution operation.
This creates a feature map inside the network.
These features then pass through activation and pooling layers
• Activation layers, e.g., ReLU, introduce nonlinearity
• Pooling (e.g., max pooling) helps reduce the training time.
(Pooling summarizes each subregion and provides invariance.)
At the end, the features pass through a classifier such as sigmoid or softmax
Training is based on backpropagation of the error measured against labeled data.
(Nonlinearities such as ReLU also help with the vanishing gradient problem.)
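
The forward pass described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the 5x5 image, the vertical-edge kernel, and the classifier weights are all hypothetical and hand-picked, nothing is trained, and backpropagation is left out.

```python
import math

def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) to build a feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def relu(fmap):
    """Activation layer: zero out negative responses."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]

def softmax(scores):
    """Turn class scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical input: a 5x5 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hypothetical hand-picked kernel that responds to dark-to-bright vertical edges.
kernel = [[-1, 1], [-1, 1]]

features = max_pool2d(relu(conv2d(image, kernel)))
flat = [v for row in features for v in row]

# Hypothetical, untrained weights for a two-class linear classifier.
weights = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
probs = softmax([sum(w * f for w, f in zip(ws, flat)) for ws in weights])
print(features, probs)
```

In a real ConvNet the kernel and classifier weights would be learned by backpropagation instead of being written by hand.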

Pooling and Invariance

Pooling is supposed to provide invariance to position, orientation, scale, or rotation.
Every input value has changed, but only half of the output values have changed, because max pooling is sensitive only to the maximum value in a neighborhood, not to its exact location.
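
A small 1D example makes this concrete. It is a sketch with made-up numbers: the input values, the window width of 3, and the one-position shift are assumptions chosen for illustration, not values taken from the slide.

```python
def max_pool_1d(values, width=3):
    """Stride-1 max pooling; windows are clipped at the borders."""
    half = width // 2
    return [max(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]

# Hypothetical detector outputs.
x = [0.1, 1.0, 0.2, 0.1, 0.0, 0.1]
# The same pattern shifted right by one position; 0.3 enters on the left.
x_shifted = [0.3] + x[:-1]

pooled = max_pool_1d(x)
pooled_shifted = max_pool_1d(x_shifted)

inputs_changed = sum(a != b for a, b in zip(x, x_shifted))
outputs_changed = sum(a != b for a, b in zip(pooled, pooled_shifted))

print(pooled)          # [1.0, 1.0, 1.0, 0.2, 0.1, 0.1]
print(pooled_shifted)  # [0.3, 1.0, 1.0, 1.0, 0.2, 0.1]
print(inputs_changed, outputs_changed)  # 6 3
```

All six inputs differ after the shift, but only three of the six pooled outputs do. The price of this invariance is that the pooled output no longer says exactly where the maximum was.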

Example of CNN Limitation

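A CNN that recognizes faces extracts features from the image

Detection does not depend on how the features are arranged: the CNN still recognizes a face even when the features are in the wrong positions.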

Motivation for CapsNets

CapsNets are an improvement on CNNs

They are the next version of CNNs
They solve problems caused by max pooling and deep nets
One such problem is the loss of information about the order and orientation of features
Hinton: “The pooling operation used in CNNs is a big mistake and the fact that it works so well is a disaster”

Solution offered by CapsNets

Low-level features should also be arranged in a certain order for the object to be classified as a face
Order is determined during training, when the network learns not only what features to look for but also what their relationships to one another should be
Only an image whose features appear in the expected order is recognized as a face.
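
The difference between checking for features and checking for their arrangement can be shown with a toy example. Everything here is hypothetical: the feature names, the coordinates, and the hand-written rules stand in for relationships that a network would learn during training, and this is not how a CapsNet actually computes.

```python
# Hypothetical detected features: name -> (x, y) centre, with y growing downward.
face = {"left_eye": (3, 3), "right_eye": (7, 3), "nose": (5, 5), "mouth": (5, 8)}
scrambled = {"left_eye": (5, 8), "right_eye": (3, 6), "nose": (7, 2), "mouth": (4, 3)}

REQUIRED = {"left_eye", "right_eye", "nose", "mouth"}

def presence_only(features):
    """Position-blind check: are all parts present somewhere in the image?"""
    return REQUIRED.issubset(features)

def presence_and_arrangement(features):
    """Also require the parts to stand in face-like relationships to one another."""
    if not presence_only(features):
        return False
    (lx, ly), (rx, ry) = features["left_eye"], features["right_eye"]
    (nx, ny), (_, my) = features["nose"], features["mouth"]
    eyes_above_nose = ly < ny and ry < ny
    nose_above_mouth = ny < my
    nose_between_eyes = lx < nx < rx
    return eyes_above_nose and nose_above_mouth and nose_between_eyes

for name, feats in [("face", face), ("scrambled", scrambled)]:
    print(name, presence_only(feats), presence_and_arrangement(feats))
```

The position-blind check accepts both inputs, while the arrangement-aware check accepts only the properly ordered face.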

Visual Fixation

Human vision uses saccades

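Human vision ignores irrelevant details by using a carefully determined sequence of fixations
This ensures that only a tiny fraction of the optic array is ever processed at the highest resolution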

We make the following assumptions:
• A single fixation gives us much more than a single identified object and its properties
• Our multi-layer visual system creates a parse tree on each fixation
• We ignore how parse trees are coordinated over multiple fixations

Parse Tree of a Fixation

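For a single fixation,
a parse tree is carved out of a fixed multilayer neural network,
like a sculpture carved from a rock.
Each layer is divided into many small groups of neurons called "capsules".
Each node in the parse tree corresponds to an active capsule.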

Activation is a likelihood

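The activation level of a neuron can be interpreted as the likelihood of detecting a specific feature.
A capsule is a group of neurons that captures not only the likelihood but also the parameters of the specific feature.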
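
One way a capsule can carry both pieces of information is to output a vector: its length encodes the likelihood and its direction encodes the feature's parameters. The notes give no formula at this point, so the sketch below is an assumption. It uses the "squash" nonlinearity from Sabour, Frosst and Hinton's paper "Dynamic Routing Between Capsules", and the input vectors are made up.

```python
import math

def squash(s, eps=1e-9):
    """Rescale vector s so its length lies in [0, 1) while keeping its direction."""
    sq_norm = sum(v * v for v in s)
    norm = math.sqrt(sq_norm)
    # Long vectors get a length near 1, short vectors a length near 0.
    scale = sq_norm / (1.0 + sq_norm) / (norm + eps)
    return [scale * v for v in s]

def length(v):
    """Euclidean length, read here as the likelihood that the feature is present."""
    return math.sqrt(sum(x * x for x in v))

# Hypothetical raw capsule outputs pointing in the same direction.
strong = squash([3.0, 4.0])    # strong evidence for the feature
weak = squash([0.03, 0.04])    # weak evidence for the same feature

print(length(strong))  # close to 25/26, about 0.96
print(length(weak))    # close to 0
```

Both outputs point in the same direction, so they describe the same feature parameters; only the confidence that the feature is present differs.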

CNN versus CapsNets

To be continued