How to reduce a fully-connected (`"InnerProduct"`) layer using truncated SVD

[Question title]: How to reduce a fully-connected (`"InnerProduct"`) layer using truncated SVD
[Posted]: 2017-03-21 17:17:06
[Question]:

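In the paper Girshick, R. Fast-RCNN (ICCV 2015), section "3.1 Truncated SVD for faster detection", the author proposes using an SVD trick to reduce the size and computation time of a fully connected layer.

Given a trained model (deploy.prototxt and weights.caffemodel), how can I use this trick to replace a fully connected layer with a truncated one?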

[Question comments]:

Tags: machine-learning neural-network linear-algebra deep-learning caffe

[Answer 1]:

Some linear-algebra background
Singular Value Decomposition (SVD) factors any matrix W into three matrices:

W = U S V*

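where U and V are orthonormal matrices and S is diagonal, with its diagonal entries in decreasing order of magnitude. A useful property of SVD is that it makes it easy to approximate W with a lower-rank matrix: suppose you truncate S to keep only its k leading entries (instead of all entries on the diagonal), then

W_app = U S_trunc V*

is a rank-k approximation of W (in fact, the best rank-k approximation in the Frobenius norm).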
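As a quick numerical check of the statement above, here is a minimal NumPy sketch. The matrix size, the random data, and the helper name `truncated_svd_approx` are illustrative assumptions and not part of the original answer. It builds the rank-k approximation, then checks that its rank is k and that the Frobenius-norm error equals the energy in the discarded singular values.

```python
import numpy as np

def truncated_svd_approx(W, k):
    """Hypothetical helper: return U_k diag(s_k) Vh_k and all singular values of W."""
    U, s, Vh = np.linalg.svd(W, full_matrices=False)
    # Scaling the columns of U_k by s_k is the same as U_k @ diag(s_k)
    return (U[:, :k] * s[:k]) @ Vh[:k, :], s

rng = np.random.default_rng(0)
W = rng.standard_normal((50, 80))   # stand-in for a weight matrix (assumption)
k = 10
W_app, s = truncated_svd_approx(W, k)

rank = np.linalg.matrix_rank(W_app)
err = np.linalg.norm(W - W_app)              # Frobenius norm of the residual
expected_err = np.sqrt(np.sum(s[k:] ** 2))   # energy in the discarded singular values
print(rank, np.isclose(err, expected_err))
```

Note that `np.linalg.svd` returns the singular values as a 1-D array and the third factor already transposed (V*), so no explicit diagonal matrix or transpose is needed.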

Using SVD to approximate a fully connected layer
Suppose we have a model deploy_full.prototxt with a fully connected layer

# ... some layers here
layer {
  name: "fc_orig"
  type: "InnerProduct"
  bottom: "in"
  top: "out"
  inner_product_param {
    num_output: 1000
    # more params...
  }
  # some more...
}
# more layers...

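We also have trained_weights_full.caffemodel, the trained parameters for the deploy_full.prototxt model.

Copy deploy_full.prototxt to deploy_svd.prototxt and open it in an editor of your choice. Replace the fully connected layer with these two layers: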

layer {
  name: "fc_svd_U"
  type: "InnerProduct"
  bottom: "in" # same input
  top: "svd_interim"
  inner_product_param {
    num_output: 20  # approximate with k = 20 rank matrix
    bias_term: false
    # more params...
  }
  # some more...
}
# NO activation layer here!
layer {
  name: "fc_svd_V"
  type: "InnerProduct"
  bottom: "svd_interim"
  top: "out"   # same output
  inner_product_param {
    num_output: 1000  # original number of outputs
    # more params...
  }
  # some more...
}

Then, in Python, a little net surgery:

import caffe
import numpy as np

orig_net = caffe.Net('deploy_full.prototxt', 'trained_weights_full.caffemodel', caffe.TEST)
svd_net = caffe.Net('deploy_svd.prototxt', 'trained_weights_full.caffemodel', caffe.TEST)
# get the original weight matrix, shape (num_output, num_input)
W = np.array(orig_net.params['fc_orig'][0].data)
# SVD decomposition: W = U * diag(s) * Vh (Vh is already V*)
k = 20  # same as num_output of fc_svd_U
U, s, Vh = np.linalg.svd(W, full_matrices=False)
# assign weights to svd net
# the first layer (named fc_svd_U in the prototxt) maps in -> svd_interim,
# so its blob has shape (k, num_input): the k leading rows of Vh
svd_net.params['fc_svd_U'][0].data[...] = Vh[:k, :]
# the second layer (named fc_svd_V) maps svd_interim -> out,
# so its blob has shape (num_output, k): leading columns of U scaled by s
svd_net.params['fc_svd_V'][0].data[...] = U[:, :k] * s[:k]
svd_net.params['fc_svd_V'][1].data[...] = orig_net.params['fc_orig'][1].data  # same bias
# save the new weights
svd_net.save('trained_weights_svd.caffemodel')

Now we have deploy_svd.prototxt with trained_weights_svd.caffemodel, which together approximate the original net with far fewer multiplications and weights.
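To make the saving concrete: a layer with D inputs and N outputs stores N·D weights, while the two replacement layers store k·(D + N). The sketch below mimics the two InnerProduct layers in plain NumPy, so it runs without Caffe. The sizes D = 2048 and N = 1000, and the synthetic "low rank plus noise" weight matrix, are assumptions chosen for illustration; real trained weights may compress better or worse.

```python
import numpy as np

rng = np.random.default_rng(0)
D, N, k = 2048, 1000, 20   # assumed input size, output size, kept rank

# Synthetic weights: roughly rank k plus small noise (an assumption)
W = rng.standard_normal((N, k)) @ rng.standard_normal((k, D))
W += 0.01 * rng.standard_normal((N, D))
b = rng.standard_normal(N)

U, s, Vh = np.linalg.svd(W, full_matrices=False)
W1 = Vh[:k, :]          # first layer: shape (k, D), no bias
W2 = U[:, :k] * s[:k]   # second layer: shape (N, k), keeps the original bias

x = rng.standard_normal(D)
y_full = W @ x + b
y_svd = W2 @ (W1 @ x) + b   # no non-linearity between the two layers

rel_err = np.linalg.norm(y_full - y_svd) / np.linalg.norm(y_full)
params_full = N * D
params_svd = k * (D + N)
print(rel_err, params_full, params_svd)
```

Running the same kind of comparison on the real `fc_orig` output versus the compressed net's `out` blob is a reasonable sanity check before deploying the compressed model.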

[Comments]:

Amazing solution :)
@Dale It's not my solution - it's Ross Girshick's.
I think you meant to write W_app = U S_trunc V*.

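[Answer 2]:

In fact, Ross Girshick's py-faster-rcnn repository contains an implementation of the SVD step: compress_net.py.

BTW, you usually need to fine-tune the compressed model to recover accuracy (or compress in a more sophisticated way, see for example "Accelerating Very Deep Convolutional Networks for Classification and Detection", Zhang et al.).

Also, for me scipy.linalg.svd ran faster than numpy's svd.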
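A rough way to check this on your own machine is sketched below. The matrix size is an arbitrary assumption, and timings depend heavily on the BLAS/LAPACK build that NumPy and SciPy are linked against, so neither library is guaranteed to win. The sketch also confirms that both routines return the same singular values.

```python
import time
import numpy as np
import scipy.linalg

rng = np.random.default_rng(0)
W = rng.standard_normal((500, 1000)).astype(np.float32)  # assumed size

def timed(fn):
    """Run fn once and return (result, elapsed seconds)."""
    t0 = time.perf_counter()
    out = fn()
    return out, time.perf_counter() - t0

(_, s_np, _), t_np = timed(lambda: np.linalg.svd(W, full_matrices=False))
(_, s_sp, _), t_sp = timed(lambda: scipy.linalg.svd(W, full_matrices=False))

print("numpy: %.3fs  scipy: %.3fs" % (t_np, t_sp))
same = np.allclose(s_np, s_sp, rtol=1e-3)  # loose tolerance for float32
```

A single run is noisy; repeating the measurement a few times gives a fairer comparison.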

[Comments]: