【Posted】: 2016-05-13 10:57:34
【Question】:
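I have been watching some videos on deep learning / convolutional neural networks, such as here and here, and I tried to implement my own in C++. I kept the input data fairly simple for this first attempt: the idea is to distinguish crosses from circles. I have a small dataset of about 25 images of each class (64*64 images).
The network itself has five layers: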
Convolution (5 filters, size 3, stride 1, with a ReLU)
MaxPool (size 2)
Convolution (1 filter, size 3, stride 1, with a ReLU)
MaxPool (size 2)
Linear Regression classifier
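To make the layer list concrete, the spatial size after each layer can be worked out as below. This is a minimal sketch that assumes no padding ("valid" convolution) and non-overlapping pooling whose stride equals its size, with odd sizes truncated; the post does not state any of these, and `ConvOutputSize`, `PoolOutputSize` and `LayerSizes` are hypothetical helper names, not part of the code in the question.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: side length after a convolution with no padding (assumption).
int ConvOutputSize(int inputSize, int filterSize, int stride)
{
    assert(stride > 0 && filterSize > 0 && filterSize <= inputSize);
    return (inputSize - filterSize) / stride + 1;
}

// Hypothetical helper: side length after pooling with stride == poolSize (assumption),
// truncating when the input size is not a multiple of the pool size.
int PoolOutputSize(int inputSize, int poolSize)
{
    assert(poolSize > 0);
    return inputSize / poolSize;
}

// Side lengths after the four spatial layers listed above, for a square input.
std::vector<int> LayerSizes(int inputSize)
{
    std::vector<int> sizes;
    int s = ConvOutputSize(inputSize, 3, 1);  // Convolution, size 3, stride 1
    sizes.push_back(s);
    s = PoolOutputSize(s, 2);                 // MaxPool, size 2
    sizes.push_back(s);
    s = ConvOutputSize(s, 3, 1);              // Convolution, size 3, stride 1
    sizes.push_back(s);
    s = PoolOutputSize(s, 2);                 // MaxPool, size 2
    sizes.push_back(s);
    return sizes;
}
```

Under these assumptions `LayerSizes(64)` gives 62, 31, 29 and 14, so the final classifier would receive 14*14 = 196 values from the single remaining filter.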
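My problem is that my network does not converge on anything. None of the weights seem to change. When I run it, the predictions mostly stay the same, apart from occasional outliers that jump up and then return on the next iteration.
The convolutional layer's training looks like this, with some loops removed to make it cleaner: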
// Yeah, I know I should change the shared_ptr<float>
void ConvolutionalNetwork::Train(std::shared_ptr<float> input, std::shared_ptr<float> outputGradients, float label)
{
    float biasGradient = 0.0f;
    // Calculate the deltas with respect to the input.
    for (int layer = 0; layer < m_Filters.size(); ++layer)
    {
        // Pseudo-code, each loop on its own line in actual code
        For z < depth, x < width - filterSize, y < height - filterSize
        {
            int newImageIndex = layer*m_OutputWidth*m_OutputHeight + y*m_OutputWidth + x;
            For the bounds of the filter (U,V)
            {
                // Find the index in the input image
                int imageIndex = x + (y+v)*m_OutputWidth + z*m_OutputHeight*m_OutputWidth;
                int kernelIndex = u + v*m_FilterSize + z*m_FilterSize*m_FilterSize;
                m_pGradients.get()[imageIndex] += outputGradients.get()[newImageIndex]*input.get()[imageIndex];
                m_GradientSum[layer].get()[kernelIndex] += m_pGradients.get()[imageIndex] * m_Filters[layer].get()[kernelIndex];
                biasGradient += m_GradientSum[layer].get()[kernelIndex];
            }
        }
    }
    // Update the weights
    for (int layer = 0; layer < m_Filters.size(); ++layer)
    {
        For z < depth, U & V < filtersize
        {
            // Find the index in the kernel
            int kernelIndex = u + v*m_FilterSize + z*m_FilterSize*m_FilterSize;
            m_Filters[layer].get()[kernelIndex] -= learningRate*m_GradientSum[layer].get()[kernelIndex];
        }
        m_pBiases.get()[layer] -= learningRate*biasGradient;
    }
}
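So I create a buffer (m_pGradients) with the same dimensions as the input buffer to feed the gradients back to the previous layer, but I use the gradient sum to adjust the weights.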
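For comparison, here is a minimal sketch of the textbook chain-rule gradients for one single-channel filter with stride 1 and no padding. It is not the code from the question: `ConvGradients` and `ConvBackward` are hypothetical names, and the single-channel, row-major layout is an assumption made to keep the example short. The filter gradient accumulates output gradient times input, the input gradient accumulates output gradient times filter weight, and the bias gradient is the sum of the output gradients.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for the three gradients of one single-channel filter.
struct ConvGradients
{
    std::vector<float> input;   // dLoss/dInput, same size as the input image
    std::vector<float> filter;  // dLoss/dFilter, filterSize * filterSize values
    float bias;                 // dLoss/dBias
};

// Backward pass of a stride-1, no-padding convolution (assumed layout: row-major).
ConvGradients ConvBackward(const std::vector<float>& input, int width, int height,
                           const std::vector<float>& filter, int filterSize,
                           const std::vector<float>& outputGradients)
{
    const int outWidth = width - filterSize + 1;
    const int outHeight = height - filterSize + 1;
    assert(input.size() == static_cast<std::size_t>(width * height));
    assert(filter.size() == static_cast<std::size_t>(filterSize * filterSize));
    assert(outputGradients.size() == static_cast<std::size_t>(outWidth * outHeight));

    ConvGradients g;
    g.input.assign(input.size(), 0.0f);
    g.filter.assign(filter.size(), 0.0f);
    g.bias = 0.0f;

    for (int y = 0; y < outHeight; ++y)
    {
        for (int x = 0; x < outWidth; ++x)
        {
            const float dOut = outputGradients[y * outWidth + x];
            g.bias += dOut;  // the bias contributes once to every output pixel
            for (int v = 0; v < filterSize; ++v)
            {
                for (int u = 0; u < filterSize; ++u)
                {
                    const int imageIndex = (x + u) + (y + v) * width;
                    const int kernelIndex = u + v * filterSize;
                    g.filter[kernelIndex] += dOut * input[imageIndex];
                    g.input[imageIndex] += dOut * filter[kernelIndex];
                }
            }
        }
    }
    return g;
}
```

Note that the input index in this sketch uses the input width and both filter offsets `u` and `v`, and that the gradients start from zero on every call.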
Max pooling computes its gradients like this (it saves the max indices and zeroes out all other gradients):
void MaxPooling::Train(std::shared_ptr<float> input, std::shared_ptr<float> outputGradients, float label)
{
    for (int outputVolumeIndex = 0; outputVolumeIndex < m_OutputVolumeSize; ++outputVolumeIndex)
    {
        int inputIndex = m_Indices.get()[outputVolumeIndex];
        m_pGradients.get()[inputIndex] = outputGradients.get()[outputVolumeIndex];
    }
}
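The routing described above (only the stored max positions receive a gradient, everything else is zero) can be sketched in a self-contained form as follows. `MaxPoolBackward` is a hypothetical free function, not the member function from the question, and it assumes the max indices were recorded during the forward pass. The explicit zero-fill matters: if the gradient buffer is reused between iterations without being cleared, stale values from earlier max positions would remain.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Routes each output gradient back to the input position that produced the max.
// maxIndices[i] is the flat input index chosen for output element i (assumed to
// have been saved during the forward pass).
std::vector<float> MaxPoolBackward(const std::vector<int>& maxIndices,
                                   const std::vector<float>& outputGradients,
                                   std::size_t inputVolumeSize)
{
    assert(maxIndices.size() == outputGradients.size());

    // Every position that was not a max keeps a gradient of zero.
    std::vector<float> inputGradients(inputVolumeSize, 0.0f);

    for (std::size_t i = 0; i < outputGradients.size(); ++i)
    {
        const int inputIndex = maxIndices[i];
        assert(inputIndex >= 0 && static_cast<std::size_t>(inputIndex) < inputVolumeSize);
        // += also covers overlapping windows; with non-overlapping windows
        // each input index appears at most once, so it behaves like =.
        inputGradients[inputIndex] += outputGradients[i];
    }
    return inputGradients;
}
```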
The final regression layer computes its gradients like this:
void LinearClassifier::Train(std::shared_ptr<float> data, std::shared_ptr<float> output, float y)
{
    float * x = data.get();
    float biasError = 0.0f;
    float h = Hypothesis(output) - y;
    for (int i = 1; i < m_NumberOfWeights; ++i)
    {
        float error = h*x[i];
        m_pGradients.get()[i] = error;
        biasError += error;
    }
    float cost = h;
    m_Error = cost*cost;
    for (int theta = 1; theta < m_NumberOfWeights; ++theta)
    {
        m_pWeights.get()[theta] = m_pWeights.get()[theta] - learningRate*m_pGradients.get()[theta];
    }
    m_pWeights.get()[0] -= learningRate*biasError;
}
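For comparison, the standard gradients of a linear model with squared error 0.5*(w·x + b - y)^2 can be sketched as below. This is not the code from the question: `LinearModel`, `StepResult` and `LinearTrainStep` are hypothetical names, and keeping the bias separate from the weights is an assumption. The point it illustrates is that there are two different gradients: error times input updates the weights, while error times weight is what a preceding layer would need as its output gradient, and the bias gradient is the error itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical linear model: prediction = weights . x + bias.
struct LinearModel
{
    std::vector<float> weights;
    float bias;
};

// Result of one gradient-descent step.
struct StepResult
{
    LinearModel model;                  // model after the update
    std::vector<float> inputGradients;  // dLoss/dx, to pass to the previous layer
};

// One step on the loss 0.5 * (prediction - y)^2 for a single example.
StepResult LinearTrainStep(LinearModel model, const std::vector<float>& x,
                           float y, float learningRate)
{
    assert(model.weights.size() == x.size());

    float prediction = model.bias;
    for (std::size_t i = 0; i < x.size(); ++i)
    {
        prediction += model.weights[i] * x[i];
    }
    const float error = prediction - y;

    StepResult result;
    result.inputGradients.resize(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
    {
        // Gradient w.r.t. the input uses the weight value before the update.
        result.inputGradients[i] = error * model.weights[i];
        // Gradient w.r.t. the weight is error * input.
        model.weights[i] -= learningRate * error * x[i];
    }
    // Gradient w.r.t. the bias is the error itself.
    model.bias -= learningRate * error;
    result.model = model;
    return result;
}
```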
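After training on two examples for 100 iterations, the prediction for each example is identical to the other's and has not changed since the start.
- Should a convolutional network like this be able to distinguish the two classes?
- Is this the right approach?
- Should I account for the ReLU (max) in the convolutional layer's backpropagation?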
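On the last point, the usual chain-rule treatment of a ReLU is to pass the incoming gradient through only where the forward input was positive and to block it elsewhere. A minimal sketch, assuming the pre-activation values were kept from the forward pass; `ReluBackward` is a hypothetical name and does not appear in the code above:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Gradient of ReLU(z) = max(0, z): 1 where z > 0, otherwise 0.
// preActivation holds the values fed into the ReLU during the forward pass.
std::vector<float> ReluBackward(const std::vector<float>& preActivation,
                                const std::vector<float>& outputGradients)
{
    assert(preActivation.size() == outputGradients.size());

    std::vector<float> inputGradients(preActivation.size(), 0.0f);
    for (std::size_t i = 0; i < preActivation.size(); ++i)
    {
        if (preActivation[i] > 0.0f)
        {
            inputGradients[i] = outputGradients[i];
        }
    }
    return inputGradients;
}
```

In a convolution followed by a ReLU, this masking would be applied to the output gradients before they are used to compute the filter, bias and input gradients.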
【Discussion】:
Tags: c++ machine-learning neural-network deep-learning backpropagation