Backpropagation giving strange values in C++ neural network (answers)

【Question title】: Backpropagation giving strange values in C++ neural network
【Posted】: 2020-06-20 04:03:24
【Question description】:

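I am trying to solve the iris dataset with a neural network that I wrote from scratch in C++. The dataset has 150 rows split across 3 different flowers, with 4 feature columns and a fifth column for the flower type, which I converted to 0, 1 or 2.

Problem: whenever I run the network, it goes through a training set of 90 rows, split into 3 different flowers (30, 30, 30). For the first few epochs the output values are all very high, for example (0.99, 0.99, 0.98), and then they eventually drop to more sensible values. But in the later epochs, say when I run 50 epochs, the value for the correct flower creeps closer and closer to 1.00 over the rows of that flower, then the same happens for the next flower and the one after that, and then the whole process starts over. I expected the outputs to start off close to 1.0 instead, which would indicate that the network had learned and the weights had been adjusted properly.

Below is the console output from running epoch(), which runs forward_prop(), back_prop() and update_network(). After each epoch it prints the output values of the network. Because it prints at the end of the epoch, the actual (expected) values are always {0, 0, 1}. When I ran the network for 1000 epochs, the output values did not change in any epoch after epoch 15. Why is it doing this?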

File parsed, weights and bias randomized

Epoch 1
0.97 0.97 0.99

Epoch 2
0.93 0.94 0.99

Epoch 3
0.64 0.70 0.99

Epoch 4
0.27 0.36 0.99

Epoch 5
0.22 0.31 0.99

Epoch 6
0.21 0.30 0.99

Epoch 7
0.21 0.30 0.98

Epoch 8
0.21 0.30 0.98

Epoch 9
0.21 0.30 0.96

Epoch 10
0.21 0.30 0.88

Epoch 11
0.21 0.30 0.66

Epoch 12
0.21 0.30 0.56

Epoch 13
0.21 0.30 0.54

Epoch 14
0.21 0.30 0.53

Epoch 15
0.21 0.30 0.53

completed successfully

End of console output.

Epoch 9 example

0.21 0.30 0.98
0.21 0.30 0.98
0.22 0.29 0.98
0.23 0.29 0.98
0.24 0.28 0.98
0.25 0.28 0.98
0.25 0.27 0.98
0.26 0.27 0.98 
0.27 0.27 0.98
0.28 0.26 0.98
0.29 0.26 0.98
0.30 0.26 0.98
0.31 0.26 0.98
0.32 0.25 0.98
0.34 0.25 0.98
0.35 0.24 0.98
0.36 0.24 0.98
0.37 0.24 0.98 
0.38 0.24 0.98
0.40 0.23 0.98
0.41 0.23 0.98
0.42 0.23 0.98
0.43 0.23 0.98
0.44 0.22 0.98
0.45 0.22 0.98
0.46 0.22 0.98 
0.48 0.22 0.98
0.49 0.22 0.98
0.50 0.21 0.98 
0.51 0.21 0.98
0.53 0.20 0.98
0.52 0.21 0.98
0.50 0.22 0.98
0.49 0.23 0.98
0.48 0.24 0.98
0.47 0.24 0.98
0.46 0.25 0.98
0.45 0.26 0.98
0.44 0.27 0.98 
0.43 0.28 0.98
0.42 0.29 0.98
0.42 0.30 0.98
0.41 0.32 0.98 
0.40 0.33 0.98
0.39 0.34 0.98
0.38 0.35 0.98
0.38 0.36 0.98
0.37 0.37 0.98
0.36 0.38 0.98
0.35 0.40 0.98
0.35 0.41 0.98
0.34 0.42 0.98
0.34 0.43 0.98
0.33 0.44 0.98
0.32 0.46 0.98 
0.32 0.47 0.98
0.31 0.48 0.98
0.31 0.49 0.98 
0.30 0.50 0.98
0.30 0.51 0.97
0.30 0.52 0.98
0.29 0.51 0.98
0.29 0.50 0.98
0.28 0.49 0.98
0.28 0.48 0.98
0.27 0.47 0.98
0.27 0.46 0.97 
0.27 0.45 0.98
0.26 0.44 0.98
0.26 0.43 0.98
0.26 0.42 0.98
0.25 0.41 0.98
0.25 0.40 0.98
0.25 0.40 0.98
0.24 0.39 0.98 
0.24 0.38 0.98
0.24 0.37 0.98
0.24 0.37 0.98
0.23 0.36 0.98
0.23 0.35 0.98 
0.23 0.35 0.98
0.23 0.34 0.98
0.22 0.33 0.98
0.22 0.33 0.98
0.22 0.32 0.98
0.22 0.32 0.98
0.21 0.31 0.98
0.21 0.31 0.98
0.21 0.30 0.98 
0.21 0.30 0.98
Epoch 9

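So in epoch 9, the first 30 rows have an actual value of {1, 0, 0}, the next 30 rows have an actual value of {0, 1, 0}, and the last 30 rows have an actual value of {0, 0, 1}. Notice how the first two outputs inch closer and closer to their targets with each row of data, while the last output stays the same, although it does not stay the same across all epochs. This is strange, and I'm not sure why it is doing this.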
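One possible contributing factor, not confirmed anywhere in this thread, is that the rows are always presented in class order (30, 30, 30), so each block of 30 updates pulls the weights toward a single class. A minimal sketch of visiting the rows in a random order each epoch follows. The helpers shuffled_indices and label_of are hypothetical names that do not appear in the posted code, and label_of assumes the class-ordered 90-row layout described above.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper (not in the posted code): returns the row indices
// 0..n-1 in a random order drawn from the supplied generator.
std::vector<std::size_t> shuffled_indices(std::size_t n, std::mt19937& rng)
{
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0}); // 0, 1, ..., n-1
    std::shuffle(order.begin(), order.end(), rng);
    return order;
}

// Class label for a row of the class-ordered set (assumed layout):
// rows 0-29 -> 0, rows 30-59 -> 1, rows 60-89 -> 2.
std::size_t label_of(std::size_t row)
{
    return row / 30;
}
```

Inside the training loop, the idea would be to call shuffled_indices(90, rng) once per epoch, iterate over the returned indices instead of 0..89, and set expected[label_of(row)] = 1 for each visited row.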

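So the basic structure of the program is:

main() runs, then declares and initializes an object of class Neural_Network with an input, a hidden and an output layer.

train() is called, which then runs epoch() in a loop for the number of times specified in the call to train().

epoch() itself runs forward_prop(), back_prop() and finally update_network(). There are also a few variables, such as arrays for the expected and actual output values.

The vectors bias, values, weights and errors each hold their own part of the network's state separately, which I found better for readability. The first layer, or position [0], of the weights vector is empty: the input values use the weights stored with the hidden layer, and the hidden layer's values use the weights stored with the output layer.

Each node's weights are a vector with one entry per node in the previous layer; position [0] of that vector is the weight for the node at position [0] of the previous layer.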
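The layout described above can be sketched as follows: weights[i][j][k] is the weight into node j of layer i from node k of layer i - 1. This is only an illustration under that reading of the description. The names Layer, Weights, make_weights and node_output are made up for the sketch, and a constant fill value stands in for the random initialization.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

using Layer = std::vector<double>;
using Weights = std::vector<std::vector<Layer>>; // weights[i][j][k]

// Builds the weight container for the given layer sizes, every weight set to
// `fill`. Layer 0 (the input) gets one empty weight vector per node.
Weights make_weights(const std::vector<std::size_t>& layers, double fill)
{
    Weights w(layers.size());
    for (std::size_t i = 0; i < layers.size(); ++i)
    {
        const std::size_t fan_in = (i == 0) ? 0 : layers[i - 1];
        w[i].assign(layers[i], Layer(fan_in, fill));
    }
    return w;
}

// Output of one node: sigmoid of (its weights dotted with the previous
// layer's values, plus its bias).
double node_output(const Layer& prev, const Layer& w_ij, double bias)
{
    assert(prev.size() == w_ij.size()); // one weight per previous-layer node
    const double z =
        std::inner_product(prev.begin(), prev.end(), w_ij.begin(), 0.0) + bias;
    return 1.0 / (1.0 + std::exp(-z));
}
```

For the { 4, 6, 3 } network in the question, this gives 6 hidden nodes with 4 weights each and 3 output nodes with 6 weights each, so any index into a node's weight vector ranges over the previous layer's nodes.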

#include <iostream>
#include <cstdlib>
#include <iomanip>
#include <cmath>
#include <fstream>
#include <sstream>
#include <vector>
#include <array>
#include <string>
#include <numeric>

class Neural_Network
{
private:
    std::vector<std::array<double, 4>> training_set; // 30 setosa -> 30 versicolor -> 30 virginica
    std::vector<std::vector<double>> values, bias, errors;
    std::vector<std::vector<std::vector<double>>> weights;
    size_t net_size = 0;
    double dot_val(std::vector<double> val, std::vector<double> weights);
    double sigmoid(const double num);
    double random_number();
    double transfer_derivitive(double num);
    void initialize(std::vector<size_t> layers);
    void forward_prop(std::vector<double>& expected);
    void back_prop(std::vector<double> expected);
    void update_network(double l_rate);

public:
    Neural_Network(const std::vector<std::array<double, 4>>& data);
    ~Neural_Network() = default;
    void train(size_t epochs = 1);
    void display();
};

Neural_Network::Neural_Network(const std::vector<std::array<double, 4>>& data) : training_set{ data }
{
    initialize({ 4, 6, 3 });
}

double Neural_Network::dot_val(std::vector<double> val, std::vector<double> weights)
{
    return std::inner_product(val.begin(), val.end(), weights.begin(), 0.0);
}

double Neural_Network::sigmoid(const double num)
{
    return (1 / (1 + exp(-num)));
}

double Neural_Network::random_number()
{
    return (double)rand() / (double)RAND_MAX;
}

double Neural_Network::transfer_derivitive(double num)
{
    return num * (1 - num);
}

void Neural_Network::display()
{
    std::cout << std::fixed << std::setprecision(2) << "values:\n";
    for (size_t i = 0; i < values.size(); ++i)
    {
        std::cout << "layer " << i << "\n[ ";
        for (size_t j = 0; j < values[i].size(); ++j)
            std::cout << values.at(i).at(j) << " ";
        std::cout << " ]\n";
    }
}

void Neural_Network::initialize(std::vector<size_t> layers)
{
    for (size_t i = 0; i < layers.size(); ++i)
    {
        std::vector<double> v{}, b{}, e{};
        std::vector<std::vector<double>> w{};
        //initializing the nodes in the layers
        for (size_t j = 0; j < layers.at(i); ++j)
        {
            v.push_back(0);
            b.push_back(random_number());
            e.push_back(1);
            std::vector<double> inner_w{};
            if (i != 0)                                    // the input layer (i == 0) has no incoming weights
                for (size_t k = 0; k < layers.at(i - 1); ++k) // one weight per node in the previous layer
                    inner_w.push_back(random_number());    // weight from node k of layer i - 1 into node j of layer i
            w.push_back(inner_w);
        }
        values.push_back(v);
        bias.push_back(b);
        errors.push_back(e);
        weights.push_back(w);
        ++net_size;
    }
    std::cout << "initialize network success" << std::endl;
}

void Neural_Network::train(size_t epoch_count)
{
    const size_t count = epoch_count;
    while (epoch_count > 0)
    {
        std::cout << "\nEpoch " << 1 + (count - epoch_count) << std::endl;
        for (size_t i = 0; i < 90; ++i)
        {
            std::vector<double> expected{ 0, 0, 0 };
            if (i < 30)
                expected[0] = 1;
            else if (i < 60)
                expected[1] = 1;
            else if (i < 90)
                expected[2] = 1;
            for (size_t j = 0; j < values[0].size(); ++j) // Initialize input layer values
                values.at(0).at(j) = training_set.at(i).at(j);        // value[0] is the input layer, j is the node
            forward_prop(expected);
            back_prop(expected);
            update_network(0.05);
        }
        display();
        --epoch_count;
    }
}

void Neural_Network::forward_prop(std::vector<double>& expected)
{
    for (size_t i = 1; i < net_size - 1; ++i)                                           // looping through every layer except the first and last
        for (size_t j = 0; j < values.at(i).size(); ++j)                                   // looping through every node in the current non input/output layer
            values.at(i).at(j) = sigmoid(dot_val(values.at(i - 1), weights.at(i).at(j)) + bias.at(i).at(j)); // assigning node j of layer i a sigmoided val that is the dotval + the associated bias
    for (size_t i = 0; i < values.at(net_size - 1).size(); ++i)                            // looping through the output layer
        values.at(net_size - 1).at(i) = sigmoid(dot_val(values.at(net_size - 2), weights.at(net_size - 1).at(i)) + bias.at(net_size - 1).at(i));
}

void Neural_Network::back_prop(std::vector<double> expected) // work backwards from the output layer
{
    std::vector<double> output_errors{};
    for (size_t i = 0; i < errors.at(net_size - 1).size(); ++i) // looping through the output layer
    {
        output_errors.push_back(expected.at(i) - values.at(net_size - 1).at(i));
        errors.at(net_size - 1).at(i) = output_errors.at(i) * transfer_derivitive(values.at(net_size - 1).at(i));
    }                                         // output layer finished
    for (size_t i = net_size - 2; i > 0; i--) // looping through the non output layers backwards
    {
        std::vector<double> layer_errors{};
        for (size_t j = 0; j < errors.at(i).size(); ++j) // looping through the current layer's nodes
        {
            double error = 0;
            for (size_t k = 0; k < weights.at(i + 1).size(); ++k) // looping through the current set of weights
                error += errors.at(i).at(j) * weights.at(i + 1).at(k).at(j);
            layer_errors.push_back(error);
        }
        for (size_t j = 0; j < layer_errors.size(); ++j)
            errors.at(i).at(j) = layer_errors.at(j) * transfer_derivitive(values.at(i).at(j));
    }
}

void Neural_Network::update_network(double l_rate)
{
    for (size_t i = 1; i < net_size; ++i)
    {
        for (size_t j = 0; j < weights.at(i).size(); ++j)
        {
            for (size_t k = 0; k < weights.at(i).at(j).size(); ++k)
                weights.at(i).at(j).at(k) += l_rate * errors.at(i).at(j) * values.at(i - 1).at(j);
            bias.at(i).at(j) += l_rate * errors.at(i).at(j);
        }
    }
}

int main()
{
    std::vector<std::array<double, 4>> data = {
        {5.1, 3.5, 1.4, 0.2},
        {4.9, 3, 1.4, 0.2},
        {4.7, 3.2, 1.3, 0.2},
        {4.6, 3.1, 1.5, 0.2},
        {5, 3.6, 1.4, 0.2},
        {5.4, 3.9, 1.7, 0.4},
        {4.6, 3.4, 1.4, 0.3},
        {5, 3.4, 1.5, 0.2},
        {4.4, 2.9, 1.4, 0.2},
        {4.9, 3.1, 1.5, 0.1},
        {5.4, 3.7, 1.5, 0.2},
        {4.8, 3.4, 1.6, 0.2},
        {4.8, 3, 1.4, 0.1},
        {4.3, 3, 1.1, 0.1},
        {5.8, 4, 1.2, 0.2},
        {5.7, 4.4, 1.5, 0.4},
        {5.4, 3.9, 1.3, 0.4},
        {5.1, 3.5, 1.4, 0.3},
        {5.7, 3.8, 1.7, 0.3},
        {5.1, 3.8, 1.5, 0.3},
        {5.4, 3.4, 1.7, 0.2},
        {5.1, 3.7, 1.5, 0.4},
        {4.6, 3.6, 1, 0.2},
        {5.1, 3.3, 1.7, 0.5},
        {4.8, 3.4, 1.9, 0.2},
        {5, 3, 1.6, 0.2},
        {5, 3.4, 1.6, 0.4},
        {5.2, 3.5, 1.5, 0.2},
        {5.2, 3.4, 1.4, 0.2},
        {4.7, 3.2, 1.6, 0.2},
        {7, 3.2, 4.7, 1.4},
        {6.4, 3.2, 4.5, 1.5},
        {6.9, 3.1, 4.9, 1.5},
        {5.5, 2.3, 4, 1.3},
        {6.5, 2.8, 4.6, 1.5},
        {5.7, 2.8, 4.5, 1.3},
        {6.3, 3.3, 4.7, 1.6},
        {4.9, 2.4, 3.3, 1},
        {6.6, 2.9, 4.6, 1.3},
        {5.2, 2.7, 3.9, 1.4},
        {5, 2, 3.5, 1},
        {5.9, 3, 4.2, 1.5},
        {6, 2.2, 4, 1},
        {6.1, 2.9, 4.7, 1.4},
        {5.6, 2.9, 3.6, 1.3},
        {6.7, 3.1, 4.4, 1.4},
        {5.6, 3, 4.5, 1.5},
        {5.8, 2.7, 4.1, 1},
        {6.2, 2.2, 4.5, 1.5},
        {5.6, 2.5, 3.9, 1.1},
        {5.9, 3.2, 4.8, 1.8},
        {6.1, 2.8, 4, 1.3},
        {6.3, 2.5, 4.9, 1.5},
        {6.1, 2.8, 4.7, 1.2},
        {6.4, 2.9, 4.3, 1.3},
        {6.6, 3, 4.4, 1.4},
        {6.8, 2.8, 4.8, 1.4},
        {6.7, 3, 5, 1.7},
        {6, 2.9, 4.5, 1.5},
        {5.7, 2.6, 3.5, 1},
        {6.3, 3.3, 6, 2.5},
        {5.8, 2.7, 5.1, 1.9},
        {7.1, 3, 5.9, 2.1},
        {6.3, 2.9, 5.6, 1.8},
        {6.5, 3, 5.8, 2.2},
        {7.6, 3, 6.6, 2.1},
        {4.9, 2.5, 4.5, 1.7},
        {7.3, 2.9, 6.3, 1.8},
        {6.7, 2.5, 5.8, 1.8},
        {7.2, 3.6, 6.1, 2.5},
        {6.5, 3.2, 5.1, 2},
        {6.4, 2.7, 5.3, 1.9},
        {6.8, 3, 5.5, 2.1},
        {5.7, 2.5, 5, 2},
        {5.8, 2.8, 5.1, 2.4},
        {6.4, 3.2, 5.3, 2.3},
        {6.5, 3, 5.5, 1.8},
        {7.7, 3.8, 6.7, 2.2},
        {7.7, 2.6, 6.9, 2.3},
        {6, 2.2, 5, 1.5},
        {6.9, 3.2, 5.7, 2.3},
        {5.6, 2.8, 4.9, 2},
        {7.7, 2.8, 6.7, 2},
        {6.3, 2.7, 4.9, 1.8},
        {6.7, 3.3, 5.7, 2.1},
        {7.2, 3.2, 6, 1.8},
        {6.2, 2.8, 4.8, 1.8},
        {6.1, 3, 4.9, 1.8},
        {6.4, 2.8, 5.6, 2.1},
        {7.2, 3, 5.8, 1.6}
    };

    Neural_Network network{ data };
    network.train(1);
    return 0;
}

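Edited to use .at() instead of [] to access the std::vectors in the program.

I hope I have made everything clear; if not, let me know.

Note: I originally posted this question on stackoverflow and was told I should move it to codereview.stackexchange. There I was told I should move it back to stackoverflow and rework the question with more detail. Please don't tell me to move this question a third time. If there is something wrong with the way I am asking, please give me a chance to change it or add information so that I can get some help. Thank you.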

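【Question discussion】:

double output; -- Did your compiler warn you that this variable is uninitialized? You then use this variable in dot_val, so the result can be anything. Second, there is a std::inner_product function that would prevent this mistake.
Hey @DMS, glad to see you got this removed from Code Review. I think it would be a better fit for Politics. Cheers. (Joking, in case that wasn't obvious.)
I didn't know about std::inner_product. I'll test it later today or tomorrow when I have time. I also initialized double output to 0. Nothing changed, and no, my compiler did not warn me at all; everything runs without warnings.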
while(!in_file.eof()) -- Please read this as to why this is not correct

Tags: c++ machine-learning neural-network backpropagation

【Solution 1】:

One obvious bug is in dot_val:

double Neural_Network::dot_val(std::vector<double> val,std::vector<double> weights)
{
    double output;  // <-- This is uninitialized
    for (size_t i = 0; i < weights.size(); ++i)
        output += val[i] * weights[i];
    return output;  // <-- Who knows what this will be
}

You are using an uninitialized variable. Either initialize output to 0, or use std::inner_product:

#include <numeric>
//...
double Neural_Network::dot_val(std::vector<double> val,std::vector<double> weights)
{
    return std::inner_product(val.begin(), val.end(), weights.begin(), 0.0);
}
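As a small illustration of the std::inner_product version, here is a free-function variant with a size check. Making it a free function that takes const references is an assumption made for this sketch; the posted code declares dot_val as a member that takes both vectors by value.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Free-function variant of dot_val (an assumption for this sketch).
// Const references avoid copying both vectors on every call.
double dot_val(const std::vector<double>& val, const std::vector<double>& weights)
{
    // inner_product reads val.size() elements from weights, so the two
    // vectors must have the same length.
    assert(val.size() == weights.size());
    // The 0.0 initial value makes the accumulator a double; an int 0 would
    // make inner_product accumulate in int and truncate the result.
    return std::inner_product(val.begin(), val.end(), weights.begin(), 0.0);
}
```

For example, calling it with { 1, 2, 3 } and { 4, 5, 6 } computes 1*4 + 2*5 + 3*6.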

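【Discussion】:

I made the change to use std::inner_product, but the output is still the same.
My answer points out the obvious mistake of adding onto an indeterminate initial value. You may have one or more other bugs, but without a minimal reproducible example, you are on your own.
Okay, I edited the post, removed all the extra unnecessary code and merged everything into one section. You can run the code snippet in an IDE and it will reproduce the problem. I also hard-coded the training set as a vector of arrays.
You should start using at() instead of [ ] when accessing vectors. If you did, you would see that this line: weights[i][j][k] += l_rate * errors[i][j] * values[i-1][j]; accesses values with an invalid index j. For example: weights[i][j][k] += l_rate * errors.at(i).at(j) * values.at(i - 1).at(j); will throw a std::out_of_range exception. Also, you should try getting your code to run in Visual Studio Community -- it will detect this error at runtime in a debug build. Right now, your code is invoking undefined behavior.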
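A minimal sketch of what the last comment describes, assuming the weights[i][j][k] layout from the question: weight k of node j should be paired with the value of node k in the previous layer, so the index into values would be k rather than j. The helper names at_throws, update_layer and hidden_delta are hypothetical. The third function goes beyond what the thread raises: under the usual backpropagation rule, a hidden node's error sums the next layer's error terms weighted by the connecting weights, whereas the posted back_prop multiplies by errors.at(i).at(j), the current layer's own entry.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

using Layer = std::vector<double>;

// Returns true if reading v.at(index) throws std::out_of_range.
bool at_throws(const Layer& v, std::size_t index)
{
    try { (void)v.at(index); }
    catch (const std::out_of_range&) { return true; }
    return false;
}

// Update for one layer: weight k of node j is paired with prev[k], the value
// of node k in the previous layer (not prev[j]).
void update_layer(std::vector<Layer>& w, Layer& bias, const Layer& delta,
                  const Layer& prev, double l_rate)
{
    for (std::size_t j = 0; j < w.size(); ++j)
    {
        for (std::size_t k = 0; k < w.at(j).size(); ++k)
            w.at(j).at(k) += l_rate * delta.at(j) * prev.at(k);
        bias.at(j) += l_rate * delta.at(j);
    }
}

// Hidden-layer error term under the usual backprop rule:
// delta[j] = (sum over k of next_delta[k] * next_w[k][j]) * out[j] * (1 - out[j])
Layer hidden_delta(const std::vector<Layer>& next_w, const Layer& next_delta,
                   const Layer& out)
{
    Layer delta(out.size(), 0.0);
    for (std::size_t j = 0; j < out.size(); ++j)
    {
        double error = 0.0;
        for (std::size_t k = 0; k < next_w.size(); ++k)
            error += next_delta.at(k) * next_w.at(k).at(j);
        delta.at(j) = error * out.at(j) * (1.0 - out.at(j)); // sigmoid derivative
    }
    return delta;
}
```

With the { 4, 6, 3 } network, the input layer has 4 values while the hidden layer has 6 nodes, so indexing the input values with the hidden node index j goes out of range for j = 4 and j = 5, which is the failure at() reports.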