【Question title】: How to update and calculate the derivatives of the weights and biases of a 3-layer neural network (with only NumPy)?
【Posted】: 2022-08-19 22:35:39
【Question】:

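I am trying to build a 3-layer neural network with an input layer, a hidden layer and an output layer. The input layer is represented by a (1, 785) NumPy array, since I am classifying the digits 0 to 9 from the MNIST dataset. My forward propagation produces arrays with the correct dimensions everywhere. However, when I compute the derivatives with respect to the network's weights and biases, the resulting arrays have shapes different from the original parameters. As a result, the gradient-descent update of the weights and biases fails: according to the NumPy documentation, two arrays can only be broadcast together when, in every dimension, their sizes are equal or one of them is 1.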
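NumPy's rule compares two shapes from the trailing dimension backwards, and each pair of sizes must be equal or contain a 1. `np.broadcast_shapes` (available since NumPy 1.20) applies that rule without allocating any arrays. The shapes below are made-up examples (10 output units and 30 hidden units are assumptions). Note that a mismatch does not always raise an error: a column `(10, 1)` combined with a row `(1, 10)` or a 1-D `(10,)` silently broadcasts to a `(10, 10)` matrix.

```python
import numpy as np

# Shapes are compared from the last dimension backwards; each pair of sizes
# must be equal, or one of them must be 1.
same = np.broadcast_shapes((10, 1), (10, 1))    # (10, 1)
outer = np.broadcast_shapes((10, 1), (1, 10))   # (10, 10): column vs row
mixed = np.broadcast_shapes((10, 1), (10,))     # (10, 10): (10,) acts like (1, 10)
print(same, outer, mixed)

# A transposed weight matrix is incompatible, so NumPy raises instead.
try:
    np.broadcast_shapes((785, 30), (30, 785))
    incompatible = False
except ValueError as err:
    incompatible = True
    print("ValueError:", err)
```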

Here is how I compute the derivatives of the weights and biases during backpropagation:

    def backpropagation(self, x, y):
        predicted_value = self.forward_propagation(x)
        cost_value_derivative = self.loss_function(
                predicted_value.T, self.expected_value(y), derivative=True
            )
        print(f"{'-*-'*15} PREDICTION {'-*-'*15}")
        print(f"Predicted Value: {np.argmax(predicted_value)}")
        print(f"Actual Value: {y}")
        print(f"{'-*-'*15}{'-*-'*19}")

        derivative_W2 = (cost_value_derivative*self.sigmoid(
            self.output_layer_without_activity, derivative=True)
        ).dot(self.hidden_layer.T).T

        print(f"Derivative_W2: {derivative_W2.shape}, weights_hidden_layer_to_output_layer: {self.weights_hidden_layer_to_output_layer.shape}")
        assert derivative_W2.shape == self.weights_hidden_layer_to_output_layer.shape

        derivative_b2 = (cost_value_derivative*(self.sigmoid(
                self.output_layer_without_activity, derivative=True).T
        )).T

        print(f"Derivative_b2: {derivative_b2.shape}, bias_on_output_layer: {self.bias_on_output_layer.shape}")
        assert derivative_b2.shape == self.bias_on_output_layer.shape

        derivative_b1 = cost_value_derivative*self.sigmoid(
            self.output_layer_without_activity.T, derivative=True
        ).dot(self.weights_hidden_layer_to_output_layer.T).dot(
            self.sigmoid(self.hidden_layer_without_activity, derivative=True)
        )
        print(f"Derivative_b1: {derivative_b1.shape}, bias_on_hidden_layer: {self.bias_on_hidden_layer.shape}")

        assert derivative_b1.shape == self.bias_on_hidden_layer.shape

        derivative_W1 = cost_value_derivative*self.sigmoid(
            self.output_layer_without_activity.T, derivative=True
        ).dot(self.weights_hidden_layer_to_output_layer.T).dot(self.sigmoid(
                self.hidden_layer_without_activity, derivative=True)
        ).dot(x)

        print(f"Derivative_W1: {derivative_W1.shape}, weights_input_layer_to_hidden_layer: {self.weights_input_layer_to_hidden_layer.shape}")
        assert derivative_W1.shape == self.weights_input_layer_to_hidden_layer.shape

        return derivative_W2, derivative_b2, derivative_W1, derivative_b1
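For reference, the chain-rule gradients can be written out in the layout used by `forward_propagation`. This is a sketch under stated assumptions: layer values are column vectors, $x \in \mathbb{R}^{1\times 785}$ is a row, the weights are stored as (inputs × outputs) so that $W_1 \in \mathbb{R}^{785\times h}$ and $W_2 \in \mathbb{R}^{h\times 10}$, $h$ is the (unspecified) hidden-layer size, and $L$ is a generic loss. Each gradient has exactly the shape of its parameter. The sigmoid derivatives enter element-wise ($\odot$), not through a matrix product as in the `.dot(self.sigmoid(..., derivative=True))` calls in the code above.

```latex
\begin{aligned}
z_1 &= W_1^\top x^\top + b_1 \in \mathbb{R}^{h\times 1}, & a_1 &= \sigma(z_1),\\
z_2 &= W_2^\top a_1 + b_2 \in \mathbb{R}^{10\times 1}, & a_2 &= \sigma(z_2),\\
\delta_2 &= \frac{\partial L}{\partial a_2} \odot \sigma'(z_2) \in \mathbb{R}^{10\times 1}, & \sigma'(z) &= \sigma(z)\bigl(1-\sigma(z)\bigr),\\
\frac{\partial L}{\partial W_2} &= a_1\,\delta_2^\top \in \mathbb{R}^{h\times 10}, & \frac{\partial L}{\partial b_2} &= \delta_2,\\
\delta_1 &= \bigl(W_2\,\delta_2\bigr) \odot \sigma'(z_1) \in \mathbb{R}^{h\times 1}, & \frac{\partial L}{\partial b_1} &= \delta_1,\\
\frac{\partial L}{\partial W_1} &= x^\top \delta_1^\top \in \mathbb{R}^{785\times h}.
\end{aligned}
```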

This is my forward propagation:

    def forward_propagation(self, x):

        self.hidden_layer_without_activity = self.weights_input_layer_to_hidden_layer.T.dot(x.T) + self.bias_on_hidden_layer

        self.hidden_layer = self.sigmoid(
            self.hidden_layer_without_activity
        )

        self.output_layer_without_activity = self.weights_hidden_layer_to_output_layer.T.dot(
            self.hidden_layer
        ) + self.bias_on_output_layer

        self.output_layer = self.sigmoid(
            self.output_layer_without_activity
        )

        return self.output_layer
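Here is a minimal NumPy sketch of a shape-consistent backward pass for a single example. It is written as standalone functions rather than methods, in the same layout as `forward_propagation` above: a `(1, 785)` row input, weights stored as `(inputs, outputs)`, and biases as column vectors. Several pieces are assumptions. The function names `forward` and `backward` are hypothetical. The hidden size of 30 is illustrative. The loss is assumed to be squared error, $L = \tfrac12\lVert a_2 - y\rVert^2$, because the question's `loss_function` is not shown, so substitute the real $\partial L/\partial a_2$ if it differs.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """x: (1, n_in) row; W1: (n_in, n_hidden); W2: (n_hidden, n_out); biases are columns."""
    z1 = W1.T.dot(x.T) + b1            # (n_hidden, 1)
    a1 = sigmoid(z1)
    z2 = W2.T.dot(a1) + b2             # (n_out, 1)
    a2 = sigmoid(z2)
    return z1, a1, z2, a2

def backward(x, y_onehot, W1, b1, W2, b2):
    """Gradients of the assumed loss L = 0.5 * ||a2 - y||^2 for one example."""
    z1, a1, z2, a2 = forward(x, W1, b1, W2, b2)
    dL_da2 = a2 - y_onehot                     # (n_out, 1), assumed squared-error loss
    delta2 = dL_da2 * a2 * (1 - a2)            # element-wise sigmoid'(z2) = a2 * (1 - a2)
    dW2 = a1.dot(delta2.T)                     # (n_hidden, n_out), same as W2
    db2 = delta2                               # (n_out, 1), same as b2
    delta1 = W2.dot(delta2) * a1 * (1 - a1)    # (n_hidden, 1), element-wise sigmoid'(z1)
    dW1 = x.T.dot(delta1.T)                    # (n_in, n_hidden), same as W1
    db1 = delta1                               # (n_hidden, 1), same as b1
    # Fail early if any gradient does not match its parameter's shape.
    for grad, param in ((dW1, W1), (db1, b1), (dW2, W2), (db2, b2)):
        assert grad.shape == param.shape, (grad.shape, param.shape)
    return dW1, db1, dW2, db2

# Usage with MNIST-like shapes: 785 inputs as in the question, 30 hidden units assumed.
rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 785, 30, 10
W1 = rng.normal(scale=0.01, size=(n_in, n_hidden))
b1 = np.zeros((n_hidden, 1))
W2 = rng.normal(scale=0.1, size=(n_hidden, n_out))
b2 = np.zeros((n_out, 1))
x = rng.random((1, n_in))
y = np.zeros((n_out, 1))
y[3] = 1.0                                     # one-hot label for the digit 3
grads = backward(x, y, W1, b1, W2, b2)
print([g.shape for g in grads])                # [(785, 30), (30, 1), (30, 10), (10, 1)]
```

Because every gradient comes out with its parameter's exact shape, updates such as `W1 -= learning_rate * dW1` need no extra transposes.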

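Taking the `weights_hidden_layer_to_output_layer` variable as an example, the gradient-descent update of the weights and biases is `weights_hidden_layer_to_output_layer -= learning_rate*derivative_W2`, where `derivative_W2` is the derivative of the loss function with respect to `weights_hidden_layer_to_output_layer`.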
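An in-place update like this is strict about shapes. `-=` only succeeds when the right-hand side broadcasts to the parameter's own shape, because the result has to be written back into the existing array. The sketch below uses zero-initialised arrays, an assumed hidden size of 30 and an assumed learning rate of 0.1. It shows that a gradient with the parameter's exact shape works, while a transposed weight gradient and a row-shaped bias gradient both raise `ValueError`.

```python
import numpy as np

learning_rate = 0.1                  # assumed value
W2 = np.zeros((30, 10))              # hidden -> output weights, (inputs, outputs) layout
b2 = np.zeros((10, 1))               # output bias as a column vector
errors = []                          # messages from the updates NumPy rejects

# Gradient with exactly W2's shape: the in-place update succeeds.
W2 -= learning_rate * np.ones((30, 10))

# Transposed weight gradient: (30, 10) and (10, 30) cannot be broadcast at all.
try:
    W2 -= learning_rate * np.ones((10, 30))
except ValueError as err:
    errors.append(f"transposed dW2: {err}")

# Row-shaped bias gradient: (10, 1) and (1, 10) broadcast to (10, 10),
# which cannot be written back into the (10, 1) array b2.
try:
    b2 -= learning_rate * np.ones((1, 10))
except ValueError as err:
    errors.append(f"row-shaped db2: {err}")

for message in errors:
    print(message)
```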

Tags: python machine-learning math deep-learning neural-network


【Answer 1】:

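Since you did not provide the definitions of your functions, it is hard to tell where things go wrong. However, I usually use this code snippet to train a NN with 1 hidden layer and sigmoid activations throughout. I hope it helps you debug your code.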

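    import numpy as np

    def sigmoid(z):
        return 1 / (1 + np.exp(-z))

    # X: (n_x, m), one example per column; Y: (n_y, m) one-hot labels
    # W1: (n_h, n_x), b1: (n_h, 1), W2: (n_y, n_h), b2: (n_y, 1)
    # epochs and alpha (the learning rate) are set by the caller
    m = X.shape[1]

    for epoch in range(epochs):
        # forward propagation
        Z1 = np.dot(W1, X) + b1
        A1 = sigmoid(Z1)
        Z2 = np.dot(W2, A1) + b2
        A2 = sigmoid(Z2)

        # backward propagation
        # with sigmoid outputs and a cross-entropy cost, dCost/dZ2 per example is A2 - Y
        dZ2 = A2 - Y
        dW2 = 1/m * np.dot(dZ2, A1.T)
        db2 = 1/m * np.sum(dZ2, axis=1, keepdims=True)
        dZ1 = np.dot(W2.T, dZ2) * A1 * (1 - A1)   # sigmoid'(Z1) = A1 * (1 - A1)
        dW1 = 1/m * np.dot(dZ1, X.T)
        db1 = 1/m * np.sum(dZ1, axis=1, keepdims=True)

        # update parameters
        W1 = W1 - alpha * dW1
        b1 = b1 - alpha * db1
        W2 = W2 - alpha * dW2
        b2 = b2 - alpha * db2

    print(f'W1:{W1} b1:{b1} W2:{W2} b2:{b2}')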
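A common way to debug a hand-written backward pass is a numerical gradient check. You perturb each parameter entry by ±ε, take the difference of the cost, and compare the result with the analytic gradient. The sketch below rests on several assumptions: the column-per-example layout of the code above, sigmoid activations, a binary cross-entropy cost averaged over the `m` examples (a cost for which `dZ2 = A2 - Y` holds), and tiny made-up layer sizes so the loops stay fast. The helper names `cost`, `analytic_grads` and `numeric_grads` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(W1, b1, W2, b2, X, Y):
    """Binary cross-entropy averaged over the m columns of X (assumed cost)."""
    A2 = sigmoid(W2 @ sigmoid(W1 @ X + b1) + b2)
    return -np.mean(np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2), axis=0))

def analytic_grads(W1, b1, W2, b2, X, Y):
    """Backward pass in the answer's layout, with sigmoid'(Z1) = A1 * (1 - A1)."""
    m = X.shape[1]
    A1 = sigmoid(W1 @ X + b1)
    A2 = sigmoid(W2 @ A1 + b2)
    dZ2 = A2 - Y
    dW2 = dZ2 @ A1.T / m
    db2 = dZ2.sum(axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * A1 * (1 - A1)
    dW1 = dZ1 @ X.T / m
    db1 = dZ1.sum(axis=1, keepdims=True) / m
    return [dW1, db1, dW2, db2]

def numeric_grads(params, X, Y, eps=1e-6):
    """Central finite differences, perturbing one parameter entry at a time."""
    grads = []
    for p in params:
        g = np.zeros_like(p)
        for idx in np.ndindex(p.shape):
            old = p[idx]
            p[idx] = old + eps
            plus = cost(*params, X, Y)
            p[idx] = old - eps
            minus = cost(*params, X, Y)
            p[idx] = old                      # restore the original value
            g[idx] = (plus - minus) / (2 * eps)
        grads.append(g)
    return grads

rng = np.random.default_rng(1)
n_x, n_h, n_y, m = 4, 5, 3, 7                 # tiny made-up sizes
X = rng.normal(size=(n_x, m))
Y = np.eye(n_y)[:, rng.integers(0, n_y, size=m)]   # one-hot columns
params = [rng.normal(size=(n_h, n_x)), np.zeros((n_h, 1)),
          rng.normal(size=(n_y, n_h)), np.zeros((n_y, 1))]

rel_errors = [np.linalg.norm(a - n) / (np.linalg.norm(a) + np.linalg.norm(n))
              for a, n in zip(analytic_grads(*params, X, Y), numeric_grads(params, X, Y))]
print(max(rel_errors))                        # a very small number if backprop is correct
```

With a correct backward pass, the relative errors stay many orders of magnitude below 1. Replacing `A1 * (1 - A1)` with the tanh derivative `1 - A1**2` makes the `dW1` and `db1` checks disagree clearly.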
    
