【Posted】: 2022-08-19 22:35:39
【Question】:
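I am trying to build a 3-layer neural network with one input layer, one hidden layer, and one output layer. The input layer is represented by a (1, 785) NumPy array, since I am using the MNIST dataset to classify the digits 0 through 9. My forward propagation produces arrays with the correct dimensions throughout. However, when I compute the derivatives of the network's weights and biases, the resulting arrays have shapes that differ from the originals. As a result, the gradient descent update of the weights and biases cannot be carried out: according to the NumPy documentation, broadcasting is only possible when the corresponding dimensions are equal or one of them is 1.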
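The broadcasting rule mentioned above can be checked directly. The sketch below is only an illustration: the helper name `can_broadcast` and the hidden-layer size of 16 are assumptions, not taken from the code in this question.

```python
import numpy as np

def can_broadcast(shape_a, shape_b):
    """Return True if arrays of these two shapes can be combined elementwise."""
    try:
        np.zeros(shape_a) - np.zeros(shape_b)
        return True
    except ValueError:
        # NumPy raises ValueError when the shapes cannot be broadcast together.
        return False

# Hypothetical sizes: 16 hidden units, 10 output classes.
print(can_broadcast((16, 10), (16, 10)))  # True: equal shapes
print(can_broadcast((16, 10), (1, 10)))   # True: a dimension of 1 is stretched
print(can_broadcast((16, 10), (10, 16)))  # False: 10 vs 16 in the last dimension

# Broadcasting can also silently enlarge a result instead of failing:
print((np.zeros((1, 10)) * np.zeros((10, 1))).shape)  # (10, 10)
```

The last line is worth noting: multiplying a (1, 10) row by a (10, 1) column raises no error and yields a (10, 10) array, so a shape problem may only surface at a later step.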
Here is the computation of the derivatives of the weights and biases in backpropagation:
    def backpropagation(self, x, y):
        predicted_value = self.forward_propagation(x)
        cost_value_derivative = self.loss_function(
            predicted_value.T, self.expected_value(y), derivative=True
        )
        print(f"{'-*-'*15} PREDICTION {'-*-'*15}")
        print(f"Predicted Value: {np.argmax(predicted_value)}")
        print(f"Actual Value: {y}")
        print(f"{'-*-'*15}{'-*-'*19}")
        derivative_W2 = (cost_value_derivative*self.sigmoid(
            self.output_layer_without_activity, derivative=True)
        ).dot(self.hidden_layer.T).T
        print(f"Derivative_W2: {derivative_W2.shape}, weights_hidden_layer_to_output_layer: {self.weights_hidden_layer_to_output_layer.shape}")
        assert derivative_W2.shape == self.weights_hidden_layer_to_output_layer.shape
        derivative_b2 = (cost_value_derivative*(self.sigmoid(
            self.output_layer_without_activity, derivative=True).T
        )).T
        print(f"Derivative_b2: {derivative_b2.shape}, bias_on_output_layer: {self.bias_on_output_layer.shape}")
        assert derivative_b2.shape == self.bias_on_output_layer.shape
        derivative_b1 = cost_value_derivative*self.sigmoid(
            self.output_layer_without_activity.T, derivative=True
        ).dot(self.weights_hidden_layer_to_output_layer.T).dot(
            self.sigmoid(self.hidden_layer_without_activity, derivative=True)
        )
        print(f"Derivative_b1: {derivative_b1.shape}, bias_on_hidden_layer: {self.bias_on_hidden_layer.shape}")
        assert derivative_b1.shape == self.bias_on_hidden_layer.shape
        derivative_W1 = cost_value_derivative*self.sigmoid(
            self.output_layer_without_activity.T, derivative=True
        ).dot(self.weights_hidden_layer_to_output_layer.T).dot(self.sigmoid(
            self.hidden_layer_without_activity, derivative=True)
        ).dot(x)
        print(f"Derivative_W1: {derivative_W1.shape}, weights_input_layer_to_hidden_layer: {self.weights_input_layer_to_hidden_layer.shape}")
        assert derivative_W1.shape == self.weights_input_layer_to_hidden_layer.shape
        return derivative_W2, derivative_b2, derivative_W1, derivative_b1
Here is my implementation of forward propagation:
    def forward_propagation(self, x):
        self.hidden_layer_without_activity = self.weights_input_layer_to_hidden_layer.T.dot(x.T) + self.bias_on_hidden_layer
        self.hidden_layer = self.sigmoid(
            self.hidden_layer_without_activity
        )
        self.output_layer_without_activity = self.weights_hidden_layer_to_output_layer.T.dot(
            self.hidden_layer
        ) + self.bias_on_output_layer
        self.output_layer = self.sigmoid(
            self.output_layer_without_activity
        )
        return self.output_layer
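To make the shapes in this forward pass concrete, here is a standalone sketch that repeats the same matrix products on random data. The hidden-layer size of 16, the column-vector bias shapes, and the names `forward_shapes`, `W1`, `b1`, `W2`, `b2` are assumptions for illustration; the question does not state them.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation."""
    return 1.0 / (1.0 + np.exp(-z))

def forward_shapes(n_inputs=785, n_hidden=16, n_outputs=10):
    """Run one forward pass on random data and report each array's shape."""
    rng = np.random.default_rng(0)
    x = rng.random((1, n_inputs))                      # one sample as a row vector
    W1 = rng.standard_normal((n_inputs, n_hidden))     # assumed weight layout
    b1 = np.zeros((n_hidden, 1))                       # assumed column-vector bias
    W2 = rng.standard_normal((n_hidden, n_outputs))
    b2 = np.zeros((n_outputs, 1))

    z1 = W1.T.dot(x.T) + b1   # (n_hidden, n_inputs) . (n_inputs, 1)
    a1 = sigmoid(z1)
    z2 = W2.T.dot(a1) + b2    # (n_outputs, n_hidden) . (n_hidden, 1)
    a2 = sigmoid(z2)
    return {"z1": z1.shape, "a1": a1.shape, "z2": z2.shape, "a2": a2.shape}

print(forward_shapes())
```

Under these assumptions every intermediate array is a column vector, so any gradient has to be assembled from column vectors of shape (16, 1) and (10, 1) and the (1, 785) input.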
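Taking the weights_hidden_layer_to_output_layer variable as an example, the gradient descent update of the weights and biases is weights_hidden_layer_to_output_layer -= learning_rate*derivative_W2, where derivative_W2 is the derivative of the loss function with respect to weights_hidden_layer_to_output_layer.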
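For comparison, the update only works when each derivative has exactly the shape of the parameter it updates. The following is a minimal sketch of one way to get there, not the code from this question: it assumes a squared-error loss, a hidden layer of 16 units, column-vector biases, and hypothetical helper names (`gradients`, `sgd_step`).

```python
import numpy as np

def sigmoid(z):
    """Logistic activation."""
    return 1.0 / (1.0 + np.exp(-z))

def gradients(x, target, W1, b1, W2, b2):
    """Gradients of 0.5 * ||a2 - target||^2 (assumed loss) for one sample."""
    a1 = sigmoid(W1.T.dot(x.T) + b1)             # (n_hidden, 1)
    a2 = sigmoid(W2.T.dot(a1) + b2)              # (n_outputs, 1)
    delta2 = (a2 - target) * a2 * (1.0 - a2)     # (n_outputs, 1)
    delta1 = W2.dot(delta2) * a1 * (1.0 - a1)    # (n_hidden, 1)
    derivative_W2 = a1.dot(delta2.T)             # same shape as W2
    derivative_W1 = x.T.dot(delta1.T)            # same shape as W1
    return derivative_W2, delta2, derivative_W1, delta1

def sgd_step(params, grads, learning_rate=0.1):
    """Update each parameter in place, refusing any shape mismatch."""
    for param, grad in zip(params, grads):
        assert param.shape == grad.shape
        param -= learning_rate * grad

# Usage with random data and hypothetical layer sizes.
rng = np.random.default_rng(0)
x = rng.random((1, 785))
target = np.zeros((10, 1))
target[3] = 1.0                                  # one-hot label for the digit 3
W1 = rng.standard_normal((785, 16)) * 0.01
b1 = np.zeros((16, 1))
W2 = rng.standard_normal((16, 10)) * 0.01
b2 = np.zeros((10, 1))

grads = gradients(x, target, W1, b1, W2, b2)
print([g.shape for g in grads])  # [(16, 10), (10, 1), (785, 16), (16, 1)]
sgd_step((W2, b2, W1, b1), grads)
```

The gradients are returned in the same order as in the question (W2, b2, W1, b1), and each weight gradient is built as an outer product of a layer input and a delta column, which is what makes the shapes line up without relying on broadcasting.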
Tags: python machine-learning math deep-learning neural-network