Perceptron Neural Network will not learn values in a specific range

【Question title】: Perceptron Neural Network will not learn values in a specific range
【Posted】: 2021-06-23 11:28:13
【Question】:

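I am working with neural networks and want to create a clean class implementation that can handle a network of any size. At the moment I am debugging my learning function so that it handles a 2-layer network.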

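In its current state, using logistic activation:

- It cannot learn values below 0.5
- It cannot handle a matrix of input vectors (only a single input vector); this can be implemented later
- If the initial weights and bias produce an output below 0.5, it will likely learn toward 0
- Assumption: under "ideal" conditions, it will learn any value between 0.5 and 1 for any combination of binary inputs
  - Tested with 2 and 3 network inputs
- Forward propagation works correctly regardless of the number of layers

Here is the relevant code: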

import numpy as np

def logistic(x, deriv = False):
  '''
  If using the derivative, input must be result of logistic
  '''
  if deriv:
    return x*(1-x)
    
  return 1/(1+np.exp(-x))

def feed_forw(input, weights):
  '''
  ***Wrapper for input.dot(weights)
  Input should be a np.array the same length as number of input nodes
    - A row of input represents the vector of input nodes
    - Different Rows are different input cases
  Weights is a 2D np.array of weights for each input node to each output node
    - dimensions of weights will determine length of output vector
    - top row is weights going from first input to node to all output nodes
    - first col is weights going from all input nodes to first output node
  '''

  return input.dot(weights)

class ANN:
  '''
  Artificial Neural Network of Perceptron Design
  Member Attributes:
    Weights: tuple of np.array
    - # of elements define number of layers
    - shapes of each element define nodes of each connecting pair of connecting layers
    Bias: tuple of np.array
    - added to each node after the first layer on a per layer basis
    - must have same dimensions as output from each corresponding element in Weights
    Target: np.array
    - array representing desired output.
  '''
  
  def __init__(self, weights, bias = 0, target = None):
    self._weights = weights
    self._bias = bias
    self._target = target

  def __str__(self):
    data = ''
    for w,b in zip(self._weights, self._bias):
      data += f'Weight\n{w}\nbias\n{b}\n'

    return f'{data}Seeking\n{self._target}\n'

  def _forwardProp(self, v, activation):
    '''
    Helper function to Learn
    '''
    out = []
    out.append(v.copy())
    for w,b in zip(self._weights, self._bias):
      out.append(feed_forw(out[-1], w) + b)
      out.append(activation(out[-1]))
    return out

  def setTarget(self, target):
    self._target = target

  def learn(self, input, activation, epoch = 10, eta = 1, debug = False):
    '''
    ***Currently only functions with 2-Layer perceptrons***
    input: np.array
    - a matrix representing each of case of input vectors
    - rows are input vectors for a single case
    activation: function object
    - An activation function used to normalize output
    epoch: int
    - test cycles
    eta: int
    - learning parameter
    '''
    for e in range(epoch):
      layers = self._forwardProp(input, activation)
      #layers is a list for keeping track of changes between layers
      #Pattern follows:
      #[input, layer 0 - weighted sum, layer 1 - activation, layer 1(output) - 
      #   weighted sum, layer 2 - activation, layer 2 ... 
      #   weighted sum, output layer - activation, output layer]
      
      #Final element is always network output
      error = layers[-1] - self._target

      delta_out = error * activation(layers[-1], deriv = True)
      #derivError_out = delta_out * activation(layers[-3].T*self._weights[-1])
      #derivError_out = delta_out * layers[-3].T*self._weights[-1]
      #EDIT
      derivError_out = delta_out * layers[-3].T
      derivError_bias = delta_out * self._bias[-1].T
      self._weights += -eta*derivError_out
      self._bias += -eta*derivError_bias

      if debug:
        print(f'Epoch {e+1}:\nOutput:\n{layers[-1]}\nError is\n{error}\nDelta Out Node:\n{delta_out}')
        print(f'Weight Increment:\n{derivError_out}\nBias Increment:\n{derivError_bias}')
        print(f'State after training rotation:\n{self}')

      #i = 1
      #while i < len(layers) + 1:
        #This loop will count from the last element of layers, will go back by 2
        #...
        #i += 2

The code used for testing, and its output:

w2 = np.array([[0.03],
               [-0.1]])
b2 = np.array([[0.7]])
nn1 = ANN((w2,), (b2,))
x = np.array([[1,1]])
t = np.array([[0.7]])
nn1.setTarget(t)
nn1.learn(x, logistic, 100, debug = True)
'''
Epoch 1:
Output:
[[0.65248946]]
Error is
[[-0.04751054]]
Delta Out Node:
[[-0.01077287]]
Weight Increment:
[[-0.00032319]
 [ 0.00107729]]
Bias Increment:
[[-0.00754101]]
State after training rotation:
Weight
[[ 0.03032319]
 [-0.10107729]]
bias
[[0.70754101]]
Seeking
[[0.7]]

Epoch 2:
Output:
[[0.65402678]]
Error is
[[-0.04597322]]
Delta Out Node:
[[-0.01040263]]
Weight Increment:
[[-0.00031544]
 [ 0.00105147]]
Bias Increment:
[[-0.00736028]]
State after training rotation:
Weight
[[ 0.03063863]
 [-0.10212876]]
bias
[[0.71490129]]
Seeking
[[0.7]]
...
Epoch 99:
Output:
[[0.69871509]]
Error is
[[-0.00128491]]
Delta Out Node:
[[-0.00027049]]
Weight Increment:
[[-1.08348447e-05]
 [ 3.61161491e-05]]
Bias Increment:
[[-0.00025281]]
State after training rotation:
Weight
[[ 0.04006734]
 [-0.13355782]]
bias
[[0.93490471]]
Seeking
[[0.7]]

Epoch 100:
Output:
[[0.69876299]]
Error is
[[-0.00123701]]
Delta Out Node:
[[-0.00026038]]
Weight Increment:
[[-1.04328444e-05]
 [ 3.47761479e-05]]
Bias Increment:
[[-0.00024343]]
State after training rotation:
Weight
[[ 0.04007778]
 [-0.13359259]]
bias
[[0.93514815]]
Seeking
[[0.7]]
'''
#This cell is rerun with
t = np.array([[0.4]])
'''
Epoch 1:
Output:
[[0.65248946]]
Error is
[[0.25248946]]
Delta Out Node:
[[0.05725122]]
Weight Increment:
[[ 0.00171754]
 [-0.00572512]]
Bias Increment:
[[0.04007585]]
State after training rotation:
Weight
[[ 0.02828246]
 [-0.09427488]]
bias
[[0.65992415]]
Seeking
[[0.4]]

Epoch 2:
Output:
[[0.64426676]]
Error is
[[0.24426676]]
Delta Out Node:
[[0.05598279]]
Weight Increment:
[[ 0.00158333]
 [-0.00527777]]
Bias Increment:
[[0.0369444]]
State after training rotation:
Weight
[[ 0.02669913]
 [-0.08899711]]
bias
[[0.62297975]]
Seeking
[[0.4]]
...
Epoch 99:
Output:
[[0.50544009]]
Error is
[[0.10544009]]
Delta Out Node:
[[0.0263569]]
Weight Increment:
[[ 2.73123106e-05]
 [-9.10410354e-05]]
Bias Increment:
[[0.00063729]]
State after training rotation:
Weight
[[ 0.00100894]
 [-0.00336312]]
bias
[[0.02354185]]
Seeking
[[0.4]]

Epoch 100:
Output:
[[0.50529672]]
Error is
[[0.10529672]]
Delta Out Node:
[[0.02632123]]
Weight Increment:
[[ 2.65564469e-05]
 [-8.85214898e-05]]
Bias Increment:
[[0.00061965]]
State after training rotation:
Weight
[[ 0.00098238]
 [-0.0032746 ]]
bias
[[0.0229222]]
Seeking
[[0.4]]
'''
#Cell is rerun again with
b2 = np.array([[-0.7]])
t = np.array([[0.4]])
'''
Epoch 1:
Output:
[[0.31647911]]
Error is
[[-0.08352089]]
Delta Out Node:
[[-0.01806725]]
Weight Increment:
[[-0.00054202]
 [ 0.00180672]]
Bias Increment:
[[0.01264707]]
State after training rotation:
Weight
[[ 0.03054202]
 [-0.10180672]]
bias
[[-0.71264707]]
Seeking
[[0.4]]

Epoch 2:
Output:
[[0.31347742]]
Error is
[[-0.08652258]]
Delta Out Node:
[[-0.01862047]]
Weight Increment:
[[-0.00056871]
 [ 0.00189569]]
Bias Increment:
[[0.01326982]]
State after training rotation:
Weight
[[ 0.03111072]
 [-0.10370241]]
bias
[[-0.72591689]]
Seeking
[[0.4]]
...
Epoch 99:
Output:
[[0.01206264]]
Error is
[[-0.38793736]]
Delta Out Node:
[[-0.0046231]]
Weight Increment:
[[-0.00079352]
 [ 0.00264508]]
Bias Increment:
[[0.01851554]]
State after training rotation:
Weight
[[ 0.17243664]
 [-0.57478879]]
bias
[[-4.02352151]]
Seeking
[[0.4]]

Epoch 100:
Output:
[[0.01182232]]
Error is
[[-0.38817768]]
Delta Out Node:
[[-0.0045349]]
Weight Increment:
[[-0.00078198]
 [ 0.00260661]]
Bias Increment:
[[0.01824629]]
State after training rotation:
Weight
[[ 0.17321862]
 [-0.5773954 ]]
bias
[[-4.04176779]]
Seeking
[[0.4]]
'''

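I can see that when the output is below 0.5, for some reason the update always drives the output lower. If the starting output is below 0.5, the network only learns a value smaller than the starting output. If the starting output is 0.5 or greater, it only learns a value that is also greater than 0.5. However, I still cannot think of a solution to this (at least not an elegant one).

These are two distinct cases, so I could brute-force a fix. But then I would not know what mistake I made.

I know there are multiple ways to implement this network, and I even saw a ridiculously simple variant on this blog, whose math I still cannot follow. After working on this for weeks, though, I can only assume it is some small error that I am unable to see.

EDIT: This modification is a step toward the solution.

In the class definition, the following lines were changed.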

#derivError_out = delta_out * activation(layers[-3].T*self._weights[-1])
#derivError_out = delta_out * layers[-3].T*self._weights[-1]
derivError_out = delta_out * layers[-3].T

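Changes:

- When the initial output is 0.5 or greater, the network can learn any value between 0 and 1. Yay!
- When the initial output is below 0.5, the network can learn any value below a limit that is not much greater than the initial output
  - This behavior depends on the weights, and there appears to be a weight-dependent upper limit that the network cannot learn past. When it tries to learn a value above that limit, it converges to 0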

【Comments】:

Tags: python python-3.x neural-network perceptron

【Solution 1】:

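One issue that stands out immediately is that your sigmoid derivative looks incorrect. The derivative of sigmoid(x) is not x*(1-x) but sigmoid(x)*(1-sigmoid(x)).

You can change your implementation accordingly: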

def logistic(x, deriv=False):
    """
    If deriv is True, x must be the pre-activation (weighted sum), not the logistic output
    """
    phi = 1 / (1 + np.exp(-x))

    if deriv:
        return phi * (1 - phi)

    return phi
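
The two derivative conventions discussed in this thread can be compared numerically. The following is a minimal sketch, not code from the post; the helper names `dlogistic_from_output` and `dlogistic_from_input` are hypothetical. It suggests that both forms give the same value as long as the caller passes the quantity the helper expects, and that mixing the conventions silently gives a different number.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def dlogistic_from_output(y):
    # Convention 1: y is already logistic(z)
    return y * (1.0 - y)

def dlogistic_from_input(z):
    # Convention 2: z is the pre-activation (weighted sum)
    phi = logistic(z)
    return phi * (1.0 - phi)

z = np.array([-0.77, 0.0, 0.63])
y = logistic(z)

# Same derivative when each helper receives the value it expects
same = np.allclose(dlogistic_from_output(y), dlogistic_from_input(z))

# Convention mix-up: the output y is passed where a pre-activation is expected
mixed = dlogistic_from_input(y)

print(same)
print(dlogistic_from_output(y), mixed)
```

Whichever convention is chosen, the call site in learn has to match it: layers[-1] for the output-based form, layers[-2] for the input-based form.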

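I see two main things that I strongly doubt do what you intend:

The error you are optimizing is not an ideal cost function.
Gradient descent minimizes the error term you formulate.
With the current formulation error = prediction - target, the best result a correctly working gradient optimizer can reach is to output the smallest possible prediction (even a negative one, if the activation allows it).

Suggestion: use some L-norm as the error function, e.g. the L1 norm
error = | prediction - target |

Maybe I am just not reading it correctly, which is fine, but the weight update
derivError_out = delta_out * layers[-3].T*self._weights[-1]
looks suspicious. Once you have built the error term delta_out, the contribution you backpropagate to each weight should depend only on its respective activation, layers[-3].

I suggest the following (derivation, if you are interested: https://imgur.com/gallery/noS4pe4):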

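def learn(self, input, activation, epoch=10, eta=1, debug=False):
    """ .... optimizing L1-norm .... """
    for e in range(epoch):
        layers = self._forwardProp(input, activation)

        # L1 error |prediction - target|; its gradient w.r.t. the prediction is sign(diff)
        diff = layers[-1] - self._target
        error = np.abs(diff)

        # layers[-2] is the pre-activation, so activation(..., deriv=True) must
        # expect the weighted sum here (as in the logistic rewrite above)
        delta_out = np.sign(diff) * activation(layers[-2], deriv=True)
        # Weight gradient: input activations times the output delta
        derivError_out = delta_out * layers[-3].T
        self._weights -= eta * derivError_out
        # Bias gradient is the delta itself
        self._bias -= eta * delta_out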
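
The L1 update above can be sanity-checked against finite differences. This is a minimal standalone sketch for a single logistic output node, reusing the test values from the question; the functions `l1_loss` and `l1_grads` are hypothetical helpers written for this check and are not part of the ANN class.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def l1_loss(x, w, b, target):
    # |prediction - target| for a single-layer logistic network
    return np.abs(logistic(x @ w + b) - target).sum()

def l1_grads(x, w, b, target):
    y = logistic(x @ w + b)
    delta = np.sign(y - target) * y * (1.0 - y)   # dL/dz
    return x.T @ delta, delta                     # dL/dw, dL/db

x = np.array([[1.0, 1.0]])
w = np.array([[0.03], [-0.1]])
b = np.array([[0.7]])
t = np.array([[0.4]])

grad_w, grad_b = l1_grads(x, w, b, t)

# Central finite differences, one parameter at a time
eps = 1e-6
num_w = np.zeros_like(w)
for i in range(w.shape[0]):
    step = np.zeros_like(w)
    step[i, 0] = eps
    num_w[i, 0] = (l1_loss(x, w + step, b, t) - l1_loss(x, w - step, b, t)) / (2 * eps)
num_b = (l1_loss(x, w, b + eps, t) - l1_loss(x, w, b - eps, t)) / (2 * eps)

print(grad_w.ravel(), num_w.ravel())
print(grad_b.item(), num_b)
```

If the analytic and numeric values agree, the bias gradient is the delta itself and each weight gradient is the delta scaled by that weight's input.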

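In addition: your learning rate eta seems rather high; this will certainly diverge and produce a model with exploding weights, which ends up outputting 0 or 1 depending on the initialization.

Turn it down to get a more stable gradient descent.

For example: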

nn1.learn(x, logistic, 100, eta=1e-2, debug=True)

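【Comments】:

I do suspect this will not solve all of your problems, now that I look at it. I will come back once I have figured it out.
Regarding the derivative: recomputing the sigmoid for a derivative whose value has already been computed is a waste of cycles. That was, however, my implementation when I started. The standard implementation basically requires you to pass in the result you want the derivative of, for efficiency. I did not mention it, but changing eta only affects the convergence speed in my tests. The network keeps converging to the same value.
I am currently editing my answer to address the actual problem.
Regarding cycle time: the way you currently do it, you compute phi = 1/(1+np.exp(- something )) on every call anyway.
You can, but it is still confusing, and once you start implementing different nonlinearities that are not just combinations of the function value, it can go badly wrong.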

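【Solution 2】:

The correct derivation for incrementing a neural network's parameters can be found all over the internet. In the perceptron case, each bias is incremented by a "delta" value determined by the output node it connects to. Instead of simply implementing that, my bias node was being incremented by the product of itself and this "delta".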
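
For reference, the derivation can be sketched for a single logistic output node. The squared-error cost E with the factor 1/2 is an assumption here; it is chosen because it reproduces delta_out = error * activation(layers[-1], deriv = True) from the question. The symbols z, y, t and η (pre-activation, output, target, eta) are notation introduced for this sketch.

```latex
\begin{aligned}
z &= \sum_i w_i x_i + b, \qquad y = \sigma(z), \qquad E = \tfrac{1}{2}(y - t)^2 \\
\delta &= \frac{\partial E}{\partial z} = (y - t)\, y\,(1 - y) \\
\frac{\partial E}{\partial w_i} &= \delta\, x_i, \qquad
\frac{\partial E}{\partial b} = \delta \cdot \frac{\partial z}{\partial b} = \delta \\
w_i &\leftarrow w_i - \eta\,\delta\,x_i, \qquad b \leftarrow b - \eta\,\delta
\end{aligned}
```

Since the bias enters z with coefficient 1, its gradient is δ alone; the bias value itself never appears in its own update.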

Due to my incompetence, I wrote this expression as

delta_out = error * activation(layers[-1], deriv = True)
derivError_bias = delta_out * self._bias[-1].T
self._bias -= eta * derivError_bias

instead of self._bias -= eta * delta_out

The network can now learn any randomly assigned target with any randomly assigned weights and bias.
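
The corrected update can be tried on the test values from the question. The following is a minimal standalone sketch rather than the ANN class itself: `train_single_layer` is a hypothetical helper, and it assumes a single logistic output node, the squared-error delta used in the question, and 500 epochs with eta = 1.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_single_layer(x, w, b, target, epochs=500, eta=1.0):
    """Gradient descent on 0.5 * (y - target)**2 with y = logistic(x @ w + b)."""
    w = w.astype(float)   # astype returns a copy, so the caller's arrays are untouched
    b = b.astype(float)
    for _ in range(epochs):
        y = logistic(x @ w + b)
        delta = (y - target) * y * (1.0 - y)   # derivative taken from the output y
        w -= eta * (x.T @ delta)               # weight step: inputs times delta
        b -= eta * delta                       # bias step: delta only, not delta * b
    return logistic(x @ w + b)

x = np.array([[1.0, 1.0]])
w0 = np.array([[0.03], [-0.1]])

results = {}
for b0 in (0.7, -0.7):        # initial output above and below 0.5
    for t in (0.4, 0.7):      # targets below and above 0.5
        out = train_single_layer(x, w0, np.array([[b0]]), np.array([[t]]))
        results[(b0, t)] = out.item()
        print(f"initial bias {b0:+.1f}, target {t}: output {out.item():.6f}")
```

In this setup all four combinations of initial bias and target end up at the target, including the cases that start on the opposite side of 0.5.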

【Comments】:

^^; Jesus, it seems it took me a while to see that too. I made the same mistake in my derivation on paper.