Gradient descent with random input implementation: answers

【Question title】: Gradient descent with random input implementation
【Posted】: 2014-11-19 16:42:42
【Question description】:

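I'm trying to implement gradient descent on a dataset. I've tried everything, but I can't get it to work. So I built a test case: I run my code on random data and try to debug it.

More specifically, I generate random vectors with entries between 0 and 1, along with random labels for those vectors, and then try to overfit the training data.

However, my weight vector grows larger with every iteration until it reaches infinity, so I'm not actually learning anything. Here is my code: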

import numpy as np
import random

def getRandomVector(n):
   return np.random.uniform(0,1,n)

def getVectors(m, n):
   return [getRandomVector(n) for i in range(n)]

def getLabels(n):
   return [random.choice([-1,1]) for i in range(n)]

def GDLearn(vectors, labels):
   maxIterations = 100
   stepSize = 0.01

   w = np.zeros(len(vectors[0])+1)
   for i in range(maxIterations):
      deltaw = np.zeros(len(vectors[0])+1)
      for i in range(len(vectors)):
         temp = np.append(vectors[i], -1)
         deltaw += ( labels[i] - np.dot(w, temp) ) * temp
      w = w + ( stepSize * (-1 * deltaw) )
   return w

vectors = getVectors(100, 30)
labels = getLabels(100)

w = GDLearn(vectors, labels)
print(w)

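I'm using LMS (least mean squares) as the loss function. So, in every iteration, my update is as follows:

where w^i is the i-th weight vector, R is the stepSize, and E(w^i) is the loss function.

This is my loss function (LMS):

And this is how I differentiated the loss function: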
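
For reference, one way to write this setup out that is consistent with the symbols above and with the code in this thread is sketched below. The sample index j, the targets y_j, the inputs x_j (each extended with the constant -1 input), and the factor 1/2 are assumptions made for illustration and are not taken from the post.

```latex
\begin{aligned}
w^{i+1} &= w^{i} - R \,\nabla E(w^{i}) \\
E(w) &= \frac{1}{2} \sum_{j} \bigl( y_j - w \cdot x_j \bigr)^{2} \\
\nabla E(w) &= -\sum_{j} \bigl( y_j - w \cdot x_j \bigr)\, x_j
\end{aligned}
```

Under these assumptions, substituting the gradient into the update gives w^{i+1} = w^i + R * sum_j (y_j - w^i . x_j) x_j, so the sum of (label minus prediction) times input is added to the weights, not subtracted.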

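Now, my questions are:

Should I expect good results from gradient descent in this random scenario? (What are the theoretical bounds?)
If so, what is the bug in my implementation?

PS: I tried several other values for maxIterations and stepSize. It still doesn't work.
PS2: This is the best way I could find to ask the question here. Sorry if it's too specific, but it's driving me crazy. I really want to understand this problem.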

【Question discussion】:

Your code has obvious bugs. I'll reply shortly.
I had dropped a minus sign somewhere. Problem solved. Thanks anyway.
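
A dropped minus sign like this can usually be caught with a finite-difference gradient check. The sketch below is only an illustration: the helper names lms_loss, lms_grad, and numeric_grad are hypothetical, the loss is assumed to be half the sum of squared errors, and the fixed seed is there only for repeatability.

```python
import numpy as np

def lms_loss(w, X, y):
    """Half the sum of squared errors: 0.5 * sum((X @ w - y) ** 2)."""
    r = X @ w - y
    return 0.5 * (r @ r)

def lms_grad(w, X, y):
    """Analytic gradient of lms_loss: X^T (X w - y)."""
    return X.T @ (X @ w - y)

def numeric_grad(f, w, eps=1e-6):
    """Central finite-difference approximation of the gradient of f at w."""
    g = np.zeros_like(w)
    for k in range(w.size):
        step = np.zeros_like(w)
        step[k] = eps
        g[k] = (f(w + step) - f(w - step)) / (2 * eps)
    return g

rng = np.random.default_rng(0)  # fixed seed, an assumption for repeatability
# Features uniform in [0, 1] plus a constant -1 bias input, as in the post.
X = np.hstack([rng.uniform(0, 1, (100, 30)), -np.ones((100, 1))])
y = rng.choice([-1.0, 1.0], size=100)
w = rng.normal(size=X.shape[1])

approx = numeric_grad(lambda v: lms_loss(v, X, y), w)
sign_ok = np.allclose(lms_grad(w, X, y), approx, rtol=1e-4, atol=1e-4)
sign_flipped = np.allclose(-lms_grad(w, X, y), approx, rtol=1e-4, atol=1e-4)
print("analytic gradient matches:", sign_ok)
print("negated gradient matches:", sign_flipped)
```

If the analytic gradient only matches the numeric one after negation, the update is climbing the loss instead of descending it.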

Tags: python machine-learning gradient-descent

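【Solution 1】:

Your code has several bugs:

In the getVectors() function, you never actually use the input variable m;
In the GDLearn() function, you have a nested loop, but both loops use the same loop variable i. (This still behaves correctly, because the outer for loop takes its next value from range rather than from i, but it is confusing.)
The prediction error (labels[i] - np.dot(w, temp)) has the wrong sign.
The step size matters. With a step size of 0.01, the cost increases on every iteration. Changing it to 0.001 fixed the problem.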
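
The step-size point can be made concrete. For a squared-error loss, batch gradient descent with the summed gradient X^T (X w - y) converges only when the step size is below 2 divided by the largest eigenvalue of X^T X. The sketch below estimates that threshold for data shaped like the post's; the seed, the variable names, and the helper final_cost are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, an assumption for repeatability
m, n = 100, 30
# Same shape as the post: features uniform in [0, 1], plus a constant -1 bias input.
X = np.hstack([rng.uniform(0, 1, (m, n)), -np.ones((m, 1))])
y = rng.choice([-1.0, 1.0], size=m)

# The Hessian of 0.5 * sum((X w - y)^2) is X^T X.
lam_max = np.linalg.eigvalsh(X.T @ X)[-1]  # eigvalsh returns eigenvalues in ascending order
max_step = 2.0 / lam_max
print("largest eigenvalue of X^T X:", lam_max)
print("step size must stay below:", max_step)

def final_cost(step, iterations=100):
    """Run batch gradient descent from w = 0 and return the final sum of squared errors."""
    w = np.zeros(X.shape[1])
    for _ in range(iterations):
        w = w - step * (X.T @ (X @ w - y))
    return np.sum((X @ w - y) ** 2)

for step in (0.01, 0.001):
    print("step", step, "-> final cost", final_cost(step))
```

For data of this shape the threshold falls between 0.001 and 0.01, which is consistent with 0.01 blowing up and 0.001 converging.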

Here is my modified version of your original code.

import numpy as np
import random

def getRandomVector(n):
    # n features drawn uniformly from [0, 1)
    return np.random.uniform(0, 1, n)

def getVectors(m, n):
    # m random vectors of dimension n
    return [getRandomVector(n) for _ in range(m)]

def getLabels(n):
    # n random labels, each -1 or +1
    return [random.choice([-1, 1]) for _ in range(n)]

def GDLearn(vectors, labels):
    maxIterations = 100
    stepSize = 0.001

    # one weight per feature, plus one for the constant bias input
    w = np.zeros(len(vectors[0]) + 1)
    for iteration in range(maxIterations):
        cost = 0
        deltaw = np.zeros(len(vectors[0]) + 1)
        for i in range(len(vectors)):
            temp = np.append(vectors[i], -1)  # append the bias input
            prediction_error = np.dot(w, temp) - labels[i]
            deltaw += prediction_error * temp  # accumulate the gradient
            cost += prediction_error**2
        w = w - stepSize * deltaw  # step against the gradient
        print('cost at', iteration, '=', cost)
    return w

vectors = getVectors(100, 30)
labels = getLabels(100)

w = GDLearn(vectors, labels)
print(w)

Run results: you can see that the cost decreases on every iteration, but with diminishing returns.

cost at 0 = 100.0
cost at 1 = 99.4114482617
cost at 2 = 98.8476022685
cost at 3 = 98.2977744556
cost at 4 = 97.7612851154
cost at 5 = 97.2377571222
cost at 6 = 96.7268325883
cost at 7 = 96.2281642899
cost at 8 = 95.7414151147
cost at 9 = 95.2662577529
cost at 10 = 94.8023744037
......
cost at 90 = 77.367904046
cost at 91 = 77.2744249433
cost at 92 = 77.1823702888
cost at 93 = 77.0917090883
cost at 94 = 77.0024111475
cost at 95 = 76.9144470493
cost at 96 = 76.8277881325
cost at 97 = 76.7424064707
cost at 98 = 76.6582748518
cost at 99 = 76.5753667579
[ 0.16232142 -0.2425511   0.35740632  0.22548442  0.03963853  0.19595213
  0.20080207 -0.3921798  -0.0238925   0.13097533 -0.1148932  -0.10077534
  0.00307595 -0.30111942 -0.17924479 -0.03838637 -0.23938181  0.1384443
  0.22929163 -0.0132466   0.03325976 -0.31489526  0.17468025  0.01351012
 -0.25926117  0.09444201  0.07637793 -0.05940019  0.20961315  0.08491858
  0.07438357]
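
The diminishing returns have a floor. With 100 samples and only 31 weights, random labels generally cannot be fit exactly, so the cost cannot reach zero; for independent random -1/+1 labels the least-squares residual averages about 100 - 31 = 69. The sketch below compares a vectorized gradient descent run against the closed-form least-squares solution. It draws its own random data with a fixed seed (an assumption), so its numbers will not match the run above.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, an assumption for repeatability
m, n = 100, 30
# Features uniform in [0, 1], plus a constant -1 bias input, as in the code above.
X = np.hstack([rng.uniform(0, 1, (m, n)), -np.ones((m, 1))])
y = rng.choice([-1.0, 1.0], size=m)

# Closed-form least-squares solution: the minimizer gradient descent approaches.
w_best, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
floor = np.sum((X @ w_best - y) ** 2)

# Vectorized batch gradient descent with the same step size and iteration count.
w = np.zeros(X.shape[1])
for _ in range(100):
    w = w - 0.001 * (X.T @ (X @ w - y))
gd_cost = np.sum((X @ w - y) ** 2)

print("least-squares floor:", floor)
print("gradient descent after 100 iterations:", gd_cost)
```

Gradient descent can approach this floor but not go below it, so a cost that levels off well above zero is expected for random labels rather than a sign of a bug.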

【Discussion】: