Warning using SciPy fmin_bfgs() on regularized data: answers

【Question title】: Warning using SciPy fmin_bfgs() on regularized data
【Posted】: 2018-10-12 22:51:11
【Question description】:

I am using the following regularized cost() and gradient() functions:

def cost(theta, x, y, lam):
    theta = theta.reshape(1, len(theta))
    predictions = sigmoid(np.dot(x, np.transpose(theta))).reshape(len(x), 1)

    regularization = (lam / (len(x) * 2)) * np.sum(np.square(np.delete(theta, 0, 1)))

    complete = -1 * np.dot(np.transpose(y), np.log(predictions)) \
           - np.dot(np.transpose(1 - y), np.log(1 - predictions))
    return np.sum(complete) / len(x) + regularization


def gradient(theta, x, y, lam):
    theta = theta.reshape(1, len(theta))
    predictions = sigmoid(np.dot(x, np.transpose(theta))).reshape(len(x), 1)

    theta_without_intercept = theta.copy()
    theta_without_intercept[0, 0] = 0
    assert(theta_without_intercept.shape == theta.shape)
    regularization = (lam / len(x)) * np.sum(theta_without_intercept)

    return np.sum(np.multiply((predictions - y), x), 0) / len(x) + regularization

With these functions and scipy.optimize.fmin_bfgs(), I get the following output (which is almost correct):

Starting loss value: 0.69314718056 
Warning: Desired error not necessarily achieved due to precision loss.
         Current function value: 0.208444
         Iterations: 8
         Function evaluations: 51
         Gradient evaluations: 39
7.53668131651e-08
Trained loss value: 0.208443907192

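The regularization formula was attached to the question. If I comment out the regularization terms above, scipy.optimize.fmin_bfgs() works well and correctly returns the local optimum.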
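The attached formula is not reproduced in this text. Assuming it is the standard L2-regularized logistic regression cost, which is what cost() above computes, it would read as follows. The symbols are assumptions chosen to match the code: m is len(x), \lambda is lam, h_\theta is the sigmoid prediction, and the intercept \theta_0 is left out of the penalty.

```latex
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \Big[ y^{(i)} \log h_\theta\big(x^{(i)}\big)
          + \big(1 - y^{(i)}\big) \log\big(1 - h_\theta\big(x^{(i)}\big)\big) \Big]
          + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2,
\qquad h_\theta(x) = \frac{1}{1 + e^{-\theta^{T} x}}
```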

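Any ideas why?

Update:

After the additional comments, I updated the cost and gradient regularization (in the code above), but the warning still appears (new output above). SciPy's check_grad function returns the following value: 7.53668131651e-08.

Update 2:

I am using the UCI Machine Learning Iris data set and training the first classifier, for Iris-setosa, with a One-vs-All classification model.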
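The data preparation is not shown in the question. A minimal sketch of how One-vs-All labels for Iris-setosa and an intercept column might be built is given below. The arrays and the helper one_vs_all_labels are hypothetical stand-ins, not the asker's code.

```python
import numpy as np

# Hypothetical stand-in for the Iris data: 6 samples with 4 features each.
features = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.4, 3.2, 4.5, 1.5],
    [6.3, 3.3, 6.0, 2.5],
    [5.8, 2.7, 5.1, 1.9],
])
species = np.array([
    "Iris-setosa", "Iris-setosa",
    "Iris-versicolor", "Iris-versicolor",
    "Iris-virginica", "Iris-virginica",
])


def one_vs_all_labels(labels, positive_class):
    """Return a column vector holding 1.0 for positive_class and 0.0 otherwise."""
    return (labels == positive_class).astype(float).reshape(-1, 1)


# Prepend a column of ones so that theta[0] acts as the intercept.
x = np.hstack([np.ones((features.shape[0], 1)), features])
y = one_vs_all_labels(species, "Iris-setosa")

print(x.shape)    # (6, 5)
print(y.ravel())  # [1. 1. 0. 0. 0. 0.]
```

The column shape of y matches the `.reshape(len(x), 1)` predictions used in cost() and gradient() above.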

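【Comments】:

What is the value of lam? In your cost function, I think it should be (lam / len(x) / 2).
Lambda is the regularization parameter. It is 1. I have attached the regularization formula.
I think your question would benefit from including an MCVE showing the arguments you pass to fmin_bfgs and the expected result.
@JacquesGaudin Hey, I added the source with the data set and values.
Related: stackoverflow.com/questions/24767191/…

Tags: python machine-learning scipy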

【Solution 1】:

Since you are trying to perform L2 regularization, you should change the regularization term in the cost function from

regularization = (lam / len(x) * 2) * np.sum(np.square(np.delete(theta, 0, 1)))

to

regularization = (lam / (len(x) * 2)) * np.sum(np.square(np.delete(theta, 0, 1)))

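Also, the regularization part of the gradient should have the same shape as the parameter vector theta. So I would rather think the correct value is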

theta_without_intercept = theta.copy()
theta_without_intercept[0, 0] = 0  # theta has shape (1, n) after the reshape; the intercept theta_0 is not penalized in the cost function
assert(theta_without_intercept.shape == theta.shape)
regularization = (lam / len(x)) * theta_without_intercept

Otherwise, the gradient will be incorrect. You can then use the scipy.optimize.check_grad() function to check whether your gradient is correct.
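As a rough illustration of that check, the sketch below compares an analytic gradient with a finite-difference estimate of the cost. It is an assumption-laden example rather than the asker's code: the data are random, theta and y are kept one-dimensional for brevity, and sigmoid is defined here because the question does not show its definition.

```python
import numpy as np
from scipy.optimize import check_grad


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def cost(theta, x, y, lam):
    m = len(x)
    h = sigmoid(x @ theta)                            # predictions, shape (m,)
    penalty = lam / (2 * m) * np.sum(theta[1:] ** 2)  # intercept is not penalized
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h)) + penalty


def gradient(theta, x, y, lam):
    m = len(x)
    h = sigmoid(x @ theta)
    reg = (lam / m) * theta   # same shape as theta, one term per parameter
    reg[0] = 0.0              # no penalty on the intercept
    return x.T @ (h - y) / m + reg


# Synthetic data (an assumption): 50 samples, intercept column plus 3 features.
rng = np.random.default_rng(0)
x = np.hstack([np.ones((50, 1)), rng.normal(size=(50, 3))])
y = (rng.random(50) < 0.5).astype(float)
theta = rng.normal(size=4)

# check_grad returns the 2-norm of (analytic gradient - finite-difference gradient).
error = check_grad(cost, gradient, theta, x, y, 1.0)
print(error)
```

A value close to zero suggests the gradient matches the cost at that particular theta. It is worth running the check at more than one point, preferably with nonzero parameters.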

【Comments】:

Hey, I updated my cost and grad functions, but I still get this warning (I also used SciPy's check_grad function, which returns the following value: 7.53668131651e-08).

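【Solution 2】:

The problem was in my calculus. For some reason I was summing the theta values in the regularization term: regularization = (lam / len(x)) * np.sum(theta_without_intercept). np.sum is not needed at this stage: it collapses the penalty into a single scalar that is then added to every gradient component, so each theta receives the same aggregate regularization, which in turn distorts the resulting loss. Anyway, thanks for the help.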
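Written out, and assuming the standard L2-regularized logistic regression cost with m = len(x), \lambda = lam and sigmoid predictions h_\theta (notation introduced here, not in the thread), the gradient has one regularization term per parameter:

```latex
\frac{\partial J}{\partial \theta_0}
  = \frac{1}{m} \sum_{i=1}^{m} \big( h_\theta(x^{(i)}) - y^{(i)} \big)\, x_0^{(i)},
\qquad
\frac{\partial J}{\partial \theta_j}
  = \frac{1}{m} \sum_{i=1}^{m} \big( h_\theta(x^{(i)}) - y^{(i)} \big)\, x_j^{(i)}
  + \frac{\lambda}{m}\, \theta_j \quad (j \ge 1)
```

The version with np.sum instead adds the same scalar, \frac{\lambda}{m} \sum_{k \ge 1} \theta_k, to every component, including the intercept.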

The gradient function:

def gradient(theta, x, y, lam):
    theta_len = len(theta)
    theta = theta.reshape(1, theta_len)

    predictions = sigmoid(np.dot(x, np.transpose(theta))).reshape(len(x), 1)

    theta_wo_bias = theta.copy()
    theta_wo_bias[0, 0] = 0

    assert (theta_wo_bias.shape == theta.shape)
    regularization = np.squeeze(((lam / len(x)) *
                  theta_wo_bias).reshape(theta_len, 1))

    return np.sum(np.multiply((predictions - y), x), 0) / len(x) + regularization

Output:

Starting loss value: 0.69314718056 
Optimization terminated successfully.
         Current function value: 0.201681
         Iterations: 30
         Function evaluations: 32
         Gradient evaluations: 32
7.53668131651e-08
Trained loss value: 0.201680992316
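The check_grad value is the same before and after the fix. One plausible explanation, which is an assumption and not stated in the thread, is that the check was run at the all-zero starting theta: the starting loss of 0.693 (ln 2) is consistent with that, and at theta = 0 the summed and per-component regularization terms are both zero. The sketch below uses synthetic data and hypothetical function names to show that the summed version only fails the check at a nonzero theta.

```python
import numpy as np
from scipy.optimize import check_grad


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def cost(theta, x, y, lam):
    m = len(x)
    h = sigmoid(x @ theta)
    penalty = lam / (2 * m) * np.sum(theta[1:] ** 2)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h)) + penalty


def data_gradient(theta, x, y):
    # Gradient of the unregularized log loss.
    return x.T @ (sigmoid(x @ theta) - y) / len(x)


def summed_gradient(theta, x, y, lam):
    # Reproduces the mistake: one scalar sum added to every component.
    penalized = theta.copy()
    penalized[0] = 0.0
    return data_gradient(theta, x, y) + (lam / len(x)) * np.sum(penalized)


def per_component_gradient(theta, x, y, lam):
    # Corrected version: each theta_j gets its own (lam / m) * theta_j term.
    penalized = theta.copy()
    penalized[0] = 0.0
    return data_gradient(theta, x, y) + (lam / len(x)) * penalized


# Synthetic data (an assumption): 50 samples, intercept column plus 3 features.
rng = np.random.default_rng(0)
x = np.hstack([np.ones((50, 1)), rng.normal(size=(50, 3))])
y = (rng.random(50) < 0.5).astype(float)

zero_theta = np.zeros(4)
nonzero_theta = np.array([0.5, -1.0, 2.0, 0.3])

summed_at_zero = check_grad(cost, summed_gradient, zero_theta, x, y, 1.0)
summed_at_nonzero = check_grad(cost, summed_gradient, nonzero_theta, x, y, 1.0)
fixed_at_nonzero = check_grad(cost, per_component_gradient, nonzero_theta, x, y, 1.0)

print("summed, theta = 0:       ", summed_at_zero)
print("summed, theta != 0:      ", summed_at_nonzero)
print("per component, theta != 0:", fixed_at_nonzero)
```

If this explanation holds, the optimizer's line search would have seen an inconsistent gradient as soon as theta moved away from zero, which fits the precision-loss warning reported in the question.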

【Comments】: