L-BFGS-B code, Scipy (sciopt.fmin_l_bfgs_b(func, init_guess, maxiter=10, bounds=list(bounds), disp=1, iprint=101)): answers

【Question title】: L-BFGS-B code, Scipy (sciopt.fmin_l_bfgs_b(func, init_guess, maxiter=10, bounds=list(bounds), disp=1, iprint=101))
【Posted】: 2020-09-10 16:47:26
【Question description】:

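I am using the L-BFGS-B optimizer to find the minimum of a function, which I then use to compute the sharpness of that function. However, I am not sure whether the messages below are normal (that is, is something wrong with my program, or are these messages typical?). See below: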

RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =     28149514     M =           10

At X0         0 variables are exactly at the bounds
At iterate    0    f= -3.59325D+00    |proj g|=  2.10249D-03

At iterate    1    f= -2.47853D+01    |proj g|=  4.20499D-03

 Bad direction in the line search;
   refresh the lbfgs memory and restart the iteration.

At iterate    2    f= -2.53202D+01    |proj g|=  4.17686D-03

At iterate    3    f= -2.53202D+01    |proj g|=  4.17686D-03

           * * *

Tit   = total number of iterations
Tnf   = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip  = number of BFGS updates skipped
Nact  = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F     = final function value

           * * *

   N    Tit     Tnf  Tnint  Skip  Nact     Projg        F
*****      3     43 ******     0 *****   4.177D-03  -2.532D+01
  F =  -25.320247650146484     

CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH             

 Warning:  more than 10 function and gradient
   evaluations in the last line search.  Termination
   may possibly be caused by a bad search direction.

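In any case, I obtained the following sharpness, which is reasonably consistent with the paper I am trying to replicate; I am just a little worried about the messages above:

tensor(473.0201)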

Here is my code for computing the sharpness:

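import logging
import warnings

import numpy as np
import scipy.optimize as sciopt
import torch

# get_minus_cross_entropy is defined elsewhere in the linked repository.
def get_sharpness(data_loader, model, criterion, epsilon, manifolds=0):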

  # extract current x0
  x0 = None
  for p in model.parameters():
    if x0 is None:
      x0 = p.data.view(-1)
    else:
      x0 = torch.cat((x0, p.data.view(-1)))
  x0 = x0.cpu().numpy()

  # get current f_x
  f_x0, _ = get_minus_cross_entropy(x0, data_loader, model, criterion)
  f_x0 = -f_x0
  logging.info('min loss f_x0 = {loss:.4f}'.format(loss=f_x0))

  # find the minimum
  if 0==manifolds:
    x_min = np.reshape(x0 - epsilon * (np.abs(x0) + 1), (x0.shape[0], 1))
    x_max = np.reshape(x0 + epsilon * (np.abs(x0) + 1), (x0.shape[0], 1))
    bounds = np.concatenate([x_min, x_max], 1)
    func = lambda x: get_minus_cross_entropy(x, data_loader, model, criterion, training=True)
    init_guess = x0
  else:
    warnings.warn("Small manifolds may not be able to explore the space.")
    assert(manifolds<=x0.shape[0])
    #transformer = rp.GaussianRandomProjection(n_components=manifolds)
    #transformer.fit(np.random.rand(manifolds, x0.shape[0]))
    #A_plus = transformer.components_
    #A = np.linalg.pinv(A_plus)
    A_plus = np.random.rand(manifolds, x0.shape[0])*2.-1.
    # normalize each row to unit length (the norm is taken along axis=1)
    A_plus_norm = np.linalg.norm(A_plus, axis=1)
    A_plus = A_plus / np.reshape(A_plus_norm, (manifolds,1))
    A = np.linalg.pinv(A_plus)
    abs_bound = epsilon * (np.abs(np.dot(A_plus, x0))+1)
    abs_bound = np.reshape(abs_bound, (abs_bound.shape[0], 1))
    bounds = np.concatenate([-abs_bound, abs_bound], 1)
    def func(y):
      floss, fg = get_minus_cross_entropy(x0 + np.dot(A, y), data_loader, model, criterion, training=True)
      return floss, np.dot(np.transpose(A), fg)
    #func = lambda y: get_minus_cross_entropy(x0+np.dot(A, y), data_loader, model, criterion, training=True)
    init_guess = np.zeros(manifolds)

  #rand_selections = (np.random.rand(bounds.shape[0])+1e-6)*0.99
  #init_guess = np.multiply(1.-rand_selections, bounds[:,0])+np.multiply(rand_selections, bounds[:,1])

  minimum_x, f_x, d = sciopt.fmin_l_bfgs_b(func, init_guess, maxiter=10, bounds=list(bounds), disp=1, iprint=101)
    #factr=10.,
    #pgtol=1.e-12,

  f_x = -f_x
  logging.info('max loss f_x = {loss:.4f}'.format(loss=f_x))
  sharpness = (f_x - f_x0)/(1+f_x0)*100
  print(sharpness)

  # recover the model
  x0 = torch.from_numpy(x0).float()
  x0 = x0.cuda()
  x_start = 0
  for p in model.parameters():
      psize = p.data.size()
      peltnum = 1
      for s in psize:
          peltnum *= s
      x_part = x0[x_start:x_start + peltnum]
      p.data = x_part.view(psize)
      x_start += peltnum

  return sharpness

Taken from this repository: https://github.com/wenwei202/smoothout/blob/master/measure_sharpness.py
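
Reading the `manifolds == 0` branch of the code above, the printed number appears to be the quantity below. The symbols L (cross-entropy loss), x_0 (current parameters) and C_epsilon (the box passed as `bounds`) are notation introduced here for illustration and are not taken from the post. Because the optimizer is limited to `maxiter=10`, the maximum is only approximated.

```latex
\[
\mathcal{C}_\epsilon = \{\, x : |x_i - x_{0,i}| \le \epsilon\,(|x_{0,i}| + 1) \ \text{for all } i \,\},
\qquad
\text{sharpness} = \frac{\max_{x \in \mathcal{C}_\epsilon} L(x) - L(x_0)}{1 + L(x_0)} \times 100 .
\]
% Consistency check with the log: L(x_0) = 3.59325 (iterate 0), max L = 25.32025 (final F, sign flipped)
\[
\frac{25.32025 - 3.59325}{1 + 3.59325} \times 100 \approx 473.02
\]
```

This agrees with the reported tensor(473.0201), so the sharpness value follows directly from the first and last function values in the log.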

I am worried about the accuracy.

【Question discussion】:

Tags: python numpy tensorflow scipy pytorch

【Solution 1】:

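First of all, L-BFGS-B is a local optimizer: it is only guaranteed to return the global minimum when the function is convex.
The message CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH only says that the run stopped because the relative decrease of f between two iterations fell below factr times the machine precision. The warning you received says that there were many function/gradient evaluations in the line search; this commonly happens when you use L-BFGS-B on a non-convex function. So if what you are minimizing is non-convex (which it appears to be, just from looking at the code), I would say this is normal.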
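
To make the point about non-convex functions concrete, here is a minimal sketch. The toy objective, the bounds and the starting points are assumptions chosen for illustration and have nothing to do with the network in the question. It uses `scipy.optimize.minimize` with `method="L-BFGS-B"` rather than `fmin_l_bfgs_b`; both run the same algorithm. Two runs that differ only in the starting point should end in different local minima.

```python
import numpy as np
from scipy.optimize import minimize


def f_and_grad(x):
    """Toy non-convex objective (an assumption for illustration).

    Returns (value, gradient), which is the form expected when jac=True.
    """
    x = np.asarray(x, dtype=np.float64)
    value = np.sum(x**4 - 3.0 * x**2 + x)
    grad = 4.0 * x**3 - 6.0 * x + 1.0
    return float(value), grad


bounds = [(-2.0, 2.0)]  # the same box constraint for both runs

# Two runs that differ only in the starting point.
res_left = minimize(f_and_grad, x0=np.array([-1.0]), jac=True,
                    method="L-BFGS-B", bounds=bounds)
res_right = minimize(f_and_grad, x0=np.array([1.0]), jac=True,
                     method="L-BFGS-B", bounds=bounds)

for name, res in (("start at -1", res_left), ("start at +1", res_right)):
    print(name, "-> x =", res.x, "f =", res.fun,
          "nit =", res.nit, "nfev =", res.nfev)
    print("   message:", res.message)
```

Each run stops at a stationary point near its own starting point, and the two final function values differ. On a non-convex loss such as a neural network, the result of L-BFGS-B should therefore be read as a local optimum inside the box, not as a guaranteed global one.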

【Discussion】: