Levenberg–Marquardt算法

Levenberg-Marquardt 算法

最优估计算法中通常的做法都是写一个代价函数，并迭代计算最小化代价函数。
解决非线性最小二乘问题的方法有高斯牛顿法等，下面介绍Levenberg-Marquardt 算法：

1. 算法描述

以下引用自维基百科

问题描述

假设 $f$ 是一个从 $\Re ^{m} \rightarrow \Re ^{n}$ 的非线性映射，也就是说 $\mathbf {P} \in \Re ^{m}$ 且 $\mathbf {X} \in \Re ^{n}$ 那么:
$f(\mathbf {P} )=\mathbf {X}$

而我们的目的就是希望任意给定一个 $\mathbf {x}$ 以及合理的初始值 $\mathbf {p} _{0}$ ，我们能找到一个 $\mathbf {p}^{+}$ ，使得 $\mathbf {\epsilon } ^{T}\mathbf {\epsilon }$ 尽量小（局部极小），其中 $\mathbf {\epsilon } =f(\mathbf {p} ^{+})-\mathbf {x}$ 。

解法

像大多数最小化的方法一样，这是一个迭代的方法。首先根据泰勒展开式我们能把 $f(\mathbf {p} +\mathbf {\delta _{p}} )$ 写为下面的近似，这有两个好处：第一是线性、第二是只需要一阶微分。

$f(\mathbf {p} +\mathbf {\delta _{p}} )\approx f(\mathbf {p} )+\mathbf {J\delta _{p}}$

其中 $\mathbf {J}$ 是 $f$ 的雅可比矩阵。对于每次的迭代我们这么做：假设这次 iteration 的点是 ${\mathbf {p}}_{k}$ ，我们要找到一个 $|{\mathbf {x}}-f({\mathbf {p}}+{\mathbf {\delta }}_{{{\mathbf {p}},k}})|\approx |{\mathbf {x}}-f({\mathbf {p}})-{\mathbf {J{\mathbf {\delta }}_{{{\mathbf {p}},k}}}}|=|{\mathbf {\epsilon }}_{{k}}-{\mathbf {J{\mathbf {\delta }}_{{{\mathbf {p}},k}}}}|$ 最小。根据投影公式我们知道当下面式子被满足的时候能有最小误差：
$({\mathbf {J}}^{T}{\mathbf {J}}){\mathbf {\delta _{p}}}={\mathbf {J}}^{T}{\mathbf {\epsilon }}_{{k}}$

我们将这个公式略加修改得到：
$[\mu {\mathbf {I}}+({\mathbf {J}}^{T}{\mathbf {J}})]{\mathbf {\delta _{p}}}={\mathbf {J}}^{T}{\mathbf {\epsilon }}_{{k}}$

就是莱文贝格－马夸特方法。如此一来 $\mu$ 大的时候这种算法会接近最速下降法，小的时候会接近高斯-牛顿方法。为了确保每次 $\mathbf {\epsilon }$ 长度的减少，我们这么作：先采用一个小的 $\mu$ ，如果 ${\mathbf {\epsilon }}$ 长度变大就增加 $\mu$ 。

这个算法当以下某些条件达到时结束迭代：

如果发现 $\mathbf {\epsilon }$ 长度变化小于特定的给定值就结束
发现 ${\mathbf {\delta _{p}}}$ 变化小于特定的给定值就结束。
到达了迭代的上限设定就结束。

2. Matlab 中的LM函数

matlab中 $lsqonlin$ 函数为LM算法的函数，即解决非线性最小二乘问题（非线性数据拟合的算法）

官方给出的例子

指数型拟合
拟合近似为 $y=exp(-1.3t)+\epsilon$ 曲线上的点，其中 $t$ 的变化范围为[0,3].

%函数目标：给定待拟合的数据d和y，寻找能够拟合这些点的指数型函数
%----产生数据----------
rng default
d = linspace(0,3,100);
y = exp(-1.3*d)+0.05*randn(size(d));
%------给定拟合函数形式------
fun = @(r)exp(-d*r)-y;%这里认为最小化的函数为fun，变量为r
%------给定初始值，进行迭代，计算最优解----------
x0 = 4；
x = lsqnonlin(fun,x0);
%---------绘制拟合结果--------------
plot(d,y,'ko',d,exp(-x*d),'b-')
legend('Data','Best fit')
xlabel('t')
ylabel('exp(-tx)')

对含有边界条件的问题进行拟合
目的：在一些拟合参数含有边界条件时，寻找最好的拟合模型
问题：找到最优的中心点b和缩放值a的值，使函数 $aexp(-t)exp(-exp(-(t-b)))$ 和标准正态密度函数 $\frac{1}{\sqrt{2\pi}}exp(-t^2/2)$ 的拟合程度达到最高。

%-------标准正态分布函数形式---------
t = linspace(-4,4);
y = 1/sqrt(2*pi)*exp(-t.^2/2);

% ---写cost function，即拟合函数与标准正态函数之差----
% Create a function that evaluates the difference between the centered and scaled function from the normal |y|,
% with |x(1)| as the scaling $a$ and |x(2)| as the centering $b$.
fun = @(x)x(1)*exp(-t).*exp(-exp(-(t-x(2)))) - y;

% Find the optimal fit starting from |x0| = |[1/2,0]|, with the scaling $a$
% between 1/2 and 3/2, and the centering $b$ between -1 and 3.
lb = [1/2,-1]; % a的最小值为1/2.b的最小值为-1
ub = [3/2,3]; %a的最大值为3/2.b的最大值为3
x0 = [1/2,0];%将ab的初始值分别设为1/2 和0
x = lsqnonlin(fun,x0,lb,ub)  %计算local minimum
%%
% Plot the two functions to see the quality of the fit.
plot(t,y,'r-',t,fun(x)+y,'b-')
xlabel('t')
legend('Normal density','Fitted function')

非线性最小二乘拟合（使用非默认的算法）Nonlinear Least Squares with Non default Options

% 该例子中展示了使用不同lsqnonlin算法得到的函数拟合结果。
% -----给定x和y的值，并给出拟合函数形式ydata = x(1)exp（x(2)*xdata）}---------
xdata = ...
 [0.9 1.5 13.8 19.8 24.1 28.2 35.2 60.3 74.6 81.3];
ydata = ...
 [455.2 428.6 124.1 67.3 43.2 28.1 13.1 -0.4 -1.3 -1.5];
%-----创建cost function---------
fun = @(x)x(1)*exp(x(2)*xdata)-ydata;
%-----给定参数初始值------
x0 = [100,-1];
%-----选择优化函数算法1-----
options = optimoptions(@lsqnonlin,'Algorithm','trust-region-reflective');
%------计算最优解1--------
x = lsqnonlin(fun,x0,[],[],options)；
%-----选择优化函数2-----
options.Algorithm = 'levenberg-marquardt';
x = lsqnonlin(fun,x0,[],[],options)
%---绘图，发现利用两种算法得到的拟合线是相同的----
plot(xdata,ydata,'ko')
hold on
tlist = linspace(xdata(1),xdata(end));
plot(tlist,x(1)*exp(x(2)*tlist),'b-')
xlabel xdata
ylabel ydata
title('Exponential Fit to Data')
legend('Data','Exponential Fit')
hold off

非线性最小二乘拟合以及对应的residual norm
目的：计算函数 $\sum_{k = 1}^{10}{(2+2k-e^{k*x_1}-e^{(k*x_2)})^2}$ 的最小值,并计算该函数值的大小。
思路： $lsqnonlin$ 假设平方和的的大小并不能够在用户定义函数中显式地表示，相反的，传递给 $lsqnonlin$ 的函数应该是可以计算的向量值函数（vector-valued function）： ${F_k(x) = 2+2k-e^{k*x_1}-e^{k*x_2}}$ for k =1:10(F有10个值）

function F = myfun(x)
k = 1:10;
F = 2 + 2*k-exp(k*x(1))-exp(k*x(2));
x0 = [0.3,0.4];
[x,resnorm] = lsqnonlin(@myfun,x0);

以下引自博客：https://blog.csdn.net/tina_ttl/article/details/51203433
最小化函数 $\sum_{k = 1}^{10}{(2+2k-e^{k*x_1}-e^{(k*x_2)})^2}$
（表示不理解jacobian 函数）

without jacobian

k = 0:10;
F = @(x)2+2*k-exp(k*x(1))-exp(k*x(2));
x0 = [0.3,0.4];
[x,resnorm,res,eflag,output] = lsqnonlin(F,x0);

with jacobian

% 函数输入是向量x，返回值是目标函数值F和Jacobian
function [F,J] = myfun(x)
k = 1:10;
F = 2 + 2*k-exp(k*x(1))-exp(k*x(2));
if nargout > 1  %如果返回值为2个
    J = zeros(10,2);
    J(k,1) = -k.*exp(k*x(1));
    J(k,2) = -k.*exp(k*x(2));
end
%----设置solver--------
opts = optimoptions(@lsqnonlin,’SpecifyObjectiveGradient’,true);
%----利用lsqnonlin函数求解优化问题
x0 = [0.3 0.4]; % Starting guess
[x,resnorm,res,eflag,output2] = lsqnonlin(@myfun,x0,[],[],opts);
--------------------- 
作者：tina_ttl 
来源：CSDN 
原文：https://blog.csdn.net/tina_ttl/article/details/51203433 
版权声明：本文为博主原创文章，转载请附上博文链接！

3. 文献中LM算法的使用场景

Hemmer, F., L. C.-Labonnote, F. Parol, G. Brogniez, B. Damiri & T. Podvin (2019) An algorithm to retrieve ice water content profiles in cirrus clouds from the synergy of ground-based lidar and thermal infrared radiometer measurements. Atmospheric Measurement Techniques, 12, 1545-1568.

上述文献中inversion model即利用最优估计算法，并用LM算法计算得到了最优解。（还没有看明白，之后博客中再详细讲）