R curve fitting (multiple exponential) with NLS2 and NLS: answers

【Question title】: R curve fitting (multiple exponential) with NLS2 and NLS
【Posted】: 2015-11-12 01:21:45
【Question description】:

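I'm having some difficulty fitting a specific curve in R, while it works perfectly well in a commercial curve-fitting program.

The formula the data should fit is: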

y(t) = A * exp(-a*(t)) + B * exp(-b*(t)) - (A+B) * exp(-c*(t))

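For this I want to use the nonlinear regression built into R. I've been at it on and off for a day and just can't get it to work. The problem lies entirely with the initial values, so I'm using nls2 to brute-force my way to initial values.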

library(nls2)  # provides nls2(); not loaded in the original snippet
y <- c(0,0.01377,0.01400875,0.0119175,0.00759375,0.00512125,0.004175,0.00355375,
0.00308875,0.0028925,0.00266375)
t <- c(0,3,6,12,24,48,72,96,120,144,168)
df <- data.frame(t,y)
plot(t,y);
#Our model:
fo <- y ~ f1*exp(-k1*t)+f2*exp(-k2*t)-(f1+f2)*exp(-k3*t);

#Define the outer boundaries to search for initial values
grd <- data.frame(f1=c(0,1),
              f2=c(0,1),
              k1=c(0,2),
              k2=c(0,2),
              k3=c(0,0.7));

#Do the brute-force
fit <- nls2(fo,
        data=df,
        start = grd,
        algorithm = "brute-force",
        control=list(maxiter=20000))
fit
coef(fit)
final <- nls(fo, data=df, start=as.list(coef(fit)))

The values it should give are:

f1  0.013866
f2  0.005364
k1  0.063641
k2  0.004297
k3  0.615125

However, even with a high iteration count I just get nonsense back. I'm clearly doing something wrong, but I can't see it.
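
As a sanity check on the target values quoted above, the model can be written as a plain function and evaluated on the IV data from the question. This is only a sketch: the helper name `tri_exp` and the variable `rss_ref` are illustrative and do not appear in the original post.

```r
# IV data from the question
t <- c(0, 3, 6, 12, 24, 48, 72, 96, 120, 144, 168)
y <- c(0, 0.01377, 0.01400875, 0.0119175, 0.00759375, 0.00512125,
       0.004175, 0.00355375, 0.00308875, 0.0028925, 0.00266375)

# The three-exponential model from the question (hypothetical helper name)
tri_exp <- function(t, f1, f2, k1, k2, k3) {
  f1 * exp(-k1 * t) + f2 * exp(-k2 * t) - (f1 + f2) * exp(-k3 * t)
}

# Reference values reported by the commercial program
ref <- list(f1 = 0.013866, f2 = 0.005364,
            k1 = 0.063641, k2 = 0.004297, k3 = 0.615125)

# Predictions and residual sum of squares at the reference values
pred <- do.call(tri_exp, c(list(t = t), ref))
rss_ref <- sum((y - pred)^2)
rss_ref
```

Any candidate set of starting values can be compared against `rss_ref` in the same way. A starting point whose residual sum of squares is orders of magnitude larger is unlikely to lead `nls` to this solution.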

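Edit based on @Roland's comment:

The approach you suggested, approximating k1 to k3 with a partially linear fit, seems to work for some datasets but not for all of them. Below is the code I am now using, based on your input.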

#Oral example:
y <- c(0,0.0045375,0.0066325,0.00511375,0.00395875,0.003265,0.00276,
0.002495,0.00231875);
t <- c(0,12,24,48,72,96,120,144,168);
#IV example:
#y <- c(0,0.01377,0.01400875,0.0119175,0.00759375,0.00512125,0.004175,
#0.00355375,0.00308875,0.0028925,0.00266375)
#t <- c(0,3,6,12,24,48,72,96,120,144,168)
DF <- data.frame(y, t)
fit1 <- nls(y ~ cbind(exp(-k1*t), exp(-k2*t), exp(-k3*t)), algorithm = "plinear", data = DF,
            start = list(k1 = 0.002, k2 = 0.02, k3= 0.2))
k1_predict <-summary(fit1)$coefficients[1,1]
k2_predict <-summary(fit1)$coefficients[2,1]
k3_predict <-summary(fit1)$coefficients[3,1]
fo <- y ~ f1*exp(-k1*t)+f2*exp(-k2*t)-(f1+f2)*exp(-k3*t);
fit2 <- nls(fo, data = DF, 
            start = list(k1 = k1_predict, k2 = k2_predict, k3 = k3_predict, f1 = 0.01, f2 = 0.01))
summary(fit2);
plot(t,y);
curve(predict(fit2, newdata = data.frame(t = x)), 0, 200, add = TRUE, col = "red")

[Figure: oral_example fit]

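@G. Grothendieck: like Roland's suggestion, yours works very well in that it can fit some datasets, but it struggles with others. The code below is based on your input and exits with a singular gradient matrix.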

#Oral example:
y <- c(0,0.0045375,0.0066325,0.00511375,0.00395875,0.003265,0.00276,
0.002495,0.00231875);
t <- c(0,12,24,48,72,96,120,144,168);
#IV example:
#y <- c(0,0.01377,0.01400875,0.0119175,0.00759375,0.00512125,0.004175,
#0.00355375,0.00308875,0.0028925,0.00266375)
#t <- c(0,3,6,12,24,48,72,96,120,144,168)
df <- data.frame(y, t)
grd <- data.frame(f1=c(0,1),
              f2=c(0,1),
              k1=c(0,2),
              k2=c(0,2),
              k3=c(0,0.7));
set.seed(123)
fit <- nls2(fo,
        data=df,
        start = grd,
        algorithm = "random",
        control = nls.control(maxiter = 100000))
final <- nls(fo, df, start = coef(fit), algorithm = "port", lower = 0)
plot(t,y);
# predict() needs the fitted model object, not the nls function itself
curve(predict(final, newdata = data.frame(t = x)), 0, 200, add = TRUE, col = "red")

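【Question discussion】:

Tags: r curve nls

【Solution 1】:

I would first do a partially linear fit that does not constrain the linear parameters, to get good starting values for the exponential parameters and some idea of the magnitude of the linear parameters: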

DF <- data.frame(y, t)
fit1 <- nls(y ~ cbind(exp(-k1*t), exp(-k2*t), exp(-k3*t)), algorithm = "plinear", data = DF,
            start = list(k1 = 0.002, k2 = 0.02, k3= 0.2))
summary(fit1)
#Formula: y ~ cbind(exp(-k1 * t), exp(-k2 * t), exp(-k3 * t))
#
#Parameters:
#        Estimate Std. Error t value Pr(>|t|)    
#k1     0.0043458  0.0010397   4.180 0.008657 ** 
#k2     0.0639379  0.0087141   7.337 0.000738 ***
#k3     0.6077646  0.0632586   9.608 0.000207 ***
#.lin1  0.0053968  0.0006637   8.132 0.000457 ***
#.lin2  0.0139231  0.0008694  16.014 1.73e-05 ***
#.lin3 -0.0193145  0.0010631 -18.168 9.29e-06 ***
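
To make the idea behind the partially linear fit concrete: once the rate constants are fixed, the model is linear in the remaining coefficients, so those can be obtained by ordinary least squares. The sketch below does this by hand in base R at the `k1` to `k3` estimates printed above, using the IV data from the question. The names `X` and `lin` are illustrative, and this is not how `nls` is implemented internally.

```r
# IV data from the question
t <- c(0, 3, 6, 12, 24, 48, 72, 96, 120, 144, 168)
y <- c(0, 0.01377, 0.01400875, 0.0119175, 0.00759375, 0.00512125,
       0.004175, 0.00355375, 0.00308875, 0.0028925, 0.00266375)

# Rate constants k1, k2, k3 as reported in the summary above
k <- c(0.0043458, 0.0639379, 0.6077646)

# One column per exponential term: an 11 x 3 design matrix
X <- sapply(k, function(ki) exp(-ki * t))

# Least-squares solution for the linear coefficients at these fixed rates
lin <- qr.solve(X, y)
lin

# The constrained model requires the third coefficient to equal -(f1 + f2),
# i.e. the three coefficients should sum to roughly zero
sum(lin)
```

If the sum is close to zero, the unconstrained fit is already nearly compatible with the constrained formula, which suggests its rate constants are reasonable starting values for the actual model.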

Then you can fit your actual model:

fit2 <- nls(fo, data = DF, 
            start = list(k1 = 0.06, k2 = 0.004, k3 = 0.6, f1 = 0.01, f2 = 0.01))
summary(fit2)  
#Formula: y ~ f1 * exp(-k1 * t) + f2 * exp(-k2 * t) - (f1 + f2) * exp(-k3 * t)
#
#Parameters:
#    Estimate Std. Error t value Pr(>|t|)    
#k1 0.0639344  0.0079538   8.038 0.000198 ***
#k2 0.0043456  0.0009492   4.578 0.003778 ** 
#k3 0.6078929  0.0575616  10.561 4.24e-05 ***
#f1 0.0139226  0.0007934  17.548 2.20e-06 ***
#f2 0.0053967  0.0006059   8.907 0.000112 ***         

curve(predict(fit2, newdata = data.frame(t = x)), 0, 200, add = TRUE, col = "red")

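Note that this model can easily be reparameterized by switching the exponential terms (i.e., the order of the kn starting values), which can lead to different estimates for f1 and f2 but essentially equally good fits.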
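
A short check of this symmetry, using the estimates printed above: swapping the pair (f1, k1) with (f2, k2) leaves the predicted curve unchanged. The helper name `tri_exp` and the evaluation grid are illustrative choices.

```r
# The model from the question as a plain function (hypothetical helper name)
tri_exp <- function(t, f1, f2, k1, k2, k3) {
  f1 * exp(-k1 * t) + f2 * exp(-k2 * t) - (f1 + f2) * exp(-k3 * t)
}

t_grid <- seq(0, 200, by = 1)

# Estimates as reported by summary(fit2)
a <- tri_exp(t_grid, f1 = 0.0139226, f2 = 0.0053967,
             k1 = 0.0639344, k2 = 0.0043456, k3 = 0.6078929)

# Same values with the two exponential terms swapped
b <- tri_exp(t_grid, f1 = 0.0053967, f2 = 0.0139226,
             k1 = 0.0043456, k2 = 0.0639344, k3 = 0.6078929)

# Largest difference between the two curves (rounding level at most)
max(abs(a - b))
```

This is why two fits of the same data can report f1 and f2 (and k1 and k2) in opposite order while having the same residual sum of squares.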

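【Discussion】:

I have amended the opening post based on your input. Your suggestion seems to solve the problem in part, although on some datasets I still have trouble getting a good fit in R. If you could take another look, it would really help my research.
You have (i) a complex five-parameter model and (ii) very few data points supporting the model in the critical time range (around the "peak"). I believe that small deviations in the second data point (e.g., due to measurement uncertainty) have a strong influence on the fit and can lead to convergence problems.
OK. Thank you very much. Your contribution helped me model my IV/oral data better :)

【Solution 2】: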

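With this many parameters I would use algorithm = "random" rather than "brute-force". If we do that, the following gives a result close to the one in the question (up to a permutation of the parameters, which arises from the symmetry of the model):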

library(nls2)  # provides nls2()
set.seed(123)
fit <- nls2(fo,
        data=df,
        start = grd,
        algorithm = "random",
        control = nls.control(maxiter = 20000))
nls(fo, df, start = coef(fit), alg = "port", lower = 0)

giving:

Nonlinear regression model
  model: y ~ f1 * exp(-k1 * t) + f2 * exp(-k2 * t) - (f1 + f2) * exp(-k3 * t)
   data: df
      f1       f2       k1       k2       k3 
0.005397 0.013923 0.004346 0.063934 0.607893 
 residual sum-of-squares: 2.862e-07

Algorithm "port", convergence message: relative convergence (4)
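
The reasoning behind preferring random sampling can be sketched in base R. With five parameters, a regular grid limited to about 20000 evaluations can afford only a handful of levels per parameter, whereas random sampling spends the same budget on 20000 distinct values of every parameter. The code below is a stand-in for the idea and not nls2's implementation; the names `rss`, `cand`, `per_axis` and `best` are illustrative, and the exact way nls2 builds its grid may differ from the simple arithmetic shown.

```r
# IV data from the question
t <- c(0, 3, 6, 12, 24, 48, 72, 96, 120, 144, 168)
y <- c(0, 0.01377, 0.01400875, 0.0119175, 0.00759375, 0.00512125,
       0.004175, 0.00355375, 0.00308875, 0.0028925, 0.00266375)

# Residual sum of squares for a parameter vector p = (f1, f2, k1, k2, k3)
rss <- function(p) {
  pred <- p[1] * exp(-p[3] * t) + p[2] * exp(-p[4] * t) -
    (p[1] + p[2]) * exp(-p[5] * t)
  sum((y - pred)^2)
}

# Search box, same bounds as grd in the question
lower <- c(f1 = 0, f2 = 0, k1 = 0, k2 = 0, k3 = 0)
upper <- c(f1 = 1, f2 = 1, k1 = 2, k2 = 2, k3 = 0.7)
n_par <- length(lower)
n_eval <- 20000

# Levels per parameter a regular grid could afford within n_eval points
per_axis <- floor(n_eval^(1 / n_par))
per_axis

# Random search: n_eval uniform draws inside the box, keep the best one
set.seed(123)
cand <- sapply(seq_len(n_par), function(j) runif(n_eval, lower[j], upper[j]))
colnames(cand) <- names(lower)
rss_vals <- apply(cand, 1, rss)
best <- cand[which.min(rss_vals), ]
best
```

The best random draw is only a starting point. It still needs to be refined by `nls`, as in the answer's code.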

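Added

A variation on the above is to use nlsLM from the minpack.lm package instead of nls, and to use a spline to obtain more points in the dataset. In place of the nls line, try the following. It still converges: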

library(minpack.lm)
t_s <- with(df, min(t):max(t))
df_s <- setNames(data.frame(spline(df$t, df$y, xout = t_s)), c("t", "y"))
nlsLM(fo, df_s, start = coef(fit), lower = rep(0,5), control = nls.control(maxiter = 1024))
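
To see what the spline step does on its own, the sketch below uses only base R and the IV data from the question. It checks that the densified data still pass through the measured points. The names `sp` and `max_dev` are illustrative.

```r
# IV data from the question
t <- c(0, 3, 6, 12, 24, 48, 72, 96, 120, 144, 168)
y <- c(0, 0.01377, 0.01400875, 0.0119175, 0.00759375, 0.00512125,
       0.004175, 0.00355375, 0.00308875, 0.0028925, 0.00266375)

# One interpolated value per hour between the first and last time point
t_s <- min(t):max(t)
sp <- spline(t, y, xout = t_s)
df_s <- data.frame(t = sp$x, y = sp$y)

# Number of rows after densifying
nrow(df_s)

# Largest deviation of the spline from the data at the measured times
max_dev <- max(abs(df_s$y[match(t, df_s$t)] - y))
max_dev
```

One caveat worth keeping in mind: the interpolated rows are not independent measurements, so standard errors and p-values from a fit to `df_s` will look more favourable than the 11 real observations justify. The densified data are best treated as an aid to convergence.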

The same works for the oral example:

set.seed(123)
y <- c(0,0.0045375,0.0066325,0.00511375,0.00395875,0.003265,0.00276,
0.002495,0.00231875);
t <- c(0,12,24,48,72,96,120,144,168)
DF <- data.frame(y, t)
grd <- data.frame(f1=c(0,1), f2=c(0,1), k1=c(0,2), k2=c(0,2), k3=c(0,0.7))
fit <- nls2(fo,
        data=DF,
        start = grd,
        algorithm = "random",
        control = nls.control(maxiter = 20000))

library(minpack.lm)
t_s <- with(DF, min(t):max(t))
df_s <- setNames(data.frame(spline(DF$t, DF$y, xout = t_s)), c("t", "y"))
nlsLM(fo, df_s, start = coef(fit), lower = rep(0,5), control = nls.control(maxiter = 1024))

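【Discussion】:

I have amended the original question based on your input. Your suggestion of a random search with a fixed seed seems to work very well on some datasets, although on others I still have trouble getting a fit. If you have any further tips, it would really help me.
Thank you, your contribution was very helpful for modeling my data :)
You should accept the answer you prefer rather than using comments to say thanks.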