【Question title】: R curve fitting (multiple exponential) with NLS2 and NLS
【Posted】: 2015-11-12 01:21:45
【Question description】:

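I am having trouble getting a particular curve to fit in R, although the same fit works perfectly well in a commercial curve-fitting program.

The formula the data should follow is: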

y(t) = A * exp(-a*(t)) + B * exp(-b*(t)) - (A+B) * exp(-c*(t))
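
The formula can be written as a small R function. This is only a sketch: the name `triexp` is made up for illustration, and the parameter values are the ones quoted further down in this question as the expected result. Because the third amplitude is fixed at -(A+B), the curve is forced through y(0) = 0.

```r
# Hypothetical helper (name not from the original post): the three-exponential model
triexp <- function(t, A, B, a, b, c) {
  A * exp(-a * t) + B * exp(-b * t) - (A + B) * exp(-c * t)
}

# Values reported by the commercial program (listed later in the question)
p <- list(A = 0.013866, B = 0.005364, a = 0.063641, b = 0.004297, c = 0.615125)

# At t = 0 every exponential equals 1, so the three terms cancel
y0 <- do.call(triexp, c(list(t = 0), p))
print(y0)

# Model curve at the IV sampling times
t_obs <- c(0, 3, 6, 12, 24, 48, 72, 96, 120, 144, 168)
y_hat <- do.call(triexp, c(list(t = t_obs), p))
print(round(y_hat, 5))
```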

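For this I want to use the nonlinear regression built into R. I have been at it on and off for a day and simply cannot get it to work. The problem lies entirely in the initial values, so I am using nls2 to find them by brute force.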

library(nls2)  # provides nls2(); needed for the brute-force search below
y <- c(0,0.01377,0.01400875,0.0119175,0.00759375,0.00512125,0.004175,0.00355375,
0.00308875,0.0028925,0.00266375)
t <- c(0,3,6,12,24,48,72,96,120,144,168)
df <- data.frame(t,y)
plot(t,y);
#Our model:
fo <- y ~ f1*exp(-k1*t)+f2*exp(-k2*t)-(f1+f2)*exp(-k3*t);

#Define the outer boundaries to search for initial values
grd <- data.frame(f1=c(0,1),
              f2=c(0,1),
              k1=c(0,2),
              k2=c(0,2),
              k3=c(0,0.7));

#Do the brute-force
fit <- nls2(fo,
        data=df,
        start = grd,
        algorithm = "brute-force",
        control=list(maxiter=20000))
fit
coef(fit)
final <- nls(fo, data=df, start=as.list(coef(fit)))

The values it should give are:

f1  0.013866
f2  0.005364
k1  0.063641
k2  0.004297
k3  0.615125

However, even with a high iteration count I only get nonsensical results. I am clearly doing something wrong, but I cannot see what.

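Edit based on @Roland's comment:

The method you suggested, approximating k1 to k3 with a (partially) linear approach, seems to work on some datasets but not on all of them. Below is the code I am now using based on your input.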

#Oral example:
y <- c(0,0.0045375,0.0066325,0.00511375,0.00395875,0.003265,0.00276,
0.002495,0.00231875);
t <- c(0,12,24,48,72,96,120,144,168);
#IV example:
#y <- c(0,0.01377,0.01400875,0.0119175,0.00759375,0.00512125,0.004175,
#0.00355375,0.00308875,0.0028925,0.00266375)
#t <- c(0,3,6,12,24,48,72,96,120,144,168)
DF <- data.frame(y, t)
fit1 <- nls(y ~ cbind(exp(-k1*t), exp(-k2*t), exp(-k3*t)), algorithm = "plinear", data = DF,
            start = list(k1 = 0.002, k2 = 0.02, k3= 0.2))
k1_predict <-summary(fit1)$coefficients[1,1]
k2_predict <-summary(fit1)$coefficients[2,1]
k3_predict <-summary(fit1)$coefficients[3,1]
fo <- y ~ f1*exp(-k1*t)+f2*exp(-k2*t)-(f1+f2)*exp(-k3*t);
fit2 <- nls(fo, data = DF, 
            start = list(k1 = k1_predict, k2 = k2_predict, k3 = k3_predict, f1 = 0.01, f2 = 0.01))
summary(fit2);
plot(t,y);
curve(predict(fit2, newdata = data.frame(t = x)), 0, 200, add = TRUE, col = "red")

[Figure: fit of the oral example]

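@G. Grothendieck: Like Roland's suggestion, yours works very well in that it fits some datasets, but it struggles with others. The code below is based on your input and exits with a singular gradient matrix error.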

#Oral example:
y <- c(0,0.0045375,0.0066325,0.00511375,0.00395875,0.003265,0.00276,
0.002495,0.00231875);
t <- c(0,12,24,48,72,96,120,144,168);
#IV example:
#y <- c(0,0.01377,0.01400875,0.0119175,0.00759375,0.00512125,0.004175,
#0.00355375,0.00308875,0.0028925,0.00266375)
#t <- c(0,3,6,12,24,48,72,96,120,144,168)
df <- data.frame(y, t)
grd <- data.frame(f1=c(0,1),
              f2=c(0,1),
              k1=c(0,2),
              k2=c(0,2),
              k3=c(0,0.7));
set.seed(123)
fit <- nls2(fo,
        data=df,
        start = grd,
        algorithm = "random",
        control = nls.control(maxiter = 100000))
final <- nls(fo, df, start = coef(fit), algorithm = "port", lower = 0)  # keep the fitted object for predict()
plot(t,y);
curve(predict(final, newdata = data.frame(t = x)), 0, 200, add = TRUE, col = "red")

【Question discussion】:

    Tags: r curve nls


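    【Solution 1】:

    I would first do a partially linear fit that leaves the linear parameters unconstrained, to get good starting values for the exponential parameters and some idea of the magnitude of the linear parameters: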

    DF <- data.frame(y, t)
    fit1 <- nls(y ~ cbind(exp(-k1*t), exp(-k2*t), exp(-k3*t)), algorithm = "plinear", data = DF,
                start = list(k1 = 0.002, k2 = 0.02, k3= 0.2))
    summary(fit1)
    #Formula: y ~ cbind(exp(-k1 * t), exp(-k2 * t), exp(-k3 * t))
    #
    #Parameters:
    #        Estimate Std. Error t value Pr(>|t|)    
    #k1     0.0043458  0.0010397   4.180 0.008657 ** 
    #k2     0.0639379  0.0087141   7.337 0.000738 ***
    #k3     0.6077646  0.0632586   9.608 0.000207 ***
    #.lin1  0.0053968  0.0006637   8.132 0.000457 ***
    #.lin2  0.0139231  0.0008694  16.014 1.73e-05 ***
    #.lin3 -0.0193145  0.0010631 -18.168 9.29e-06 ***
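
As a quick sanity check on the output above (a sketch that uses only the printed estimates, not a refit): if the unconstrained fit is compatible with the constrained model, the third linear coefficient should be close to minus the sum of the first two, because the target model fixes that amplitude at -(f1+f2).

```r
# Linear coefficients and one standard error copied from the summary(fit1) output
lin <- c(lin1 = 0.0053968, lin2 = 0.0139231, lin3 = -0.0193145)
se_lin3 <- 0.0010631

# Amplitude of the third exponential implied by the constraint -(f1 + f2)
implied_lin3 <- -(lin[["lin1"]] + lin[["lin2"]])
gap <- lin[["lin3"]] - implied_lin3

print(implied_lin3)
print(gap)
print(abs(gap) < se_lin3)  # is the mismatch smaller than one standard error?
```

A gap well below the standard error suggests the constraint costs very little on this dataset, which is consistent with the constrained fit converging from these starting values.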
    

    Then you can fit your actual model:

    fit2 <- nls(fo, data = DF, 
                start = list(k1 = 0.06, k2 = 0.004, k3 = 0.6, f1 = 0.01, f2 = 0.01))
    summary(fit2)  
    #Formula: y ~ f1 * exp(-k1 * t) + f2 * exp(-k2 * t) - (f1 + f2) * exp(-k3 * t)
    #
    #Parameters:
    #    Estimate Std. Error t value Pr(>|t|)    
    #k1 0.0639344  0.0079538   8.038 0.000198 ***
    #k2 0.0043456  0.0009492   4.578 0.003778 ** 
    #k3 0.6078929  0.0575616  10.561 4.24e-05 ***
    #f1 0.0139226  0.0007934  17.548 2.20e-06 ***
    #f2 0.0053967  0.0006059   8.907 0.000112 ***         
    
    curve(predict(fit2, newdata = data.frame(t = x)), 0, 200, add = TRUE, col = "red")
    

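    Note that this model can easily be reparameterized by switching the exponential terms (i.e., the order of the kn starting values), which may lead to different estimates for f1 and f2 but an essentially equally good fit.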
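
The label-switching point can be checked numerically. The sketch below is not part of the original answer: it plugs the printed estimates into a plain function (the name `model` is chosen here for illustration) and exchanges the (f1, k1) and (f2, k2) pairs.

```r
# Model from the question, written as a plain function
model <- function(t, f1, f2, k1, k2, k3) {
  f1 * exp(-k1 * t) + f2 * exp(-k2 * t) - (f1 + f2) * exp(-k3 * t)
}

t_obs <- c(0, 3, 6, 12, 24, 48, 72, 96, 120, 144, 168)

# Estimates from summary(fit2)
pred_a <- model(t_obs, f1 = 0.0139226, f2 = 0.0053967,
                k1 = 0.0639344, k2 = 0.0043456, k3 = 0.6078929)

# Same numbers with the (f1, k1) and (f2, k2) pairs exchanged
pred_b <- model(t_obs, f1 = 0.0053967, f2 = 0.0139226,
                k1 = 0.0043456, k2 = 0.0639344, k3 = 0.6078929)

# Both parameter sets describe the same curve
print(all.equal(pred_a, pred_b))
```

Two fits that differ only by this exchange are therefore the same solution, not two competing ones.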

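    【Discussion】:

    • I have edited the original post based on your input. Your suggestion seems to solve the problem in part, although on some datasets I still have difficulty getting a good fit in R. If you could take another look, it would really help my research.
    • You have (i) a complex five-parameter model and (ii) very few data points supporting the model in the critical time range (around the "peak"). I believe that small deviations in the second data point (e.g., due to measurement uncertainty) have a large impact on the fit and can cause convergence problems.
    • OK. Thank you very much. Your contribution helped me model my IV/oral data better :)
    【Solution 2】: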

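    With this many parameters I would use algorithm = "random" rather than "brute-force". If we do that, the following gives a result close to the one in the question (up to a permutation of the parameters, which arises from the symmetry of the model parameters):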

    set.seed(123)
    fit <- nls2(fo,
            data=df,
            start = grd,
            algorithm = "random",
            control = nls.control(maxiter = 20000))
    nls(fo, df, start = coef(fit), alg = "port", lower = 0)
    

    giving:

    Nonlinear regression model
      model: y ~ f1 * exp(-k1 * t) + f2 * exp(-k2 * t) - (f1 + f2) * exp(-k3 * t)
       data: df
          f1       f2       k1       k2       k3 
    0.005397 0.013923 0.004346 0.063934 0.607893 
     residual sum-of-squares: 2.862e-07
    
    Algorithm "port", convergence message: relative convergence (4)
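
To see how close this is to the commercial program's values, the printed estimates can be compared after undoing the label swap. This is a sketch based only on the numbers shown in this thread; the vector names are chosen here for illustration.

```r
# Values reported by the commercial program (from the question)
ref <- c(f1 = 0.013866, f2 = 0.005364, k1 = 0.063641, k2 = 0.004297, k3 = 0.615125)

# Estimates printed by nls above
est <- c(f1 = 0.005397, f2 = 0.013923, k1 = 0.004346, k2 = 0.063934, k3 = 0.607893)

# Undo the permutation: R's (f1, k1) plays the role of the reference (f2, k2) and vice versa
est_aligned <- est[c("f2", "f1", "k2", "k1", "k3")]
names(est_aligned) <- names(ref)

# Relative difference per parameter, in percent
rel_diff <- (est_aligned - ref) / ref
print(round(100 * rel_diff, 2))
```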
    

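    Added

    A variation of the above approach is to use nlsLM from the minpack.lm package instead of nls, and to use a spline to obtain more points in the dataset. In place of the nls line, try the following. It still converges: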

    library(minpack.lm)
    t_s <- with(df, min(t):max(t))
    df_s <- setNames(data.frame(spline(df$t, df$y, xout = t_s)), c("t", "y"))
    nlsLM(fo, df_s, start = coef(fit), lower = rep(0,5), control = nls.control(maxiter = 1024))
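
To make the spline step concrete, here is a base R sketch (no fitting involved) showing what `df_s` contains for the IV data. One caveat worth keeping in mind: the interpolated points are not independent measurements, so standard errors reported by a fit on `df_s` are likely to be too optimistic.

```r
# IV data from the question
y <- c(0, 0.01377, 0.01400875, 0.0119175, 0.00759375, 0.00512125, 0.004175,
       0.00355375, 0.00308875, 0.0028925, 0.00266375)
t <- c(0, 3, 6, 12, 24, 48, 72, 96, 120, 144, 168)
df <- data.frame(t, y)

# One interpolated point per time unit between the first and last observation
t_s <- with(df, min(t):max(t))
df_s <- setNames(data.frame(spline(df$t, df$y, xout = t_s)), c("t", "y"))

print(nrow(df))    # number of observed points
print(nrow(df_s))  # number of points handed to the optimizer

# The interpolating spline passes through the original observations
print(all.equal(df_s$y[match(df$t, df_s$t)], df$y))
```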
    

    The same works for the oral example:

    set.seed(123)
    y <- c(0,0.0045375,0.0066325,0.00511375,0.00395875,0.003265,0.00276,
    0.002495,0.00231875);
    t <- c(0,12,24,48,72,96,120,144,168)
    DF <- data.frame(y, t)
    grd <- data.frame(f1=c(0,1), f2=c(0,1), k1=c(0,2), k2=c(0,2), k3=c(0,0.7))
    fit <- nls2(fo,
            data=DF,
            start = grd,
            algorithm = "random",
            control = nls.control(maxiter = 20000))
    
    library(minpack.lm)
    t_s <- with(DF, min(t):max(t))
    df_s <- setNames(data.frame(spline(DF$t, DF$y, xout = t_s)), c("t", "y"))
    nlsLM(fo, df_s, start = coef(fit), lower = rep(0,5), control = nls.control(maxiter = 1024))
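
For comparison, starting values can also be searched for in base R alone. The sketch below is not from either answer and rests on assumptions: the rates are taken to lie between 0.001 and 1, and the ordering k3 > k1 > k2 is imposed to remove label-switching duplicates. It exploits the fact that, for fixed rates, the model is linear in f1 and f2, so only the three rates need to be searched.

```r
# IV data from the question
y <- c(0, 0.01377, 0.01400875, 0.0119175, 0.00759375, 0.00512125, 0.004175,
       0.00355375, 0.00308875, 0.0028925, 0.00266375)
t <- c(0, 3, 6, 12, 24, 48, 72, 96, 120, 144, 168)

# Log-spaced candidate rates (the range 0.001 to 1 is an assumption)
ks <- 10^seq(-3, 0, length.out = 13)
grid <- expand.grid(k1 = ks, k2 = ks, k3 = ks)
# Assumed ordering k3 > k1 > k2 keeps one representative per permutation
grid <- grid[grid$k3 > grid$k1 & grid$k1 > grid$k2, ]

# For fixed rates:
#   y = f1 * (exp(-k1 t) - exp(-k3 t)) + f2 * (exp(-k2 t) - exp(-k3 t))
# so f1 and f2 follow from ordinary least squares
sse_at <- function(k) {
  X <- cbind(exp(-k[["k1"]] * t) - exp(-k[["k3"]] * t),
             exp(-k[["k2"]] * t) - exp(-k[["k3"]] * t))
  sum(lm.fit(X, y)$residuals^2)
}
sse <- apply(grid, 1, sse_at)

best <- grid[which.min(sse), ]
print(best)
print(min(sse))
```

The best grid point, together with the least-squares f1 and f2 at that point, could then be passed as `start` to nls or nlsLM. A log-spaced grid is used because the rates in this problem span roughly two orders of magnitude.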
    

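    【Discussion】:

    • I have edited the original question based on your input. Your suggestion of a seeded random search seems to work very well on some datasets, although on others I still have difficulty getting a fit. If you have any other tips, it would really help me.
    • Thank you, your contribution was very helpful for modelling my data :)
    • You should accept the answer you prefer rather than using comments to say thanks.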