How can I find non-linear regression model starting values?

【Question title】: How can I find non-linear regression model starting values?
【Posted】: 2019-04-26 11:51:55
【Question】:

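I am trying to fit a non-linear tree diameter-height model (Max & Burkhart, 1976) to my dataset in R. The dataset consists of D, diameter at breast height (cm); H, total tree height (m); hi, height of the section above the ground; di, diameter at height hi; and so on.

I am having trouble fitting the model. I think the cause is the starting parameter values of the equation. I get a "NaNs produced" error. I tried adjusting the starting parameters; the number of errors dropped to 1, but not to zero. So I need a way to estimate starting parameters for a non-linear regression model. I looked into self-starting models, but because of the complexity of the equation and my lack of knowledge I could not apply them to my equation. I will add my whole dataset here so that you can show me a method.

By the way, I am not sure whether files can be attached to a question, so I am providing a link to my dataset for anyone who wants to view or download it. I uploaded my data to Google Drive; the link is https://drive.google.com/file/d/1q7W1bUcx4sK2G2QPte7ZtCudSLfBxpet/view?usp=sharing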

# Function to compute the Max & Burkhart (1976) equation
# hi: section height, d: diameter at breast height, h: total tree height,
# b1..b4: coefficients, a1, a2: join points of the segments
ComputeDi.MaxBurkhart <- function(hi, d, h, b1, b2, b3, b4, a1, a2){
    x <- hi / h
    x1 <- x - 1
    x2 <- x ^ 2 - 1
    di <- d * sqrt(b1 * x1 + b2 * x2 +
                   b3 * (a1 - x) ^ 2 * ((a1 - x) >= 0.0) +
                   b4 * (a2 - x) ^ 2 * ((a2 - x) >= 0.0))
    return(di)
}

# Set the working directory
setwd("../Data")

# Load data and rename some variables
sylvestris <- read.csv("mydata.csv")

# Global fitting
nlmod.fp.di <- nls(di ~ ComputeDi.MaxBurkhart(hi, d, h, b1, b2, b3, b4, a1, a2),
                   data = sylvestris,
                   start = c(b1 = -2.53, b2 = 1.2, b3 = -1.5, b4 = 22, a1 = 0.72, a2 = 0.15),
                   control = nls.control(tol = 1e-07))
summary(nlmod.fp.di, correlation = TRUE)

Everything works up to this point. After this I get the NaN error!

# Set seed and select names of trees
trees <- unique(sylvestris$tree) 
set.seed(15)
result.list <- list()
i <- 1
while(length(trees) > 0){
    # Draw up to 10 trees by position, so a short or single-element
    # remainder is handled without an error
    n.smp <- min(10, length(trees))
    tree.smp <- trees[sample.int(length(trees), n.smp)]
    sylvestris.smp <- sylvestris[sylvestris$tree %in% tree.smp, ]
    fitting.ols <- try(nls(di ~ ComputeDi.MaxBurkhart(hi, d, h, b1, b2, b3, b4, a1, a2),
                           data = sylvestris.smp,
                           start = c(b1 = -2.53, b2 = 1.2, b3 = -1.5, b4 = 22, a1 = 0.72, a2 = 0.15),
                           control = nls.control(tol = 1e-07)),
                       silent = TRUE)
    if(inherits(fitting.ols, "try-error")){
        # Failed fit: keep the tree names, fill the estimates with NA
        fit.smp <- data.frame(trees = paste(tree.smp, collapse = "_"), t(rep(NA, 8)))
        names(fit.smp) <- c("trees", "b1", "b2", "b3", "b4", "a1", "a2", "NS", "RSE")
    } else {
        nlmod.ols <- fitting.ols
        # NS: number of parameters not significant at the 0.05 level
        fit.smp <- data.frame(trees = paste(tree.smp, collapse = "_"),
                              t(coef(fitting.ols)),
                              NS = sum(summary(fitting.ols)$parameters[, 4] > 0.05),
                              RSE = summary(fitting.ols)$sigma)
    }
    result.list[[i]] <- fit.smp
    i <- i + 1
    trees <- trees[!trees %in% tree.smp]
}

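I expect significant parameter estimates without any NaN errors. I am sure the problem lies in the starting values, because this code block works perfectly with another dataset. When I changed the data, I got this error. Thank you in advance.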

【Question comments】:

You could provide sample data via dput, since the Drive link may not be accessible to everyone and/or may expire in the future.
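As a minimal sketch of that suggestion: dput() prints R code that recreates an object, so the output can be pasted straight into a post. The data frame below is a made-up miniature, not the asker's data; only its column names follow the question.

```r
# Hypothetical miniature of the dataset (values are invented for illustration).
sylvestris_demo <- data.frame(
  tree = c(1, 1, 2, 2),
  d    = c(25.3, 25.3, 31.0, 31.0),  # diameter at breast height (cm)
  h    = c(18.2, 18.2, 21.5, 21.5),  # total tree height (m)
  hi   = c(1.3, 9.0, 1.3, 12.0),     # section height above ground (m)
  di   = c(25.3, 16.1, 31.0, 14.8)   # diameter at height hi (cm)
)

# Print R code that rebuilds the object when pasted into another session.
dput(sylvestris_demo)

# Round trip: capture the printed code and evaluate it again.
code    <- paste(capture.output(dput(sylvestris_demo)), collapse = "\n")
rebuilt <- eval(parse(text = code))
```

For a large dataset, something like dput(head(sylvestris, 30)) keeps the post short while still being reproducible.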
The nls2 package provides a brute-force method and other methods that can be used to find starting values.
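The brute-force idea can also be sketched in base R, without nls2: evaluate the model on a grid of candidate parameter sets, discard any candidate for which the expression under sqrt() goes negative (this is where "NaNs produced" comes from), and hand the candidate with the smallest residual sum of squares to nls. This is only an illustration under assumptions: the data are simulated, the grid ranges are guesses, and the helper names radicand, taper and score are invented here.

```r
# Expression under the square root of the Max & Burkhart equation.
radicand <- function(x, b1, b2, b3, b4, a1, a2) {
  b1 * (x - 1) + b2 * (x^2 - 1) +
    b3 * (a1 - x)^2 * (a1 >= x) + b4 * (a2 - x)^2 * (a2 >= x)
}
taper <- function(hi, d, h, b1, b2, b3, b4, a1, a2) {
  d * sqrt(radicand(hi / h, b1, b2, b3, b4, a1, a2))
}

# Simulated stand-in for the real data (assumption: not the asker's dataset).
set.seed(42)
n   <- 300
d   <- runif(n, 15, 40)
h   <- runif(n, 12, 25)
hi  <- runif(n, 0.02, 0.95) * h
sim <- list(b1 = -2.53, b2 = 1.2, b3 = -1.5, b4 = 22, a1 = 0.72, a2 = 0.15)
di  <- do.call(taper, c(list(hi = hi, d = d, h = h), sim)) + rnorm(n, sd = 0.3)
dat <- data.frame(hi, d, h, di)

# Candidate starting values (ranges are guesses; widen or refine as needed).
grid <- expand.grid(
  b1 = seq(-4, -1, length.out = 4),
  b2 = seq(0.5, 2, length.out = 4),
  b3 = seq(-3, 0, length.out = 4),
  b4 = c(5, 20, 50),
  a1 = c(0.6, 0.7, 0.8),
  a2 = c(0.05, 0.1, 0.2)
)

# Residual sum of squares for one candidate; Inf if sqrt() would give NaN.
x <- dat$hi / dat$h
score <- function(p) {
  r <- radicand(x, p[["b1"]], p[["b2"]], p[["b3"]], p[["b4"]], p[["a1"]], p[["a2"]])
  if (any(r < 0)) return(Inf)
  sum((dat$di - dat$d * sqrt(r))^2)
}
rss   <- apply(grid, 1, score)
start <- unlist(grid[which.min(rss), ])  # named vector, usable as nls start

# Try the fit from the best grid point; try() keeps a failure from stopping the script.
fit <- try(nls(di ~ taper(hi, d, h, b1, b2, b3, b4, a1, a2),
               data = dat, start = start), silent = TRUE)
```

The feasibility check is the design choice that matters here: a start that already produces NaN on some rows is discarded up front instead of being passed to nls.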

Tags: r nls non-linear-regression

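【Solution 1】:

You can try the nls.multstart package, which is designed to simplify the estimation of starting values.

You specify a range for each starting parameter; the model is fitted from many starting points within that range, and the best fit, based on the AIC score, is kept.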

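【Comments】:

Actually, I tried the nls.multstart package after seeing your comment. However, because of my lack of R knowledge, I could not get it to work. Could you show me how to use it? I am really sorry for such a basic question, but I am new to R and need some help. Thanks in advance.
Sorry, I have not used it myself yet. As far as I know, the function nls_multstart works like the base nls function. Instead of specifying starting values with the start argument, you specify a wide range of values by giving lower and upper bounds with start_lower and start_upper respectively. Sorry I cannot help you more.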
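A minimal sketch of what such a call might look like, assuming the nls.multstart package is installed. The data are simulated, the bounds are guesses rather than recommended values, and taper is a hypothetical stand-in for the asker's ComputeDi.MaxBurkhart. With a single number for iter, that many random starting points are drawn between start_lower and start_upper.

```r
# Max & Burkhart (1976) equation, same form as in the question.
taper <- function(hi, d, h, b1, b2, b3, b4, a1, a2) {
  x <- hi / h
  d * sqrt(b1 * (x - 1) + b2 * (x^2 - 1) +
             b3 * (a1 - x)^2 * (a1 >= x) + b4 * (a2 - x)^2 * (a2 >= x))
}

# Simulated stand-in for the real data (assumption: not the asker's dataset).
set.seed(42)
n   <- 300
d   <- runif(n, 15, 40)
h   <- runif(n, 12, 25)
hi  <- runif(n, 0.02, 0.95) * h
di  <- taper(hi, d, h, -2.53, 1.2, -1.5, 22, 0.72, 0.15) + rnorm(n, sd = 0.3)
dat <- data.frame(hi, d, h, di)

fit <- NULL
if (requireNamespace("nls.multstart", quietly = TRUE)) {
  fit <- nls.multstart::nls_multstart(
    di ~ taper(hi, d, h, b1, b2, b3, b4, a1, a2),
    data        = dat,
    iter        = 250,  # number of random starting points to try
    start_lower = c(b1 = -4, b2 = 0.5, b3 = -3, b4 = 5,  a1 = 0.6, a2 = 0.05),
    start_upper = c(b1 = -1, b2 = 2,   b3 = 0,  b4 = 50, a1 = 0.8, a2 = 0.2),
    supp_errors = "Y"   # silence errors from starting points that fail
  )
}

# If a fit was found, inspect it like any other nls result.
if (!is.null(fit)) print(coef(fit))
```

In the loop from the question, the same call could replace nls inside try(), with data = sylvestris.smp.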