Fitting with a conditional in grouped data

【Question title】: Fitting with conditional in grouped data
【Posted】: 2017-11-17 02:32:47
【Question】:

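Sorry for the long question; I will try to state the problem as clearly as I can.

I am trying to fit a model to different groups in my data and to get the fitted coefficients for each group.

I looked around and did not find exactly the same question, only a few similar posts:

Conditional nls

Trying to fit data with R and nls on a function with a condition in it

However, the fit does not seem to respect the condition, so I get the same fitted coefficients for the different groups. (The same thing happens with my real data.)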

Basically, within each gr I want one set of coefficients to be fitted where sub_gr == 'a' and a different set where sub_gr == 'b'.

I am using nlsLM from the minpack.lm package because I also need to set starting values for the fitted coefficients.

Here is the code I tried:

library(minpack.lm)

set.seed(95)

df <- data.frame(gr = rep(seq(1, 2), each = 10),
                 sub_gr = rep(rep(c("a", "b"), each = 5), 2),
                 y = rep(c(sort(runif(5, 0, 0.5), decreasing = TRUE),
                           sort(runif(5, 0, 0.5), decreasing = TRUE)), 2),
                 x = rep(seq(0.1, 0.5, 0.1), 4))

# Create an empty data frame to fill with the fitted coefficients, based on @Hack-R's solution in "Error: Results are not data frames at positions:"

empty_dat <- structure(list(x = numeric(0), y = numeric(0), gr = integer(0), sub_gr = character(0), 
              pred_fit = numeric(0), k_a = numeric(0), k_b = numeric(0),
              t_a = numeric(0), t_b= numeric(0)), class = "data.frame")

# Fit the model separately for each gr group

for (g in unique(df$gr)) {

  # rows of the current group (g no longer shadows the column x)
  sub_df <- df[df$gr == g, ]

  # tryCatch returns NULL when the fit fails, so the loop keeps going
  fit <- tryCatch(nlsLM(y ~ ifelse(sub_gr == 'a', k_a * x + t_a, k_b * x + t_b),
                        data = sub_df, start = c(k_a = 0.3, k_b = 0.4, t_a = 0.1, t_b = 0.2),
                        lower = c(0.05, 0.05, 0, 0),
                        upper = c(1, 1, 1, 1),
                        trace = TRUE, na.action = na.omit, control = nls.lm.control(maxiter = 100)),
                  error = function(e) NULL)

  if (!is.null(fit)) {
    # predict for the rows of this group; newdata has to be a data frame
    pred_fit <- predict(fit, newdata = sub_df)

    coefs_fit <- data.frame(k_a = coef(fit)[1], k_b = coef(fit)[2], t_a = coef(fit)[3], t_b = coef(fit)[4])

    # append this group's rows, coefficients and predictions to empty_dat
    empty_dat <- rbind(empty_dat, data.frame(sub_df, coefs_fit, pred_fit, row.names = NULL))
  }
}

empty_dat

  gr sub_gr          y   x  k_a  k_b       t_a       t_b  pred_fit
1   1      a 0.28792044 0.1 0.05 0.05 0.1343742 0.2156747 0.1393742
2   1      a 0.24443957 0.2 0.05 0.05 0.1343742 0.2156747 0.1443742
3   1      a 0.07585577 0.3 0.05 0.05 0.1343742 0.2156747 0.1493742
4   1      a 0.03522243 0.4 0.05 0.05 0.1343742 0.2156747 0.1543742
5   1      a 0.02654922 0.5 0.05 0.05 0.1343742 0.2156747 0.1593742
6   1      b 0.48498563 0.1 0.05 0.05 0.1343742 0.2156747 0.2206747
7   1      b 0.18702842 0.2 0.05 0.05 0.1343742 0.2156747 0.2256747
8   1      b 0.15186749 0.3 0.05 0.05 0.1343742 0.2156747 0.2306747
9   1      b 0.15003048 0.4 0.05 0.05 0.1343742 0.2156747 0.2356747
10  1      b 0.07638354 0.5 0.05 0.05 0.1343742 0.2156747 0.2406747
11  2      a 0.28792044 0.1 0.05 0.05 0.1343742 0.2156747 0.1393742
12  2      a 0.24443957 0.2 0.05 0.05 0.1343742 0.2156747 0.1443742
13  2      a 0.07585577 0.3 0.05 0.05 0.1343742 0.2156747 0.1493742
14  2      a 0.03522243 0.4 0.05 0.05 0.1343742 0.2156747 0.1543742
15  2      a 0.02654922 0.5 0.05 0.05 0.1343742 0.2156747 0.1593742
16  2      b 0.48498563 0.1 0.05 0.05 0.1343742 0.2156747 0.2206747
17  2      b 0.18702842 0.2 0.05 0.05 0.1343742 0.2156747 0.2256747
18  2      b 0.15186749 0.3 0.05 0.05 0.1343742 0.2156747 0.2306747
19  2      b 0.15003048 0.4 0.05 0.05 0.1343742 0.2156747 0.2356747
20  2      b 0.07638354 0.5 0.05 0.05 0.1343742 0.2156747 0.2406747

We can clearly see that the coefficients k_a, k_b and t_a, t_b are the same for every gr and sub_gr.
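One way to look into this, as a rough sketch in base R only: check whether the two gr groups contain the same rows, and check the sign of an unconstrained straight-line slope in each gr/sub_gr cell. The names same_data and slopes are made up for this illustration. Because y is sorted in decreasing order while x increases, the unconstrained slopes are negative, which would be consistent with k_a and k_b both sitting on their lower bound of 0.05 in the output above.

```r
set.seed(95)

# Same construction as in the question
df <- data.frame(gr = rep(seq(1, 2), each = 10),
                 sub_gr = rep(rep(c("a", "b"), each = 5), 2),
                 y = rep(c(sort(runif(5, 0, 0.5), decreasing = TRUE),
                           sort(runif(5, 0, 0.5), decreasing = TRUE)), 2),
                 x = rep(seq(0.1, 0.5, 0.1), 4))

# 1) Do both gr groups hold exactly the same x, y and sub_gr values?
#    (y and x are built with rep(), so the second group repeats the first)
same_data <- identical(df$y[df$gr == 1], df$y[df$gr == 2]) &&
  identical(df$x[df$gr == 1], df$x[df$gr == 2]) &&
  identical(as.character(df$sub_gr[df$gr == 1]),
            as.character(df$sub_gr[df$gr == 2]))

# 2) Unconstrained least-squares slope in every gr/sub_gr cell
slopes <- sapply(split(df, list(df$gr, df$sub_gr)),
                 function(d) coef(lm(y ~ x, data = d))[["x"]])

print(same_data)
print(slopes)
```

If same_data is TRUE, identical coefficients for gr 1 and gr 2 are expected rather than a sign that the grouping failed. Negative values in slopes mean that a lower bound of 0.05 on k_a and k_b cannot be reached from inside the allowed range, so the optimizer stops at the bound.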

If I plot the data together with the predicted values, the fitted lines tell a different story :))

library(ggplot2)

ggplot(df, aes(x=x, y=y,col=sub_gr,shape=sub_gr)) + 
  geom_point(size=6,alpha=0.8,stroke=1.4)  +
  theme_bw()+
  facet_wrap(~gr,scales='free')+
  scale_color_manual(values=c("blue","red"))+
  geom_line(data=empty_dat,aes(x=x,y=pred_fit,group=sub_gr,col=sub_gr))

【Question comments】:

Tags: r data-fitting nls

【Answer 1】:

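Here is a possible solution. However, given how your example data are set up, I run into the same issue you describe: the model is the same for every group. That said, the training data are exactly the same in both groups, so I am not surprised to get identical coefficients.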

library(tidyverse)
library(broom)
library(minpack.lm)  # provides nlsLM

# a function to build the model 
makefit <- function(df) {
    tryCatch(nlsLM(y~ifelse(sub_gr=='a', k_a*x+t_a, k_b/x+t_b),
                        data=df,start=c(k_a=0.3,k_b=0.4,t_a=0.1,t_b=0.2),

                        lower = c(0.05, 0.05, 0,0),
                        upper = c(1,1,1,1),                 
                        trace=T,na.action=na.omit, control = nls.lm.control(maxiter=100)),error=function(e) NULL)
}

# a function to get the coefficients out of the model
myaugment <- function(fit) {
    data.frame(k_a=coef(fit)[1],k_b=coef(fit)[2],t_a=coef(fit)[3], t_b=coef(fit)[4])
}


dfprep <- df %>% 
  group_by(gr) %>% 
  # nest the other variables as a list col
  nest() %>% 
  mutate(
       # build a model for each group
       model = map(data, makefit)
       # get the coefficients
     , modelaugment = map(model, myaugment)
  )


# extract the results
results <- dfprep %>% 
  # extract the original data
  unnest(data) %>% 
  # join in the coefficient data
  left_join(dfprep %>% unnest(modelaugment), by = 'gr')

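Also, I am not sure where your new testing data set comes from, since it is not included in your example, so I have not provided a way to get predictions for it. Building that into the myaugment() function should be straightforward, though.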
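A minimal sketch of what that could look like, under stated assumptions: fit_one() is a hypothetical helper, the toy data are simulated with made-up slopes and intercepts that differ for every gr and sub_gr, and base R's nls() with algorithm = "port" stands in for nlsLM() so the example runs without extra packages (both accept start, lower and upper).

```r
set.seed(1)

# Toy data (assumption): every gr/sub_gr cell has its own slope and intercept
toy <- data.frame(gr = rep(1:2, each = 10),
                  sub_gr = rep(rep(c("a", "b"), each = 5), 2),
                  x = rep(seq(0.1, 0.5, 0.1), 4))
true_k <- c(0.2, 0.4, 0.6, 0.8)[rep(1:4, each = 5)]
true_t <- c(0.1, 0.2, 0.3, 0.4)[rep(1:4, each = 5)]
toy$y <- true_k * toy$x + true_t + rnorm(20, sd = 0.005)

# Hypothetical helper: fit one gr group, return its rows plus
# the fitted coefficients and the predictions for those rows
fit_one <- function(d) {
  fit <- tryCatch(
    nls(y ~ ifelse(sub_gr == "a", k_a * x + t_a, k_b * x + t_b),
        data = d,
        start = c(k_a = 0.3, k_b = 0.4, t_a = 0.1, t_b = 0.2),
        lower = c(0.05, 0.05, 0, 0),
        upper = c(1, 1, 1, 1),
        algorithm = "port"),   # stand-in for nlsLM; supports bounds
    error = function(e) NULL)
  if (is.null(fit)) return(NULL)  # skip groups where the fit fails
  coefs <- as.data.frame(as.list(coef(fit)))
  data.frame(d, coefs, pred_fit = predict(fit, newdata = d), row.names = NULL)
}

# Split by gr, fit each piece, and stack the per-group results
results <- do.call(rbind, lapply(split(toy, toy$gr), fit_one))
print(unique(results[, c("gr", "k_a", "k_b", "t_a", "t_b")]))
```

With data that really differ between groups, each gr gets its own k_a, k_b, t_a and t_b. Passing a data frame to newdata is what would let the same helper predict on a separate test set.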

【Comments】:

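Thank you very much for your answer. OTOH, results still gives the same coefficients for the different sub_gr. But your code is shorter and cleaner. Also, I could not understand your remark about new testing data. There is only one data set, which is @987654326@, and afterwards I create empty_dat to fill with the fit output.
In your original example you predict on new data from new.range, which I assumed was another object like df because it is not referenced anywhere else. About the model: it is not clear to me what the formula in your model means (I am not very familiar with nonlinear models, and I do not quite see where k_a comes from), but I wonder whether you need I() in the formula, because you divide in the second half: did you mean to multiply, or to specify an interaction? See paragraphs 2-3 of the formula() help.
Oh, I see. Sorry about that. I forgot to modify that part when copy-pasting. I have edited the OP. I also edited the equation part; the two equations should have the same form.
I added the pred_fit part to show that, because the groups are not fitted properly, the fitted lines deviate a lot from the real data.
To be clear: when you write the variables in the formula, do you mean k_a multiplied by x plus t_a, or an interaction of k_a and x, plus t_a?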
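On the last point, a short sketch in base R may help. In an nls() formula the right-hand side is evaluated as ordinary arithmetic, so k_a * x + t_a means "k_a times x plus t_a" and no I() is needed, whereas in an lm() formula a * b is expanded into main effects plus an interaction. The data and the true values 0.7 and 0.2 below are made up for the illustration.

```r
set.seed(2)

# Made-up straight-line data: y = 0.7 * x + 0.2 plus small noise
d <- data.frame(x = seq(0.1, 1, by = 0.1))
d$y <- 0.7 * d$x + 0.2 + rnorm(10, sd = 0.01)

# nls: '*' and '+' are plain arithmetic on parameters and variables
fit_nls <- nls(y ~ k_a * x + t_a, data = d, start = c(k_a = 0.5, t_a = 0.1))

# lm: the same straight line, written in linear-model formula notation
fit_lm <- lm(y ~ x, data = d)

print(coef(fit_nls))  # k_a (slope) and t_a (intercept)
print(coef(fit_lm))   # (Intercept) and x (slope)

# In linear-model notation, a * b expands to a + b + a:b
print(attr(terms(y ~ a * b), "term.labels"))
```

Both fits describe the same line, so the slope and intercept agree up to numerical tolerance. The interaction reading applies only to lm-style formulas.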