【Question title】: Number of rows in predicted data frame does not equal number of rows in new data frame fed to predict function
【Posted】: 2020-07-22 08:15:34
【Question description】:

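I have a data frame that I split by a factor vector. I'm trying to fit a model to each resulting data set and then generate a set of predicted values from each model.

I want the predictions to span a large number of values (e.g. length.out = 500), but when I give the predict function a new data set with 500 rows, it still returns a data frame of predictions with the same number of rows as the original data frame the model was fitted to.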

data(mtcars)
rownames(mtcars) <- NULL #I've run this code with and without this line; both times it gave the same result

mtcars.split <- split(mtcars, mtcars$cyl)

mtcars.split <- lapply(mtcars.split, function(x){
  rownames(x) <- NULL
  x <- droplevels(x)
  return(x)
})

mtcars.lm <- lapply(mtcars.split, function(x){
  lm(disp ~ wt, data = x)
})

mtcars.fitted <- mapply(x = mtcars.lm, y = mtcars.split, function(x, y){
  newdata = data.frame(wt = seq(min(y$wt), max(y$wt), length.out = 500))
  fitted <- as.data.frame(predict(x, new.data = newdata, se = T))
  return(fitted)
}, SIMPLIFY = F)

lapply(mtcars.fitted, nrow)
lapply(mtcars.split, nrow)

I tried fitting the linear model to the whole data set, and it did the same thing.

mtcars.lm.all <- lm(disp ~ wt, data = mtcars)
newdata <- data.frame(wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 500))
nrow(as.data.frame(predict(mtcars.lm.all, new.data = newdata, se = T)))

Subsetting the data set made no difference either.

mtcars.head <- head(mtcars, n = 16)
mtcars.head.lm <- lm(disp ~ wt, data = mtcars.head)
predict.mtcars <- as.data.frame(predict(mtcars.head.lm, 
                                        new.data = data.frame(wt = seq(min(mtcars.head$wt), 
                                                                       max(mtcars.head$wt), 
                                                                       length.out = 500)),
                                        se = T))
nrow(predict.mtcars)

Am I missing something here? This used to work, but now it no longer seems to. Restarting the R session, or even my computer, does not fix it.

【Question discussion】:

    Tags: r predict


    【Solution 1】:

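    The argument to predict is not new.data but newdata. For a linear model, predict dispatches to predict.lm, which accepts extra arguments through "..."; the misspelled new.data is therefore ignored without an error, and because no newdata is supplied, predict returns predictions for the original data the model was fitted on. That is why the row count always matches the training data.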
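A minimal sketch isolating the typo, assuming only the built-in mtcars data; the object names fit, grid, wrong and right are illustrative. It fits one model and calls predict twice, once with the misspelled argument and once with the correct one, so the difference in output length is visible directly.

```r
data(mtcars)
fit <- lm(disp ~ wt, data = mtcars)
grid <- data.frame(wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 500))

# `new.data` does not match any formal argument of predict.lm, so it is
# absorbed by `...`; with no `newdata`, predictions are made for the
# 32 rows the model was fitted on.
wrong <- predict(fit, new.data = grid)

# The correctly spelled argument predicts at every grid point.
right <- predict(fit, newdata = grid)

length(wrong)  # 32
length(right)  # 500
```

Note that se = T in the original code does work: it partially matches the se.fit argument. Spelling it out as se.fit = TRUE avoids relying on partial matching.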

    Here is the corrected code with the desired result:

    data(mtcars)
    rownames(mtcars) <- NULL  # optional; the result is the same with or without this line
    
    mtcars.split <- split(mtcars, mtcars$cyl)
    
    mtcars.split <- lapply(mtcars.split, function(x){
      rownames(x) <- NULL
      x <- droplevels(x)
      return(x)
    })
    
    mtcars.lm <- lapply(mtcars.split, function(x){
      lm(disp ~ wt, data = x)
    })
    
    mtcars.fitted <- mapply(x = mtcars.lm, y = mtcars.split, function(x, y){
      newdata = data.frame(wt = seq(min(y$wt), max(y$wt), length.out = 500))
      fitted <- as.data.frame(predict(x, newdata = newdata, se.fit = TRUE))  # newdata, not new.data
      return(fitted)
    }, SIMPLIFY = F)
    
    lapply(mtcars.fitted, nrow)
    #> $`4`
    #> [1] 500
    #> 
    #> $`6`
    #> [1] 500
    #> 
    #> $`8`
    #> [1] 500
    lapply(mtcars.split, nrow)
    #> $`4`
    #> [1] 11
    #> 
    #> $`6`
    #> [1] 7
    #> 
    #> $`8`
    #> [1] 14
    
    
    mtcars.lm.all <- lm(disp ~ wt, data = mtcars)
    newdata <- data.frame(wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 500))
    nrow(as.data.frame(predict(mtcars.lm.all, newdata = newdata, se.fit = TRUE)))
    #> [1] 500
    

    Created on 2020-07-22 by the reprex package (v0.3.0)
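A minimal sketch of a more defensive version of the per-group workflow. The helper predict_on_grid is hypothetical (not part of base R or any package): it builds an evenly spaced grid for one predictor, passes it as newdata, checks that predict returned one value per grid point, and binds the grid to the fitted values and standard errors so each prediction stays next to the predictor value it belongs to. The plain as.data.frame(predict(...)) approach drops that column.

```r
# Hypothetical helper (not from base R or any package): predict a fitted model
# over an evenly spaced grid of one predictor and keep the grid alongside
# the predictions.
predict_on_grid <- function(model, data, var, n = 500) {
  grid <- setNames(
    data.frame(seq(min(data[[var]]), max(data[[var]]), length.out = n)),
    var
  )
  pred <- predict(model, newdata = grid, se.fit = TRUE)
  # Fails loudly if newdata was ignored and the training rows came back instead.
  stopifnot(length(pred$fit) == n)
  data.frame(grid, fit = pred$fit, se.fit = pred$se.fit)
}

data(mtcars)
mtcars.split <- split(mtcars, mtcars$cyl)
mtcars.fitted <- lapply(mtcars.split, function(d) {
  predict_on_grid(lm(disp ~ wt, data = d), d, "wt")
})
sapply(mtcars.fitted, nrow)
```

The final line should report 500 rows for each of the three cylinder groups (4, 6 and 8). The length check runs before the data frame is built because data.frame() would silently recycle shorter predictions whenever the grid size is a multiple of the training row count.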
