【Question title】:How to use lapply in a data.table without by clause
【Posted】:2020-01-21 23:05:11
【Question】:

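I am trying to use data.table, lapply and a function call to run several regressions against the same response variable. I would like a simple table as output that shows each variable together with its coefficient of determination.

I am using RStudio 1.2.1335 and data.table 1.12.2. The dataset I am using is http://users.stat.ufl.edu/~rrandles/sta4210/Rclassnotes/data/textdatasets/KutnerData/Appendix%20C%20Data%20Sets/APPENC02.txt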

cnames<-c("ID","County","State","Area","Pop","Young","Old","Phys","Beds","Crime","HighSchool","BA","Poverty","Unemploy","PerCapitaIncome","TotalIncome","Region")
df62<-fread("APPENC02.txt", col.names=cnames)
df62[,c("ID", "County","State","Region"):=NULL]
variability<-function(y){
     model<-eval(substitute(lm(Phys~y, data=df62)))
     anova<-anova(model)
     SSR<- anova$`Sum Sq`[1]
     SSE<- anova$`Sum Sq`[2]
     SSTO<-SSR+SSE
     R2<-SSR/SSTO
     return(R2)
}
df62[ , lapply(.SD, variability)]

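The call does run if the last line is instead:

df62[ , lapply(.SD, variability), by=Phys]

When I omit the 'by' clause I get the error message: "(function(x, i, exact) if (is.matrix(i)) as.matrix(x)[[i]] else .subset2(x, : object 'i' not found"

If I group by the variable "Phys" I get the correct results, but every result is needlessly repeated.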

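【Comments】:

  • Could you explain what the benefit of using eval(substitute()) is?
  • So, to clarify, you want to run 13 different regressions in which Phys is the dependent variable and each of the other numeric variables is the independent variable?
  • Yes - 13 different regressions with Phys as the dependent variable.
  • eval(substitute()) makes it convenient to use variable names inside a function. I got the idea from adv-r.had.co.nz/Computing-on-the-language.html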
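
A note on the eval(substitute()) approach discussed above: the sketch below is one possible reading of the "object 'i' not found" error, not something stated in the thread. It only shows what substitute() captures when a function is called through base lapply; the helper name show_arg is hypothetical.

```r
# Hypothetical helper: report the expression that was passed as `y`
show_arg <- function(y) deparse(substitute(y))

# Called directly, substitute() sees the symbol the caller typed
direct <- show_arg(Area)

# Called through lapply, the argument is lapply's internal expression,
# not a column name, so a formula built from it refers to X and i
seen <- lapply(list(Area = 1:3), show_arg)

print(direct)
print(seen[[1]])
```

If that reading is right, a formula such as Phys ~ y built this way cannot be resolved against the columns of the data, which is why passing column names as strings is a more robust choice.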

Tags: r data.table lapply


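【Solution 1】:

We can use reformulate to build the model formula. Here the function takes two arguments, "data" and "y", where y is a column name passed as a character string.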

variability<-function(data, y){
     model<- lm(reformulate(y, "Phys"), data=data)
     anova<-anova(model)
     SSR<- anova$`Sum Sq`[1]
     SSE<- anova$`Sum Sq`[2]
     SSTO<-SSR+SSE
     R2<-SSR/SSTO
     return(R2)
}
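
As a small illustration of the reformulate call used in the function above (a sketch only; the column names are taken from the question, and no data is needed to build a formula):

```r
# reformulate() builds a formula from character strings:
# first argument = right-hand-side term labels, `response` = left-hand side
f1 <- reformulate("Area", response = "Phys")
print(f1)

# Several predictors are joined with `+`
f2 <- reformulate(c("Area", "Pop"), response = "Phys")
print(f2)

# The result is an ordinary formula object that lm() accepts
print(class(f1))
```

Because the predictor arrives as a string, the function no longer depends on how the caller spelled the argument, so it behaves the same inside lapply as when called directly.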

Select the column names of interest

nm1 <- setdiff(names(df62), "Phys")

Loop over them and apply the function, passing .SD as the data argument

setnames(df62[, lapply(nm1, variability, data = .SD)], nm1)[]
#          Area       Pop      Young          Old      Beds     Crime   HighSchool         BA     Poverty    Unemploy PerCapitaIncome TotalIncome
#1: 0.006095652 0.8840674 0.01432791 9.788323e-06 0.9033826 0.6731538 1.804622e-05 0.05605789 0.004113459 0.002551878       0.0999411   0.8989137
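
The same pattern can be tried without downloading the dataset. The sketch below is an illustration under assumptions, not part of the original answer: it uses R's built-in mtcars data with mpg standing in for Phys, assumes the data.table package is installed, and the names r2_single, preds, wide and long are hypothetical. It also shows a long layout (one row per variable), which may be closer to the table the question asks for.

```r
library(data.table)

# Stand-in data: mpg plays the role of Phys, the rest are predictors
dt <- as.data.table(mtcars[, c("mpg", "wt", "hp", "disp")])

# R^2 of a one-predictor regression, computed as SSR / (SSR + SSE)
r2_single <- function(data, y) {
  model <- lm(reformulate(y, response = "mpg"), data = data)
  ss <- anova(model)$`Sum Sq`   # row 1: predictor, row 2: residuals
  ss[1] / (ss[1] + ss[2])
}

preds <- setdiff(names(dt), "mpg")

# Wide layout, as in the answer: one column per predictor, no `by` needed
wide <- setnames(dt[, lapply(preds, r2_single, data = .SD)], preds)

# Long layout: one row per predictor
long <- data.table(
  variable = preds,
  R2 = vapply(preds, r2_single, numeric(1), data = dt)
)

print(wide)
print(long)
```

For a regression with a single predictor and an intercept, this R² equals the squared correlation between predictor and response, which gives a quick sanity check on the output.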

Data

cnames<-c("ID","County","State","Area","Pop","Young","Old","Phys","Beds","Crime","HighSchool","BA","Poverty","Unemploy","PerCapitaIncome","TotalIncome","Region")

df62 <- fread("http://users.stat.ufl.edu/~rrandles/sta4210/Rclassnotes/data/textdatasets/KutnerData/Appendix%20C%20Data%20Sets/APPENC02.txt", col.names = cnames)
df62[,c("ID", "County","State","Region"):=NULL]

【Comments】:
