Posted: 2016-03-01 13:59:53
Question:
I have imported a large table from an SQL database. Its structure is similar to this example table:
testData <- data.frame(
BatchNo = c(1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3),
Y = c(1,1.247011378,1.340630851,1.319026357,1.41264583,1.093619473,1.38023909,1.473858563,1,1.093619473,1.038888089,1.081833061,1,1.215913383,1.278861891,1.297746443,1.360694952,1.332368123,1.414201183,1,1.081833061,1,1.063661202),
Categorical1 = c("A9","B5513","B5513","B5514","B5514","A9","B5514","B5514","A9","A9","B1723","A9","A9","B5513","B5514","B5513","B5514","B5514","B5514","A9","A9","A486","B1701"),
Categorical2 = c("A2793","B5512","B5512","B5512","B5512","B5508","B6623","B6623","B5508","B5508","B5508","A127","A127","B5515","B5515","B5515","B5515","B6623","B6623","A127","A127","A2727","A2727"),
Categorical3 = c("A5510","B5511","B5511","B5511","B5511","A5510","B5511","B5511","B5511","B5511","B5511","A5518","A5518","B5517","B5517","B5517","B5517","B5517","B5517","B5517","B5517","A2","A2"),
Categorical4 = c("A5","A5","B649","A5","B649","B649","A5","B649","A5","B649","A5","B649","A5","A5","A5","B649","B649","A5","B649","A5","B649","A649","A649"),
Binary1 = c(rep(0,times=23)),
Binary2 = c(rep(0,times=23)),
Binary3 = c(rep(0,times=23)),
Binary4 = c(rep(0,times=23)),
stringsAsFactors = TRUE
)
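What I want to do in a for loop is:
1. Create subset data frames based on the BatchNo column (1 to 2500)
2. Fit a linear model to each subset data frame
3. Export the list of coefficient estimates back to an SQL table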
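Steps 1 to 3 do not strictly need assign(). A minimal sketch, assuming the fitted models are kept in a named list: split() the data by BatchNo, fit with lapply(), and stack the coefficient estimates into one long data frame (BatchNo, Term, Estimate) that can be written back in a single call. It uses only Y and Categorical4 from batches 1 and 2 of the example, where lm() runs without error. The output column names and the table name lm_coefficients are hypothetical, and the DBI export line is left commented out because it needs a live connection con.

```r
# Y and Categorical4 for batches 1 and 2 of testData from the question
dat <- data.frame(
  BatchNo = rep(c(1, 2), times = c(11, 10)),
  Y = c(1, 1.247011378, 1.340630851, 1.319026357, 1.41264583, 1.093619473,
        1.38023909, 1.473858563, 1, 1.093619473, 1.038888089,
        1.081833061, 1, 1.215913383, 1.278861891, 1.297746443, 1.360694952,
        1.332368123, 1.414201183, 1, 1.081833061),
  Categorical4 = factor(c("A5", "A5", "B649", "A5", "B649", "B649", "A5", "B649", "A5", "B649", "A5",
                          "B649", "A5", "A5", "A5", "B649", "B649", "A5", "B649", "A5", "B649"))
)

# Steps 1 and 2: one model per batch, kept in a named list instead of via assign()
fits <- lapply(split(dat, dat$BatchNo), function(d) lm(Y ~ Categorical4, data = d))

# Step 3: stack the coefficient estimates of every batch into one long data frame
coefTable <- do.call(rbind, lapply(names(fits), function(b) {
  est <- coef(fits[[b]])
  data.frame(BatchNo = as.integer(b), Term = names(est), Estimate = unname(est),
             stringsAsFactors = FALSE)
}))
print(coefTable)

# Writing back to SQL, e.g. with the DBI package (needs a live connection `con`;
# the table name "lm_coefficients" is hypothetical):
# DBI::dbWriteTable(con, "lm_coefficients", coefTable, append = TRUE)
```

A long table (one row per batch and term) keeps the same columns whichever predictors survive in a given batch, which suits a fixed SQL schema.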
So far, steps 1 and 2 look like this:
n <- max(testData[,1])
for (i in 1:n) {
  dat <- droplevels(subset(testData, BatchNo == i, select = 1:10))
  assign(paste("lm.", i, sep = ""),
         lm(Y ~ Categorical1 + Categorical2 + Categorical3 + Categorical4 +
              Binary1 + Binary2 + Binary3 + Binary4, data = dat))
}
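The problem is that some subsets will be created in which at least one (possibly all) of the 4 categorical variables has only a single level (like BatchNo = 3 in this example), and R cannot use such variables in the regression.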
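For batch 3 of the example the failure can be reproduced directly. The sketch below copies those two rows (with the categorical columns as factors), captures the error that lm() raises, and counts the levels of each factor column to show which ones trigger it; the name nlev is illustrative.

```r
# Batch 3 of testData from the question, with the categorical columns as factors
dat3 <- data.frame(
  Y = c(1, 1.063661202),
  Categorical1 = factor(c("A486", "B1701")),
  Categorical2 = factor(c("A2727", "A2727")),
  Categorical3 = factor(c("A2", "A2")),
  Categorical4 = factor(c("A649", "A649"))
)

# A single-level factor makes lm() stop while setting up contrasts
err <- tryCatch(
  lm(Y ~ Categorical1 + Categorical2 + Categorical3 + Categorical4, data = dat3),
  error = function(e) conditionMessage(e)
)
print(err)

# Number of levels of every factor column: the ones equal to 1 cause the error
nlev <- vapply(dat3[vapply(dat3, is.factor, logical(1))], nlevels, integer(1))
print(nlev)
```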
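This is not a problem for the binary predictors, because they just produce NA coefficient estimates, and I run a backward step() after fitting the model to remove any of those.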
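The NA estimates for the binary columns come from lm() accepting a singular design (singular.ok = TRUE by default): a column that is 0 in every row is collinear with the intercept, so its coefficient is reported as NA instead of causing an error. A small illustration on the first four rows of batch 1 (with a constant Binary1 added), listing the NA terms straight from coef():

```r
# First four rows of batch 1 from the question, plus an all-zero binary column
d <- data.frame(
  Y = c(1, 1.247011378, 1.340630851, 1.319026357),
  Categorical4 = factor(c("A5", "A5", "B649", "A5")),
  Binary1 = 0
)
fit <- lm(Y ~ Categorical4 + Binary1, data = d)
print(coef(fit))   # Binary1 is aliased with the intercept, so its estimate is NA

# Names of the terms whose estimate is NA
aliasedTerms <- names(coef(fit))[is.na(coef(fit))]
print(aliasedTerms)
```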
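At first I tried using a forward step() to select only the meaningful predictors in each loop, but that did not work because I have to list all the potential predictors for it to choose from.
I can think of two possible solutions: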
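- Drop the single-level factor columns from "dat" in each loop iteration
- Or, in each iteration, create a vector/list of the names of the multi-level factors and somehow use it in the lm formula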
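The first option can be sketched with a logical column filter: keep every column that is not a factor, plus the factor columns that still have two or more levels after subsetting, then fit with Y ~ . - BatchNo so that all remaining columns except BatchNo become predictors. This is a sketch on batch 3 of the example and assumes the categorical columns are factors rather than character vectors.

```r
# Batch 3 of testData from the question (categoricals as factors, binaries all 0)
dat <- data.frame(
  BatchNo = 3,
  Y = c(1, 1.063661202),
  Categorical1 = factor(c("A486", "B1701")),
  Categorical2 = factor(c("A2727", "A2727")),
  Categorical3 = factor(c("A2", "A2")),
  Categorical4 = factor(c("A649", "A649")),
  Binary1 = 0, Binary2 = 0, Binary3 = 0, Binary4 = 0
)

# Keep every non-factor column, and factor columns only if they have 2+ levels
keep <- vapply(dat, function(col) !is.factor(col) || nlevels(col) > 1, logical(1))
dat <- dat[, keep, drop = FALSE]
print(names(dat))

# "." expands to all remaining columns except Y; "- BatchNo" removes BatchNo
fit <- lm(Y ~ . - BatchNo, data = dat)
print(coef(fit))
```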
So far, I have managed to create these two vectors:
factors<-dat[,3:6]
f<-names(factors)
levels<-c(length(levels(factors[,1])),length(levels(factors[,2])),length(levels(factors[,3])),length(levels(factors[,4])))
So now I just need to remove the n-th element from "f" wherever the n-th element of "levels" equals 1.
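With the f and levels vectors from the question, logical indexing does the removal in one step, and reformulate() from the stats package turns the surviving names into a formula. A sketch on batch 3 of the example, assuming the categorical columns are factors and that all four binary columns should always be included; the name fMulti is illustrative. Using > 1 rather than != 1 also drops columns that report 0 levels, which is what happens when the columns are character vectors instead of factors.

```r
# Batch 3 of testData from the question, same column order (factors assumed)
dat <- data.frame(
  BatchNo = 3,
  Y = c(1, 1.063661202),
  Categorical1 = factor(c("A486", "B1701")),
  Categorical2 = factor(c("A2727", "A2727")),
  Categorical3 = factor(c("A2", "A2")),
  Categorical4 = factor(c("A649", "A649")),
  Binary1 = 0, Binary2 = 0, Binary3 = 0, Binary4 = 0
)

factors <- dat[, 3:6]
f <- names(factors)
levels <- sapply(factors, nlevels)   # same counts as the question's levels vector

# Keep only the names whose factor has more than one level
fMulti <- f[levels > 1]
print(fMulti)

# Build "Y ~ <multi-level factors> + <binaries>" from character names
fml <- reformulate(c(fMulti, paste0("Binary", 1:4)), response = "Y")
print(fml)
fit <- lm(fml, data = dat)
print(coef(fit))
```

reformulate() stops with an error if it is given an empty vector of term labels, so a loop in which no predictor survives would need a guard for that case.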
Comments:
-
Please don't post data as images... How to make a great R reproducible example?
-
Step 1 - use split() on BatchNo; the result is a list. Pass this list to lapply(). Step 2 - inside lapply(), use droplevels() and lm().
-
Hi @zx8745, thanks for your comment. But wouldn't that just replace the for loop rather than solve my problem?
-
Please share your solution below as an answer; that will make your post clearer, and maybe we can suggest improvements.
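The split()/lapply() approach from the comments can be combined with the level check, so that each batch gets a formula built only from its usable predictors. A minimal sketch that runs on the full example data, assuming stringsAsFactors = TRUE so the categorical columns are factors; fitBatch and predictors are hypothetical helper names.

```r
# testData from the question; stringsAsFactors = TRUE makes the categoricals factors
testData <- data.frame(
  BatchNo = c(1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3),
  Y = c(1,1.247011378,1.340630851,1.319026357,1.41264583,1.093619473,1.38023909,1.473858563,1,1.093619473,1.038888089,1.081833061,1,1.215913383,1.278861891,1.297746443,1.360694952,1.332368123,1.414201183,1,1.081833061,1,1.063661202),
  Categorical1 = c("A9","B5513","B5513","B5514","B5514","A9","B5514","B5514","A9","A9","B1723","A9","A9","B5513","B5514","B5513","B5514","B5514","B5514","A9","A9","A486","B1701"),
  Categorical2 = c("A2793","B5512","B5512","B5512","B5512","B5508","B6623","B6623","B5508","B5508","B5508","A127","A127","B5515","B5515","B5515","B5515","B6623","B6623","A127","A127","A2727","A2727"),
  Categorical3 = c("A5510","B5511","B5511","B5511","B5511","A5510","B5511","B5511","B5511","B5511","B5511","A5518","A5518","B5517","B5517","B5517","B5517","B5517","B5517","B5517","B5517","A2","A2"),
  Categorical4 = c("A5","A5","B649","A5","B649","B649","A5","B649","A5","B649","A5","B649","A5","A5","A5","B649","B649","A5","B649","A5","B649","A649","A649"),
  Binary1 = rep(0, times = 23),
  Binary2 = rep(0, times = 23),
  Binary3 = rep(0, times = 23),
  Binary4 = rep(0, times = 23),
  stringsAsFactors = TRUE
)

# Candidate predictors (hypothetical helper vector)
predictors <- c(paste0("Categorical", 1:4), paste0("Binary", 1:4))

# Hypothetical helper: fit one batch using only the predictors lm() can handle
fitBatch <- function(d) {
  d <- droplevels(d)  # split() keeps unused factor levels, so drop them first
  usable <- vapply(d[predictors],
                   function(col) !is.factor(col) || nlevels(col) > 1,
                   logical(1))
  lm(reformulate(predictors[usable], response = "Y"), data = d)
}

# Step 1: split() by BatchNo; step 2: lapply() fits one model per batch
fits <- lapply(split(testData, testData$BatchNo), fitBatch)
print(lapply(fits, coef))
```

The binary columns are always kept because, as noted in the question, lm() only reports NA for them; the resulting list can be passed on to whatever step 3 export is used.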