【Question title】: R foreach error when using formula notation in randomForest
【Posted】: 2013-09-26 06:43:17
【Question】:

I'm having trouble running randomForest in parallel with foreach. In the example below I create some data and then a formula object. The formula works with randomForest on its own, but it fails when used inside a parallel foreach loop. Why?

# rf on big training set
# use parallel foreach
library(foreach)

library(doMC)
registerDoMC(4)  # change the 4 to your number of CPU cores
# info on parallel backend
getDoParName()
getDoParWorkers()

# bogus data
set.seed(123)
ssize <- 100000
x1 <- sample( LETTERS[1:9], ssize, replace=TRUE, prob=c(0.1, 0.2, 0.15, 0.05,0.1, 0.2, 0.05, 0.05,0.1) )
x2 <- rlnorm(ssize,0,0.25)
x3 <- rlnorm(ssize,0,0.5)
y <- sample( c("Y","N"), ssize, replace=TRUE, prob=c(0.05, 0.95))
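# stringsAsFactors = TRUE keeps x1 and y as factors on R >= 4.0 (needed by as.numeric() below and for classification)
df <- data.frame(x1, x2, x3, y, stringsAsFactors = TRUE)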
df$p_y <- as.numeric(df$y)-1

# use strata to sample whole dataset
library(sampling)

s1 = strata(df,stratanames = "y", size = c(2500,2500))
s2 = strata(df,stratanames = "y", size = c(2500,2500))
s3 = strata(df,stratanames = "y", size = c(2500,2500))
s4 = strata(df,stratanames = "y", size = c(2500,2500))

s_list <- list(s1$ID_unit, s2$ID_unit, s3$ID_unit, s4$ID_unit)

# model function
rf.formula <- as.formula(paste("y","~",paste("x1","x2",sep="+")))

library(randomForest)

# simple stuff works but takes some time
model.rf <-randomForest(y ~ x1 + x2, df, ntree=100, nodesize = 50)

# build rf with dopar on explicit formula works and is quick
model.rf.dopar <- foreach(subset=s_list, .combine=combine, .packages='randomForest') %dopar%
  randomForest(y ~ x1 + x2, df, ntree=100, nodesize = 50, subset=subset)

# build rf with dopar on rf.formula fails
model.rf.s.b2 <- foreach(subset=s_list, .combine=combine, .packages='randomForest') %dopar%
  randomForest(rf.formula, df, ntree=100, nodesize = 50, subset=subset)

# > model.rf.s.b2 <- foreach(subset=s_list, .combine=combine, .packages='randomForest') %dopar%
#   +   randomForest(rf.formula, df, ntree=100, nodesize = 50, subset=subset)
# Error in randomForest(rf.formula, df, ntree = 100, nodesize = 50, subset = subset) : 
#   task 1 failed - "invalid subscript type 'closure'"

Error:

model.rf.s.b2 <- foreach(subset=s_list, .combine=combine, .packages='randomForest') %dopar%
   +   randomForest(rf.formula, df, ntree=100, nodesize = 50, subset=subset)

Error in randomForest(rf.formula, df, ntree = 100, nodesize = 50, subset = subset) : 
task 1 failed - "invalid subscript type 'closure'"

Any suggestions?


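【Comments】:

  • Something looks wrong with your subset. Try adding print(subset) or something similar inside the foreach loop to check that it has the format you expect.

Tags: r foreach random-forest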


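【Answer 1】:

The problem appears to come from an indexing operation that goes wrong inside model.frame.default, which randomForest.formula calls indirectly. I'm not sure exactly what triggers it, because model.frame.default does a lot of tricky eval work, but changing the formula's environment seems to fix the problem: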

r <- foreach(subset=s_list, .combine='combine', .multicombine=TRUE,
             .packages='randomForest') %dopar% {
  environment(rf.formula) <- environment()
  randomForest(rf.formula, df, ntree=100, nodesize = 50, subset=subset)
}

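In particular, this makes subset evaluate correctly; otherwise it evaluates to the subset function. I tried renaming the iteration variable, but that didn't help.

Note that I also set .multicombine to TRUE, because the randomForest combine function accepts more than two objects, which can improve performance significantly.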
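To make the .multicombine remark concrete, here is a small sketch that is not part of the original answer. It assumes the foreach package is installed and runs sequentially with %do%; counting_combine and calls are hypothetical names, used only to count how often the combine function is invoked.

```r
library(foreach)

calls <- 0
# Hypothetical combine function that counts its own invocations.
counting_combine <- function(...) {
  calls <<- calls + 1
  c(...)
}

# Default: a user-supplied .combine is called with two arguments at a time.
pairwise <- foreach(i = 1:4, .combine = counting_combine) %do% i^2
pairwise_calls <- calls

calls <- 0
# .multicombine = TRUE allows foreach to pass many results in one call.
multi <- foreach(i = 1:4, .combine = counting_combine,
                 .multicombine = TRUE) %do% i^2
multi_calls <- calls

# Both runs build the same vector; only the number of combine calls differs.
print(c(pairwise = pairwise_calls, multi = multi_calls))
```

For a cheap function like c() the difference is negligible. It matters more when each combine call is expensive, as merging forests is likely to be.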

Update

The problem can be reproduced with:

fun <- function(subset) {
  randomForest(rf.formula, df, ntree=100, nodesize = 50, subset=subset)
}
fun(s_list[[1]])

If you rename the variable subset to s, for example, it also fails, but with a less misleading error message:

> fun <- function(s) {
>   randomForest(rf.formula, df, ntree=100, nodesize = 50, subset=s)
> }
> fun(s_list[[1]])
Error in eval(expr, envir, enclos) : object 's' not found
Calls: fun ... eval -> model.frame -> model.frame.default -> eval -> eval
Execution halted

As in the foreach example, resetting the formula's environment appears to fix the problem.
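The same behaviour can be reproduced without randomForest. The sketch below is an illustration, not the answerer's code: it uses only base R's model.frame, and the names d, make_formula, count_rows, count_rows_fixed and keep_rows are made up for the example. It assumes, as the call chain in the error trace suggests, that the subset argument is looked up first in the data and then in the formula's environment.

```r
# Toy data; every name in this block is hypothetical.
d <- data.frame(x = 1:10, y = (1:10)^2)

# The formula is created inside a function, so its environment is that
# function's frame, which knows nothing about a later caller's variables.
make_formula <- function() y ~ x
f <- make_formula()

# `keep_rows` exists only in this function's frame. model.frame() looks it up
# in `d` and then in environment(form), so the lookup fails.
count_rows <- function(form, keep_rows) {
  nrow(model.frame(form, data = d, subset = keep_rows))
}
broken <- tryCatch(count_rows(f, 1:3), error = function(e) conditionMessage(e))

# Workaround from the answer: point the formula at the current frame first.
count_rows_fixed <- function(form, keep_rows) {
  environment(form) <- environment()
  nrow(model.frame(form, data = d, subset = keep_rows))
}
fixed <- count_rows_fixed(f, 1:3)

print(broken)  # the captured error message, which mentions `keep_rows`
print(fixed)   # number of rows kept by the subset
```

If the local variable were named subset instead of keep_rows, the failed lookup would fall through to the base subset function, which is consistent with the "closure" error reported in the question.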

【Comments】:

  • Great answer, thanks. It's a workaround, but it works for me.