Running parallel in R problems: answers

【Question title】: Running parallel in R problems
【Posted】: 2015-06-01 17:19:24
【Question】:

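I am trying out parallel execution in R with the parallel package. In my example (code below), however, the parallel version takes longer than the single-process version. Can anyone give me some advice?

Thanks a lot!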

##parSquareNum.R
strt <- Sys.time()
workerFunc <- function(n) { return(n^2) }
values <- 1:1000000
library(parallel)
## Number of workers (R processes) to use:
cores <- detectCores()
## Set up the 'cluster'
cl <- makeCluster(cores-1)
## Parallel calculation (parLapply):
res <- parLapply(cl, values, workerFunc)
## Record the elapsed time (this includes library loading and cluster set-up)
write(Sys.time()-strt, 'parallel.txt')
## Shut down cluster
stopCluster(cl)

##singleSquareNum.R

strt <- Sys.time()
## The worker function to do the calculation:
workerFunc <- function(n) { return(n^2) }
## The values to apply the calculation to:
values <- 1:1000000
## Serial calculation:
res <- lapply(values, workerFunc)
##print(unlist(res))
write(Sys.time() -strt, 'single.txt')

【Comments】:

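Parallelism is unlikely to help when each core is given a trivial task, because dispatching the work and reassembling the results also takes time. You need to farm out a more expensive task to see a benefit.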
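To illustrate the comment above, here is a minimal sketch in which each element costs real CPU time, so the per-task communication overhead matters less. The worker heavyFunc, the input size, and the choice of 2 workers are assumptions made up for this illustration; they are not from the original post, and the timings you get will depend on your machine.

```r
library(parallel)

## Hypothetical CPU-heavy worker (not from the original post):
## sums a long sequence of square roots for each input value.
heavyFunc <- function(n) {
  sum(sqrt(seq_len(200000) + n))
}

values <- 1:200

## Two workers is an arbitrary choice so the sketch runs on small machines.
cl <- makeCluster(2)

## Time only the computation, not the cluster set-up.
serialTime   <- system.time(resSerial   <- lapply(values, heavyFunc))
parallelTime <- system.time(resParallel <- parLapply(cl, values, heavyFunc))

stopCluster(cl)

## Both approaches should produce the same results.
stopifnot(isTRUE(all.equal(resSerial, resParallel)))

print(serialTime[["elapsed"]])
print(parallelTime[["elapsed"]])
```

With a worker as cheap as n^2, most of the parallel run is spent shipping data to and from the workers; with a heavier worker like this one, the computation itself is more likely to dominate.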

Tags: r parallel-processing

【Solution 1】:

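The main reason you are seeing this is that loading the library and creating the cluster take some time. Move strt <- Sys.time() to right before the line that computes res, in both scripts, and you will see the difference, especially if you increase the size of values.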

##parSquareNum.R
workerFunc <- function(n) { return(n^2) }
values <- 1:1000000
library(parallel)
## Number of workers (R processes) to use:
cores <- detectCores()
## Set up the 'cluster'
cl <- makeCluster(cores-1)
## Parallel calculation (parLapply):
strt <- Sys.time()
res <- parLapply(cl, values, workerFunc)
write(Sys.time()-strt, 'parallel.txt')
## Shut down cluster
stopCluster(cl)

##singleSquareNum.R

## The worker function to do the calculation:
workerFunc <- function(n) { return(n^2) }
## The values to apply the calculation to:
values <- 1:1000000
## Serial calculation:
strt <- Sys.time()
res <- lapply(values, workerFunc)
##print(unlist(res))
write(Sys.time() -strt, 'single.txt')

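When I run these, the parallel version takes 0.6941409 seconds and the single-process version takes 1.117002 seconds, a speedup of about 1.6x. I am running on an i7 chip.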
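If you want to see where the time goes, one option is to time each phase separately with system.time(). This is only a sketch: the 2 workers and the smaller values vector are arbitrary choices for illustration, and the numbers will vary between machines.

```r
library(parallel)

workerFunc <- function(n) { return(n^2) }
values <- 1:100000

## Phase 1: cluster start-up (2 workers is an arbitrary choice for this sketch).
setupTime <- system.time(cl <- makeCluster(2))

## Phase 2: the parallel computation itself.
computeTime <- system.time(res <- parLapply(cl, values, workerFunc))

## Phase 3: cluster shutdown.
stopTime <- system.time(stopCluster(cl))

## Elapsed (wall-clock) seconds for each phase.
timings <- c(setup    = setupTime[["elapsed"]],
             compute  = computeTime[["elapsed"]],
             shutdown = stopTime[["elapsed"]])
print(timings)
```

If the setup entry is large compared with compute, that start-up cost is what made the original parallel script look slower than the serial one.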

【Comments】:

Thank you very much for your answer!