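【Posted】: 2020-06-24 11:17:46
【Question】:
I am trying to parallelize a pipeline. The pipeline contains a tidyr call ("tidyr::complete"). As soon as it runs in parallel, the code breaks because complete does not recognize the class of the partitioned object.
Is there an alternative to complete in dplyr?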
library(dplyr)
library(tidyr)
library(zoo)
test <- tibble(year  = c(1, 2, 3, 4, 5, 5, 1, 4, 5),
               var_1 = c(1, 1, 1, 1, 1, 1, 2, 2, 2),
               var_2 = c(1, 1, 1, 1, 1, 2, 3, 3, 3),
               var_3 = c(0, 5, NA, 15, 20, NA, 1, NA, NA))
max_year <- max(test$year, na.rm = TRUE)
min_year <- min(test$year, na.rm = TRUE)
Serial
test_serial <- test %>%
  group_by(var_1, var_2) %>%
  complete(var_1, year = seq(min_year, max_year)) %>%
  mutate(
    var_3 = na.approx(var_3, na.rm = FALSE),
    var_3 = if (all(is.na(var_3))) NA else na.spline(var_3, na.rm = FALSE))
Parallel (fails)
devtools::install_github("hadley/multidplyr")
library(multidplyr)
cl <- new_cluster(2)
cluster_copy(cl, c("test", "max_year", "min_year"))
cluster_library(cl, c("dplyr", "tidyr", "zoo"))
test_parallel <- test %>% group_by(var_1, var_2) %>% partition(cl)
test_parallel <- test_parallel %>%
  dplyr::group_by(var_1, var_2) %>%
  tidyr::complete(var_1, year = seq(min_year, max_year)) %>%
  dplyr::mutate(
    var_3 = na.approx(var_3, na.rm = FALSE),
    var_3 = if (all(is.na(var_3))) NA else na.spline(var_3, na.rm = FALSE)) %>%
  collect()
This is the error message:
Error in UseMethod("complete_") :
no applicable method for 'complete_' applied to an object of class "multidplyr_party_df"
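For reference, the completion step can be reproduced without tidyr. This is a minimal sketch and not a tested multidplyr solution: it builds every (var_1, var_2, year) combination on an ordinary tibble and joins the observed values back in. The names groups, grid and test_completed are made up for this illustration, and base R's merge() with by = NULL is used as the cross join.

```r
library(dplyr)

test <- tibble(year  = c(1, 2, 3, 4, 5, 5, 1, 4, 5),
               var_1 = c(1, 1, 1, 1, 1, 1, 2, 2, 2),
               var_2 = c(1, 1, 1, 1, 1, 2, 3, 3, 3),
               var_3 = c(0, 5, NA, 15, 20, NA, 1, NA, NA))
min_year <- min(test$year, na.rm = TRUE)
max_year <- max(test$year, na.rm = TRUE)

# Every (var_1, var_2) group that occurs in the data.
groups <- distinct(test, var_1, var_2)

# Cartesian product of groups and years (merge() with by = NULL is a cross join).
grid <- merge(groups, data.frame(year = seq(min_year, max_year)), by = NULL)

# Attach the observed values; combinations absent from test get var_3 = NA.
test_completed <- grid %>%
  left_join(test, by = c("var_1", "var_2", "year")) %>%
  arrange(var_1, var_2, year) %>%
  as_tibble()
```

Because this runs before any partitioning, test_completed could presumably be grouped and passed to partition(cl) afterwards, so that only the mutate() step runs on the workers. That last part is an assumption and has not been verified here.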
【Comments】:
Tags: r dplyr parallel-processing multidplyr