Why does tensorflow/keras choke when I try to fit multiple models in parallel?

【Question title】: Why does tensorflow/keras choke when I try to fit multiple models in parallel?
【Posted】: 2018-09-07 19:18:10
【Question】:

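I'm trying to fit a finite mixture model in which the component model for each class is a neural network. Being able to parallelize would be very useful, because keras does not max out all the available cores on my laptop, let alone on a large cluster.

But when I try to set different learning rates for different models inside a parallel foreach loop, the whole thing chokes.

What's going on? I suspect it has to do with scope: perhaps the workers aren't running on separate instances of tensorflow. But I really don't know. How can I make this work? What do I need to understand to know why this doesn't work?

Here is a MWE. Set the foreach loop to %do% and it works fine. Set it to %dopar% and it chokes at the fitting stage.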

library(foreach)
library(doParallel)
registerDoParallel(2)
library(keras)
library(tensorflow)
mnist <- dataset_mnist()
x_train <- mnist$train$x
y_train <- mnist$train$y
x_test <- mnist$test$x
y_test <- mnist$test$y

x_train <- array_reshape(x_train, c(nrow(x_train), 784))
x_test <- array_reshape(x_test, c(nrow(x_test), 784))
# rescale
x_train <- x_train / 255
x_test <- x_test / 255

y_train <- to_categorical(y_train, 10)
y_test <- to_categorical(y_test, 10)

# make tensorflow run single-threaded
session_conf <- tf$ConfigProto(intra_op_parallelism_threads = 1L,
                               inter_op_parallelism_threads = 1L)
# Create the session using the custom configuration
sess <- tf$Session(config = session_conf)
K <- backend()
K$set_session(sess)


models <- foreach(i = 1:2) %dopar%{
  model <- keras_model_sequential() 
  model %>% 
    layer_dense(units = 256/i, activation = 'relu', input_shape = c(784)) %>% 
    layer_dropout(rate = 0.4) %>% 
    layer_dense(units = 128/i, activation = 'relu') %>%
    layer_dropout(rate = 0.3) %>%
    layer_dense(units = 10, activation = 'softmax')

  print("A")
  model %>% compile(
    loss = 'categorical_crossentropy',
    optimizer = optimizer_rmsprop(),
    metrics = c('accuracy')
  )
  print("B")
  history <- model %>% fit(
    x_train, y_train, 
    epochs = 3, batch_size = 128, 
    validation_split = 0.2, verbose = 0
  )
  print("done")  
}

Here is the sessionInfo():

R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.1 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] splines   parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] panelNNET_1.0       matrixStats_0.54.0  MASS_7.3-50         lfe_2.8-2           tensorflow_1.9      keras_2.1.6.9005   
 [7] mgcv_1.8-24         nlme_3.1-137        scales_1.0.0        forcats_0.3.0       stringr_1.3.1       purrr_0.2.5        
[13] readr_1.1.1         tidyr_0.8.1         tibble_1.4.2        tidyverse_1.2.1     maptools_0.9-3      rgeos_0.3-28       
[19] rgdal_1.3-4         sp_1.3-1            broom_0.5.0         ggplot2_3.0.0       randomForest_4.6-14 dplyr_0.7.6        
[25] glmnet_2.0-16       Matrix_1.2-14       doBy_4.6-2          doParallel_1.0.11   iterators_1.0.10    foreach_1.4.4      

loaded via a namespace (and not attached):
 [1] httr_1.3.1          jsonlite_1.5        modelr_0.1.2        Formula_1.2-3       assertthat_0.2.0    cellranger_1.1.0   
 [7] yaml_2.2.0          pillar_1.3.0        backports_1.1.2     lattice_0.20-35     glue_1.3.0          reticulate_1.10    
[13] digest_0.6.15       RcppEigen_0.3.3.4.0 rvest_0.3.2         colorspace_1.3-2    sandwich_2.5-0      plyr_1.8.4         
[19] pkgconfig_2.0.1     haven_1.1.2         xtable_1.8-2        whisker_0.3-2       withr_2.1.2         lazyeval_0.2.1     
[25] cli_1.0.0           magrittr_1.5        crayon_1.3.4        readxl_1.1.0        xml2_1.2.0          foreign_0.8-70     
[31] tools_3.5.1         hms_0.4.2           munsell_0.5.0       bindrcpp_0.2.2      compiler_3.5.1      rlang_0.2.2        
[37] grid_3.5.1          rstudioapi_0.7      base64enc_0.1-3     labeling_0.3        gtable_0.2.0        codetools_0.2-15   
[43] R6_2.2.2            tfruns_1.3          zoo_1.8-3           lubridate_1.7.4     zeallot_0.1.0       bindr_0.1.1        
[49] stringi_1.2.4       Rcpp_0.12.18        tidyselect_0.2.4

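【Comments】:

FYI, specifying the operating system is essential, ideally with the full sessionInfo(). Specifically, doParallel::registerDoParallel(2) creates different types of clusters on different operating systems.
Yes, this is Linux, thanks. Posting the session info.
Not an R expert. Maybe you should create a tf$Session for each parallel run? Otherwise there may be conflicts between the runs.
@DanielGL Good idea, worth a try. But can you run single-threaded tensorflow sessions in parallel in python?
@DanielGL Your idea worked, and in hindsight it should have been obvious. Write up a short answer if you'd like some points.

Tags: r tensorflow parallel-processing scope keras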

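【Solution 1】:

Keras requires that only one training run take place in a given session. I would try creating a separate session for each model.

I would put this part of the code inside the %dopar% block, so that a separate session is created for each model: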

sess <- tf$Session(config = session_conf)
K <- backend()
K$set_session(sess)
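
As a rough sketch of what that could look like in the loop from the question: the session is created inside the %dopar% body instead of before it. This assumes the same TensorFlow 1.x-era API used in the question (tf$ConfigProto, tf$Session), a Linux machine where registerDoParallel(2) forks the workers, and small random stand-in data in place of MNIST. The simplified network and the choice to return history$metrics are illustrative choices, not part of the original answer.

```r
library(foreach)
library(doParallel)
library(keras)
library(tensorflow)

registerDoParallel(2)

# Random stand-in data (assumption; the question uses MNIST)
x_train <- matrix(runif(1000 * 784), nrow = 1000)
y_train <- to_categorical(sample(0:9, 1000, replace = TRUE), 10)

histories <- foreach(i = 1:2) %dopar% {
  # Create a fresh single-threaded session inside each worker,
  # rather than once in the parent process
  session_conf <- tf$ConfigProto(intra_op_parallelism_threads = 1L,
                                 inter_op_parallelism_threads = 1L)
  sess <- tf$Session(config = session_conf)
  K <- backend()
  K$set_session(sess)

  model <- keras_model_sequential()
  model %>%
    layer_dense(units = 256 / i, activation = 'relu', input_shape = c(784)) %>%
    layer_dense(units = 10, activation = 'softmax')

  model %>% compile(
    loss = 'categorical_crossentropy',
    optimizer = optimizer_rmsprop(),
    metrics = c('accuracy')
  )

  history <- model %>% fit(
    x_train, y_train,
    epochs = 3, batch_size = 128,
    validation_split = 0.2, verbose = 0
  )

  # Return plain R data; the model object itself wraps a Python object
  # that lives in the worker process
  history$metrics
}
```

Returning the training metrics rather than the model is a deliberate choice in this sketch: the fitted weights would need to be saved to disk inside the worker if they are needed afterwards.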

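【Comments】:

【Solution 2】:

The accepted answer is correct in that a new session is needed to make this work.

However, I found that doing this inside the foreach loop causes a large slowdown, possibly due to some kind of memory leak.

The workaround I found is to write a script, say fit_model.R, that creates the tensorflow session, loads the stored weights, fits the model, and so on. Then I create another script, say meta_fit.R. That script contains the foreach loop, but each worker only executes system("Rscript fit_model.R"). That way, once the script exits, the operating system cleans up whatever is left over after each keras/tensorflow run.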
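
A minimal sketch of that two-script layout, under some assumptions: the worker script here is a stand-in written to a temporary file so the example runs on its own, and passing the model index as a command-line argument is a hypothetical detail the answer does not specify. A real fit_model.R would create the tensorflow session, load the stored weights, and fit the model where the placeholder comment is.

```r
# meta_fit.R (sketch)
library(foreach)
library(doParallel)

registerDoParallel(2)

# Stand-in for fit_model.R, written to a temp file so this sketch is runnable
script <- file.path(tempdir(), "fit_model.R")
writeLines(c(
  'args <- commandArgs(trailingOnly = TRUE)',
  'i <- as.integer(args[1])',
  '# A real script would create the session, load weights, and fit here',
  'cat("fitting model", i, "\\n")'
), script)

# Each worker launches a separate R process; when that process exits,
# the operating system reclaims everything it allocated
exit_codes <- foreach(i = 1:2, .combine = c) %dopar% {
  system2("Rscript", args = c(shQuote(script), i))
}
```

system2() returns the exit status of the child process, so a nonzero entry in exit_codes would indicate that the corresponding fit failed.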

【Comments】: