How to implement multiprocessing in this Python script? (Answers)

【Question title】: How to implement multiprocessing in this Python script?
【Posted】: 2018-01-30 19:02:21
【Question】:

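I am running this Python 3.5 script in Jupyter on my laptop, but the loop is very slow. I started reading about how to speed up the code and found that I could import a multiprocessing library to do so, but I don't know how to apply it to this script.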

# Larger LSTM Network to Generate Text for Alice in Wonderland
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import LSTM
from keras.callbacks import ModelCheckpoint
from keras.utils import np_utils
# load ascii text and convert to lowercase
filename = "wonderland.txt"
with open(filename) as f:
    raw_text = f.read()
raw_text = raw_text.lower()
# create mapping of unique chars to integers
chars = sorted(list(set(raw_text)))
char_to_int = dict((c, i) for i, c in enumerate(chars))
# summarize the loaded data
n_chars = len(raw_text)
n_vocab = len(chars)
print("Total Characters: ", n_chars)
print("Total Vocab: ", n_vocab)
# prepare the dataset of input to output pairs encoded as integers
seq_length = 100
dataX = []
dataY = []
for i in range(0, n_chars - seq_length, 1):
    seq_in = raw_text[i:i + seq_length]
    seq_out = raw_text[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])
n_patterns = len(dataX)
print("Total Patterns: ", n_patterns)
# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (n_patterns, seq_length, 1))
# normalize
X = X / float(n_vocab)
# one hot encode the output variable
y = np_utils.to_categorical(dataY)
# define the LSTM model
model = Sequential()
model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2]), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(256))
model.add(Dropout(0.2))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
# define the checkpoint
filepath="weights-improvement-{epoch:02d}-{loss:.4f}-bigger.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]
# fit the model
model.fit(X, y, epochs=50, batch_size=64, callbacks=callbacks_list)

The script comes from this tutorial.

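【Comments】:

Which loop are you talking about?
@NassimBen The last one, running 50 epochs... it is very slow. One epoch takes about 10 minutes.
Are you training on a GPU or a CPU? What is the input shape, and how many samples are there?
It's on CPU. I can't answer the last two questions. Sorry, I'm new to Python.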

Tags: python tensorflow multiprocessing keras rnn

【Solution 1】:

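The multiprocessing library often helps Python scripts, but it won't help much here, because most of the work is hidden inside Keras and its backend. Ten minutes per epoch actually sounds reasonable for a neural network (these things are expensive to run!), especially if you are running it without a GPU.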
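
One way to confirm where the time goes is to time each stage separately. This is a minimal sketch, assuming a hypothetical `timed` helper and `timings` dict (neither is in the original script) and a stand-in string in place of wonderland.txt. It uses `str.format` so that it still runs on Python 3.5.

```python
import time
from contextlib import contextmanager

timings = {}  # hypothetical name: stage label -> elapsed seconds


@contextmanager
def timed(label):
    """Record the wall-clock time spent inside the with-block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


# Stand-in text so the sketch runs on its own (assumption);
# the real script reads wonderland.txt instead.
raw_text = "alice was beginning to get very tired of sitting by her sister " * 200
seq_length = 100

with timed("prepare data"):
    chars = sorted(set(raw_text))
    char_to_int = {c: i for i, c in enumerate(chars)}
    dataX, dataY = [], []
    for i in range(len(raw_text) - seq_length):
        dataX.append([char_to_int[c] for c in raw_text[i:i + seq_length]])
        dataY.append(char_to_int[raw_text[i + seq_length]])

# In the real script, wrap the training call the same way:
# with timed("fit"):
#     model.fit(X, y, epochs=1, batch_size=64)

for label, seconds in timings.items():
    print("{}: {:.3f} s".format(label, seconds))
```

If the `fit` entry dwarfs the data preparation, as this answer expects, splitting the Python loop across processes would not noticeably shorten an epoch.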

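If you are using TensorFlow as the Keras backend, it should automatically use all CPU cores when you call model.fit(). You can double-check by watching your favorite CPU monitor (e.g. htop) while the code runs.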
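
Alongside a CPU monitor, the core and thread counts can also be read from Python. This is a rough sketch, assuming a hypothetical `cpu_report` helper. The `tf.config.threading` getters belong to newer TensorFlow releases than a 2018 install is likely to have, so the sketch falls back to `None` when they are unavailable.

```python
import os


def cpu_report():
    """Collect the CPU figures that matter when training on CPU."""
    report = {"logical_cpus": os.cpu_count()}
    try:
        import tensorflow as tf

        # A value of 0 means TensorFlow chooses the thread count itself.
        report["intra_op_threads"] = (
            tf.config.threading.get_intra_op_parallelism_threads()
        )
        report["inter_op_threads"] = (
            tf.config.threading.get_inter_op_parallelism_threads()
        )
    except (ImportError, AttributeError):
        # TensorFlow is not installed, or is too old to expose
        # tf.config.threading (assumption about the local setup).
        report["intra_op_threads"] = None
        report["inter_op_threads"] = None
    return report


print(cpu_report())
```

The report only shows what is configured. Whether every core is actually busy during model.fit() is still best judged from the monitor.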

【Comments】: