【Posted】: 2018-12-21 12:49:06
【Question】:
I have a very simple LSTM network. After 1-2 epochs my computer freezes completely; I can't even move my mouse:
Layer (type)                 Output Shape              Param #
=================================================================
lstm_4 (LSTM)                (None, 128)               116224
_________________________________________________________________
dropout_3 (Dropout)          (None, 128)               0
_________________________________________________________________
dense_5 (Dense)              (None, 98)                12642
=================================================================
Total params: 128,866
Trainable params: 128,866
Non-trainable params: 0
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense
from keras.optimizers import RMSprop

# Same problem with 2 LSTM layers with dropout and the Adam optimizer
# Here SEQUENCE_LENGTH = 3 and len(chars) = 98
model = Sequential()
model.add(LSTM(128, input_shape=(SEQUENCE_LENGTH, len(chars))))
#model.add(Dropout(0.15))
#model.add(LSTM(128))
model.add(Dropout(0.10))
model.add(Dense(len(chars), activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer=RMSprop(lr=0.01), metrics=['accuracy'])
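The parameter counts in the summary are consistent with this model for len(chars) = 98. A minimal sketch of that check, assuming the standard LSTM parameter count of 4 * (units * (input_dim + units) + units); the helper names lstm_params and dense_params are hypothetical and not part of Keras:

```python
def lstm_params(units, input_dim):
    # 4 gates, each with input weights, recurrent weights, and a bias vector
    return 4 * (units * (input_dim + units) + units)


def dense_params(units, input_dim):
    # one weight per (input, output) pair plus one bias per output unit
    return units * input_dim + units


n_chars = 98  # len(chars) as stated in the question
lstm = lstm_params(128, n_chars)
dense = dense_params(n_chars, 128)
print(lstm, dense, lstm + dense)  # 116224 12642 128866
```

Dropout adds no parameters, so the total of 128,866 comes from the LSTM and Dense layers alone.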
This is how I train it:
history = model.fit(X, y, validation_split=0.20, batch_size=128, epochs=10, shuffle=True,verbose=2).history
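The network takes about 5 minutes to finish 1 epoch. A larger batch size does not make the problem occur sooner. A more complex model, however, can train for longer while reaching almost the same accuracy, about 0.46 (full code here).
I am running the latest Linux Mint with a GTX 1070 Ti (8 GB) and 32 GB of RAM.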
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.26                 Driver Version: 396.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 107...  Off  | 00000000:08:00.0  On |                  N/A |
|  0%   35C    P8    10W / 180W |    303MiB /  8116MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
Libraries:
Keras==2.2.0
Keras-Applications==1.0.2
Keras-Preprocessing==1.0.1
keras-sequential-ascii==0.1.1
keras-tqdm==2.0.1
tensorboard==1.8.0
tensorflow==1.0.1
tensorflow-gpu==1.8.0
I tried limiting GPU memory usage, but that should not be the problem, because training only consumes about 1 GB of GPU memory:
import tensorflow as tf
from keras.backend.tensorflow_backend import set_session

config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.9
config.gpu_options.allow_growth = True
set_session(tf.Session(config=config))
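The roughly 1 GB figure can be checked while training runs by polling nvidia-smi. A minimal sketch, assuming nvidia-smi is on the PATH; the helper names parse_gpu_memory and query_gpu_memory are hypothetical, and the sample string reuses the idle reading from the nvidia-smi output above:

```python
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'used, total' lines (in MiB) into a list of (used, total) tuples."""
    result = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        result.append((used, total))
    return result


def query_gpu_memory():
    """Ask nvidia-smi for the current memory usage of each GPU, in MiB."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        universal_newlines=True,
    )
    return parse_gpu_memory(out)


# Parsing demonstrated on a sample line, so this runs without a GPU.
sample = "303, 8116\n"
print(parse_gpu_memory(sample))  # [(303, 8116)]
```

Calling query_gpu_memory() every few seconds from a second terminal during model.fit would show whether memory use stays near 1 GB up to the point of the freeze.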
What is wrong here? How can I fix this?
【Comments】:
Tags: python tensorflow crash keras deep-learning