RuntimeError: Dst tensor is not initialized in Tensorflow

【Question title】: RuntimeError: Dst tensor is not initialized in Tensorflow
【Posted】: 2020-06-08 05:35:03
【Question】:

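I get

RuntimeError: Dst tensor is not initialized.

while training my neural network. Specifically, the error seems to occur in my custom Callback, when I call self.model.predict(self.dataset) to get predictions, because the stack trace shows

File "mlp_keras.py", line 20, in on_epoch_end predictions = self.model.predict(self.dataset)

Here is the full stack trace: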

Traceback (most recent call last):
  File "mlp_keras.py", line 150, in <module>
    callbacks=[KendallTauHistory(training_dataset, training_dataset_labels, groups_id_count)])
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 819, in fit
    use_multiprocessing=use_multiprocessing)
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 397, in fit
    prefix='val_')
  File "/usr/lib64/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 771, in on_epoch
    self.callbacks.on_epoch_end(epoch, epoch_logs)
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/keras/callbacks.py", line 302, in on_epoch_end
    callback.on_epoch_end(epoch, logs)
  File "mlp_keras.py", line 20, in on_epoch_end
    predictions = self.model.predict(self.dataset)
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 1013, in predict
    use_multiprocessing=use_multiprocessing)
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 498, in predict
    workers=workers, use_multiprocessing=use_multiprocessing, **kwargs)
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 426, in _model_iteration
    use_multiprocessing=use_multiprocessing)
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 706, in _process_inputs
    use_multiprocessing=use_multiprocessing)
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/keras/engine/data_adapter.py", line 357, in __init__
    dataset = self.slice_inputs(indices_dataset, inputs)
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/keras/engine/data_adapter.py", line 383, in slice_inputs
    dataset_ops.DatasetV2.from_tensors(inputs).repeat()
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 566, in from_tensors
    return TensorDataset(tensors)
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 2765, in __init__
    element = structure.normalize_element(element)
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/data/util/structure.py", line 113, in normalize_element
    ops.convert_to_tensor(t, name="component_%d" % i))
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/framework/ops.py", line 1314, in convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/framework/tensor_conversion_registry.py", line 52, in _default_conversion_function
    return constant_op.constant(value, dtype, name=name)
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/framework/constant_op.py", line 258, in constant
    allow_broadcast=True)
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/framework/constant_op.py", line 266, in _constant_impl
    t = convert_to_eager_tensor(value, ctx, dtype)
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/framework/constant_op.py", line 96, in convert_to_eager_tensor
    return ops.EagerTensor(value, ctx.device_name, dtype)
RuntimeError: Dst tensor is not initialized.

Here is my code:

# Imports assumed by this excerpt (not shown in the original post).
import numpy as np
import tensorflow as tf
from scipy.stats import kendalltau, rankdata
from tensorflow import keras
from tensorflow.keras.callbacks import Callback
from tensorflow.keras.layers import LSTM, Dense


class KendallTauHistory(Callback):
    def __init__(self, dataset, y_true, groups):
        self.y_true = y_true
        self.dataset = dataset
        self.groups = groups

    def on_epoch_end(self, epoch, logs=None):
        # Runs inference on the full dataset; this is the call that raises the error.
        predictions = self.model.predict(self.dataset)
        predictions = predictions.flatten()
        # Add small uniform noise in [-0.01, 0.01) to each prediction.
        predictions = list(map(lambda element: element + np.random.uniform(0.0, 1.0) * 0.02 - 0.01, predictions))
        # For batch training
        ranked_predictions = np.array([])
        kendalls = np.array([])
        start_range = 0
        for group in self.groups:
            end_range = (start_range + group[1]) # Batch is a group of words with same group id
            batch_predictions = predictions[start_range:end_range]
            batch_labels = self.y_true[start_range:end_range]
            batch_predictions = list(map(lambda element: element + np.random.uniform(0.0, 1.0) * 0.02 - 0.01, batch_predictions))
            ranked_predictions = np.append(ranked_predictions, np.floor(rankdata(batch_predictions)))
            # Note: kendalltau returns (tau, p-value); np.append flattens both into kendalls.
            kendalls = np.append(kendalls, kendalltau(batch_labels, batch_predictions))
            start_range = end_range
        #self.y_true = self.y_true[0:len(ranked_predictions)]
        print('\nORIGINAL LABELS: {0}\n'.format(self.y_true))
        print('PREDICTED LABELS: {0}'.format(ranked_predictions))
        print("\nEpoch Kendall's tau: {0}".format(np.mean(kendalls)))


# listnet_loss, training_dataset, training_dataset_labels and groups_id_count
# are defined elsewhere in mlp_keras.py (not shown in the original post).
model = tf.keras.Sequential()
model.add(LSTM(units=10, batch_input_shape=(None, 2, 839)))
model.add(Dense(15, activation='sigmoid'))

model.summary()

model.compile(loss=listnet_loss, optimizer=keras.optimizers.Nadam(learning_rate=0.000005, beta_1=0.9, beta_2=0.999))
real_labels = np.array([])
losses = np.array([])

with tf.device('/GPU:0'):
    model.fit(training_dataset, training_dataset_labels, epochs=10, workers=10,
              verbose=1, callbacks=[KendallTauHistory(training_dataset, training_dataset_labels, groups_id_count)])

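【Comments on the question】:

Is this the right way to do it? Shouldn't the model be compiled inside tf.device('/GPU:0'), while fit doesn't need to be? Did you solve this problem?

Tags: python tensorflow machine-learning keras gpu

【Solution 1】: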

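This error usually means that your GPU ran out of memory while trying to allocate a tensor. In fact, I ran into the same error a week ago while training in a multi-GPU environment with CUDA 10.0 and TensorFlow 1.13.0.

My suggestion is to reduce the batch_size argument of the .fit() method. If you do not set it explicitly, it defaults to 32. Keep halving it until the error goes away.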
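The halving strategy can be sketched as a small retry loop. This is only an illustration, not part of the original answer: `fit_with_backoff`, `train_fn` and `fake_fit` are hypothetical names, and the loop catches RuntimeError simply because that is the exception type shown in the traceback. Whether a retry inside the same process is reliable after a real GPU allocation failure is an assumption; restarting the script with a smaller value is the more conservative route.

```python
def fit_with_backoff(train_fn, start_batch_size=32, min_batch_size=1):
    """Call train_fn(batch_size), halving batch_size after each RuntimeError.

    Returns the first batch size that succeeded. Re-raises the last error
    if even min_batch_size fails.
    """
    batch_size = start_batch_size
    while True:
        try:
            train_fn(batch_size)
            return batch_size
        except RuntimeError:
            if batch_size <= min_batch_size:
                raise
            batch_size = max(min_batch_size, batch_size // 2)


# Stand-in for model.fit, used only to exercise the retry logic: it
# pretends that any batch size above 4 exhausts GPU memory.
def fake_fit(batch_size):
    if batch_size > 4:
        raise RuntimeError("Dst tensor is not initialized.")


print(fit_with_backoff(fake_fit))  # prints 4
```

With a real Keras model, train_fn would be something like `lambda bs: model.fit(x, y, batch_size=bs, epochs=10)`, where x and y stand for your own training arrays.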

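You can also read about this error at this link: https://github.com/tensorflow/tensorflow/issues/7025

(Getting "Dst tensor is not initialized." when the real problem is that the GPU is out of memory)

The error can be misleading, because it does not look like the usual "OOM error" that TensorFlow reports when a memory allocation fails.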

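【Comments】:

Thanks. I noticed that the code I posted was out of date: my current batch_size is 2.
Please also upgrade to at least Python 3.6. Also, can you tell me which GPU you are using?
Python 3 may be a problem, because I am on a remote machine and I don't think pip3 is installed. The GPU is a Tesla P100.
If you read the stack trace in the link I provided carefully, this error only appears because of memory problems.
Set it in your predict call as well: self.model.predict(...., batch_size = 2). See if it works like that.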
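A related option is to hand predict only a slice of the data at a time, so that no single call has to convert the whole array into one tensor. The sketch below assumes the failure comes from the full input being copied to the GPU in one piece; the traceback ending in convert_to_eager_tensor is consistent with that, but does not prove it. `predict_in_chunks` and `fake_predict` are hypothetical names, and the stand-in model exists only so the helper can run without TensorFlow.

```python
import numpy as np


def predict_in_chunks(predict_fn, data, chunk_size):
    """Apply predict_fn to consecutive slices of data and concatenate the results.

    data must be non-empty; each slice holds at most chunk_size samples.
    """
    outputs = []
    for start in range(0, len(data), chunk_size):
        outputs.append(predict_fn(data[start:start + chunk_size]))
    return np.concatenate(outputs, axis=0)


# Stand-in for a model: sums each sample over its last two axes.
def fake_predict(batch):
    return batch.sum(axis=(1, 2))


# Sample shape (2, 839) borrowed from the question's batch_input_shape.
data = np.ones((5, 2, 839))
result = predict_in_chunks(fake_predict, data, chunk_size=2)
print(result.shape)
```

Inside the callback this could be called as `predict_in_chunks(lambda x: self.model.predict(x, batch_size=2), self.dataset, chunk_size=1024)`, where 1024 is an arbitrary starting value to tune against the available GPU memory.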