Failed copying input tensor from CPU to GPU in order to run GatherV2: Dst tensor is not initialized. [Op:GatherV2]

【Question title】: Failed copying input tensor from CPU to GPU in order to run GatherV2: Dst tensor is not initialized. [Op:GatherV2]
【Posted】: 2020-07-15 14:11:50
【Question description】:

    from random import sample
    index=sample(range(0, len(result)), len(result)//5*4)
    description_train=[child[0] for i, child in enumerate(result) if i in index]
    ipc_train=[child[1] for i, child in enumerate(result) if i in index]
    description_test=[child[0] for i, child in enumerate(result) if i not in index]
    ipc_test=[child[1] for i, child in enumerate(result) if i not in index]
    
    import numpy as np
    
    def to_onehot(li):
        result=np.zeros(8)
        if 'A' in li:
            result[0]=1
        if 'B' in li:
            result[1]=1
        if 'C' in li:
            result[2]=1
        if 'D' in li:
            result[3]=1
        if 'E' in li:
            result[4]=1
        if 'F' in li:
            result[5]=1
        if 'G' in li:
            result[6]=1
        if 'H' in li:
            result[7]=1
        return result
            
            
    
    from tensorflow.python.keras.preprocessing.text import Tokenizer
    
    
    max_words=100000
    num_classes=8
    
    t=Tokenizer(num_words=max_words)
    t.fit_on_texts(description_train)
    X_train=t.texts_to_matrix(description_train, mode='binary')
    X_test=t.texts_to_matrix(description_test, mode='binary')
    Y_train=np.array([to_onehot(child) for child in ipc_train], dtype=np.int32)
    Y_test=np.array([to_onehot(child) for child in ipc_test], dtype=np.int32)
    
    
    from tensorflow.python.keras.models import Sequential
    from tensorflow.python.keras.layers import Dense, Dropout
    
    
    model = Sequential()
    model.add(Dense(1024, input_shape=(max_words,), activation='relu'))
    model.add(Dense(128, activation='relu'))
    model.add(Dense(num_classes, activation='sigmoid'))
    
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.fit(X_train, Y_train, batch_size=128, epochs=5, validation_split=0.1)

The last line (model.fit) raises the following error.

InternalError: Failed copying input tensor from /job:localhost/replica:0/task:0/device:CPU:0 to /job:localhost/replica:0/task:0/device:GPU:0 in order to run GatherV2: Dst tensor is not initialized. [Op:GatherV2]

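How can I fix this? Thanks in advance.

【Question comments】:

Tags: tensorflow2.0

【Solution 1】:

I found a solution. I reduced the number of samples:

    model.fit(X_train[0:3000], Y_train[0:3000], batch_size=128, epochs=5, validation_split=0.1)

After that, the error went away.

Good luck, everyone.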

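【Comments】:

But what about the remaining samples? Aren't you going to fit the model on them?

【Solution 2】:

This is probably caused by running out of memory. You can run the code on a system with more RAM, reduce the number of samples, or reduce the dimensionality of the data with methods such as PCA or feature selection.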
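To put rough numbers on the out-of-memory explanation, the sketch below estimates the size of the dense matrix that texts_to_matrix would produce with the question's max_words=100000. It assumes the matrix is stored as float64 (NumPy's default for zero-filled arrays) and also shows float32 for comparison; the 50,000-row case is a hypothetical dataset size, since the question does not state how many samples it has. Under these assumptions, the 3,000 rows that worked in Solution 1 already take 3000 × 100000 × 8 bytes = 2.4 GB as float64, while a single batch of 128 rows is tiny by comparison.

```python
import numpy as np

def dense_bytes(n_rows, n_cols, dtype):
    """Bytes needed to hold a dense n_rows x n_cols array of the given dtype."""
    return n_rows * n_cols * np.dtype(dtype).itemsize

max_words = 100_000   # from the question
batch_size = 128      # from the question

# 50_000 is a hypothetical dataset size, used only for illustration.
for n_rows in (batch_size, 3_000, 50_000):
    for dtype in (np.float64, np.float32):
        gib = dense_bytes(n_rows, max_words, dtype) / 2**30
        print(f"{n_rows:>6} rows as {np.dtype(dtype).name}: {gib:.2f} GiB")
```

If the estimate for the full matrix is close to or above the GPU's memory, feeding the data in batches or shrinking max_words are the more likely fixes than adding host RAM.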

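【Comments】:

【Solution 3】:

Reducing the sample size is not always an option; after all, there is usually a reason you have that many samples in the first place. So I would recommend a couple of options:

Use a higher-spec cloud VM (AWS, Azure, or GCP), pay by the hour, and run the job on that VM.
If you don't want to pay and are able to write some extra code, you essentially have to create your own custom generator that calls flow_from_directory to load the dataset in batches. See: https://www.askpython.com/python/examples/handling-large-datasets-machine-learning https://www.analyticsvidhya.com/blog/2020/08/image-augmentation-on-the-fly-using-keras-imagedatagenerator/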
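flow_from_directory is meant for image folders, so for the text data in the question a custom generator might look more like the sketch below. It is only an illustration under assumptions: binary_batches is a hypothetical helper (not a Keras API), and sequences is assumed to be a list of word-index lists such as the output of the tokenizer's texts_to_sequences. Each batch is expanded into a dense 0/1 matrix on demand, imitating the binary bag-of-words encoding, so the full samples × max_words matrix is never held in memory.

```python
import numpy as np

def binary_batches(sequences, labels, num_words, batch_size):
    """Yield (x, y) batches forever, building each dense batch on demand.

    sequences: list of lists of word indices (hypothetical input format).
    labels:    NumPy array with one row (or value) per sequence.
    """
    n = len(sequences)
    while True:  # loop indefinitely so the generator can serve many epochs
        for start in range(0, n, batch_size):
            chunk = sequences[start:start + batch_size]
            x = np.zeros((len(chunk), num_words), dtype=np.float32)
            for row, seq in enumerate(chunk):
                # Index 0 is treated as reserved; out-of-range indices are dropped.
                idx = np.asarray([j for j in seq if 0 < j < num_words], dtype=np.intp)
                x[row, idx] = 1.0
            yield x, labels[start:start + batch_size]

# Tiny demo with made-up data: 3 "documents", vocabulary of 5, batches of 2.
demo = binary_batches([[1, 3], [2], [4, 9]], np.arange(3), num_words=5, batch_size=2)
first_x, first_y = next(demo)
print(first_x.shape, first_y)
```

With a generator like this, model.fit would need steps_per_epoch set to the number of batches per epoch, and validation data would have to be passed separately, since validation_split is not supported for generator inputs.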

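【Comments】:

【Solution 4】:

One solution is to reduce the size of the input images so that they fit within the GPU's capacity. In my case, I went from (224,224,10) down to (128,128,10).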
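As a quick check of how much that resize saves, the snippet below compares the number of values per sample for the two shapes mentioned above. It only counts the input tensor; intermediate activations inside the network are not included, so the real saving depends on the model.

```python
from math import prod

old_shape = (224, 224, 10)   # original input shape from the answer
new_shape = (128, 128, 10)   # reduced input shape from the answer

old_values = prod(old_shape)
new_values = prod(new_shape)
ratio = old_values / new_values

print(f"values per sample: {old_values} -> {new_values} ({ratio:.4f}x fewer)")
```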

【Comments】:

【Solution 5】:

I run into this error often, even on high-RAM EC2 instances. The only solution that worked for me was to use a generator:

    from tensorflow.keras.utils import Sequence
    import numpy as np

    class DataGenerator(Sequence):
        """Serves (x, y) batches sliced from in-memory NumPy arrays."""

        def __init__(self, x_set, y_set, batch_size):
            self.x, self.y = x_set, y_set
            self.batch_size = batch_size

        def __len__(self):
            # Number of batches per epoch; the last batch may be smaller.
            return int(np.ceil(len(self.x) / float(self.batch_size)))

        def __getitem__(self, idx):
            # Slice out batch number idx from both arrays.
            batch_x = self.x[idx * self.batch_size:(idx + 1) * self.batch_size]
            batch_y = self.y[idx * self.batch_size:(idx + 1) * self.batch_size]
            return batch_x, batch_y

    # X_train, y_train, X_test and y_test are NumPy arrays prepared beforehand;
    # model is an already compiled Keras model.
    train_gen = DataGenerator(X_train, y_train, 32)
    test_gen = DataGenerator(X_test, y_test, 32)

    history = model.fit(train_gen,
                        epochs=6,
                        validation_data=test_gen)

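In the example above, we assume X and y are NumPy arrays.

My guess at what is happening: even though I am using a high-RAM instance, I suspect the problem is a limit on GPU memory. Even though I am training in batches, when no generator is used TensorFlow tries to load the entire array into GPU memory.

【Comments】: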