Expected lstm_1 to have shape (20, 256) but got array with shape (1, 76)

【Question Title】: Expected lstm_1 to have shape (20, 256) but got array with shape (1, 76)
【Posted】: 2020-03-28 17:03:52
【Question】:

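I am building a neural network for speaker recognition, but I have run into a dimension problem. I must be doing something wrong in the batch generator, but I cannot tell what. My steps are as follows. First I prepare the labels: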

import csv

labels = []
with open('filtered_files.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    for file in reader:
        label = file[0]
        if label not in labels:
            labels.append(label)
print(labels)

Then I declare batch_generator:

n_features = 20
max_length = 1000
n_classes = len(labels)

def batch_generator(data, batch_size=16):
    while 1:
        random.shuffle(data)
        X, y = [], []
        for i in range(batch_size):
            print(i)
            wav = data[i]
            waves, sr = librosa.load(wav, mono=True)
            print(waves)
            filename = wav.split('\\')[1]
            filename = filename.split('.')[0] + ".mp3"
            filename = filename.split('_', 1)[1]
            print(filename)
            with open('filtered_files.csv', 'r') as csvfile:
                reader = csv.reader(csvfile)
                for file in reader:
                    if filename == file[1]:
                        print(file[0])
                        label = file[0]
                        break
                    else:
                        continue

            y.append(one_hot_encode(["'" + label + "'"]))
            mfcc = librosa.feature.mfcc(waves, sr)
            mfcc = np.pad(mfcc, ((0,0), (0, max_length - len(mfcc[0]))), mode='constant', constant_values=0)
            X.append(np.array(mfcc))
        yield np.array(X), np.array(y)

Finally, I declare the neural network and start the training process:

learning_rate = 0.001
batch_size = 64
n_epochs = 50
dropout = 0.5

input_shape = (n_features, max_length)
steps_per_epoch = 50
model = Sequential()
model.add(LSTM(256, return_sequences=True, input_shape=input_shape,
               dropout=dropout))
# model.add(Flatten())
# model.add(Dense(128, activation='relu'))
# model.add(Dropout(dropout))
# model.add(Dense(n_classes, activation='softmax'))

opt = Adam(lr=learning_rate)
model.compile(loss='categorical_crossentropy', optimizer=opt,
              metrics=['accuracy'])
model.summary()

history = model.fit_generator(
    generator=batch_generator(X_train, batch_size),
    steps_per_epoch=steps_per_epoch,
    epochs=n_epochs,
    verbose=1,
    validation_data=batch_generator(X_val, 32),
    validation_steps=5,
    callbacks=callbacks
)

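I have included a lot of code because I am not sure which part is actually causing the wrong dimensions. With only the first layer, I get the following error: "Error when checking target: expected lstm_1 to have shape (20, 256) but got array with shape (1, 76)"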

If I uncomment the second layer, I get: "Error when checking target: expected flatten_1 to have 2 dimensions, but got array with shape (64, 1, 76)"

【Comments】:

The problem is a mismatch between the dimensions of the dataset and the shape the model expects.

Tags: neural-network voice-recognition voice speaker

【Solution 1】:

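There is a shape mismatch between the model and the dataset. As the error shows, each array coming from the dataset has shape (1, 76), whereas the model expects shape (20, 256). Because the message says "checking target", the arrays being compared are the labels yielded by the generator and the output of the last layer: with input_shape = (n_features, max_length) = (20, 1000) and return_sequences=True, LSTM(256) produces an output of shape (20, 256) per sample.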
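The (1, 76) and (64, 1, 76) in the two error messages are consistent with each label being stored as a one-row 2-D array before the batch is stacked. The following is a minimal NumPy sketch of that effect. It assumes that the question's one_hot_encode returns an array of shape (1, n_classes) when given a one-element list; its source is not shown, so one_hot_row below is a hypothetical stand-in, and n_classes = 76 is inferred from the error message.

```python
import numpy as np

n_classes = 76   # assumption: inferred from the (1, 76) in the error message
batch_size = 64  # batch size used in the question's training call


def one_hot_row(index, n_classes):
    """Hypothetical stand-in for one_hot_encode: returns shape (1, n_classes)."""
    row = np.zeros((1, n_classes), dtype=np.float32)
    row[0, index] = 1.0
    return row


# Appending one (1, 76) array per sample and stacking adds an extra axis.
y = [one_hot_row(i % n_classes, n_classes) for i in range(batch_size)]
y_batch = np.array(y)
print(y_batch.shape)  # (64, 1, 76)

# Dropping the singleton axis gives one one-hot vector per sample.
y_fixed = np.squeeze(y_batch, axis=1)
print(y_fixed.shape)  # (64, 76)
```

A target of shape (batch_size, n_classes) is what a final Dense(n_classes, activation='softmax') layer with categorical_crossentropy is compared against.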

To fix this, either change the model's input shape to match the dataset, or process the dataset to match the model's input shape.
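As one way of processing the dataset, the sketch below pads or truncates each MFCC matrix to a fixed number of frames and transposes it, so that every sample has the same shape. Random arrays stand in for the output of librosa.feature.mfcc, and to_fixed_length is a hypothetical helper that does not appear in the question. The transpose is an assumption about intent: Keras recurrent layers read their input as (timesteps, features), so the model would then need input_shape = (max_length, n_features).

```python
import numpy as np

n_features = 20    # MFCC coefficients per frame, as in the question
max_length = 1000  # fixed number of frames, as in the question


def to_fixed_length(mfcc, max_length):
    """Pad with zeros or truncate along the frame axis.

    Takes an array of shape (n_features, n_frames) and returns an array
    of shape (max_length, n_features).
    """
    n_frames = mfcc.shape[1]
    if n_frames < max_length:
        mfcc = np.pad(mfcc, ((0, 0), (0, max_length - n_frames)), mode='constant')
    else:
        mfcc = mfcc[:, :max_length]
    return mfcc.T


# Stand-ins for MFCC matrices of a short clip and a long clip.
short_clip = np.random.rand(n_features, 431)
long_clip = np.random.rand(n_features, 1500)

X = np.array([to_fixed_length(m, max_length) for m in (short_clip, long_clip)])
print(X.shape)  # (2, 1000, 20)
```

Truncating as well as padding avoids the negative pad width that np.pad would reject for clips longer than max_length frames.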

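import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Flatten, Dense
from tensorflow.keras.optimizers import Adam

# Hyperparameters taken from the question.
learning_rate = 0.001
dropout = 0.5

input_shape = (20, 256)
model = Sequential()
model.add(LSTM(256, return_sequences=True, input_shape=input_shape,
               dropout=dropout))
model.add(Flatten())
# model.add(Dense(128, activation='relu'))
# model.add(Dropout(dropout))
model.add(Dense(2, activation='softmax'))

opt = Adam(learning_rate=learning_rate)
model.compile(loss='categorical_crossentropy', optimizer=opt,
              metrics=['accuracy'])
model.summary()

# An example of training: two samples, so inputs and targets have the same batch size.
model.fit(tf.ones([2, 20, 256]), tf.one_hot([0, 1], 2))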
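Before starting a long training run, it can help to pull one batch from the generator and compare its shapes with what the model expects. The sketch below is one possible check and is not part of the original answer. check_batch and dummy_generator are hypothetical names, and dummy_generator only imitates the shapes that the question's batch_generator is meant to yield.

```python
import numpy as np


def check_batch(X, y, input_shape, n_classes):
    """Raise ValueError if a batch does not match the expected shapes."""
    if X.shape[1:] != tuple(input_shape):
        raise ValueError(
            f"X has per-sample shape {X.shape[1:]}, expected {tuple(input_shape)}")
    if y.shape != (X.shape[0], n_classes):
        raise ValueError(
            f"y has shape {y.shape}, expected {(X.shape[0], n_classes)}")


def dummy_generator(batch_size, input_shape, n_classes):
    """Hypothetical stand-in for batch_generator: zero inputs, random one-hot labels."""
    while True:
        X = np.zeros((batch_size,) + tuple(input_shape), dtype=np.float32)
        classes = np.random.randint(0, n_classes, size=batch_size)
        y = np.eye(n_classes, dtype=np.float32)[classes]
        yield X, y


X, y = next(dummy_generator(16, (20, 1000), 76))
check_batch(X, y, (20, 1000), 76)  # passes silently
print(X.shape, y.shape)  # (16, 20, 1000) (16, 76)
```

With labels of shape (16, 1, 76), as in the question, check_batch raises a ValueError immediately instead of failing inside the training call.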

【Discussion】: