Mel Spectrogram feature extraction to CNN

【Question title】: Mel Spectrogram feature extraction to CNN
【Posted】: 2020-12-16 18:08:21
【Question】:

This question is in line with the one posted here, but with a slight nuance for a CNN. Using this feature extraction definition:

import librosa
import numpy as np

max_pad_len = 174
n_mels = 128

def extract_features(file_name):
    try:
        # Load the audio (resampled to librosa's default rate) with the faster resampler
        audio, sample_rate = librosa.load(file_name, res_type='kaiser_fast')
        # Result has shape (n_mels, n_frames); n_frames depends on the clip length
        mely = librosa.feature.melspectrogram(y=audio, sr=sample_rate, n_mels=n_mels)
        #pad_width = max_pad_len - mely.shape[1]
        #mely = np.pad(mely, pad_width=((0, 0), (0, pad_width)), mode='constant')

    except Exception as e:
        print("Error encountered while parsing file: ", file_name, e)
        return None

    return mely
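
The commented-out padding lines in the function above would raise an error for any clip longer than max_pad_len frames, because pad_width becomes negative. Below is a minimal sketch of one way to force a fixed frame count, assuming zero-padding short clips and truncating long ones is acceptable for the task. `fix_frames` is a hypothetical helper name, not a librosa function, and the arrays are made-up stand-ins for real spectrograms.

```python
import numpy as np

def fix_frames(mel, max_len):
    """Pad with zeros or truncate a (n_mels, n_frames) array to max_len frames."""
    n_frames = mel.shape[1]
    if n_frames >= max_len:
        # Long enough already: keep only the first max_len frames.
        return mel[:, :max_len]
    # Too short: append zero-valued frames at the end of the time axis.
    pad_width = max_len - n_frames
    return np.pad(mel, pad_width=((0, 0), (0, pad_width)), mode='constant')

# Illustrative arrays standing in for real mel spectrograms.
short_clip = np.ones((128, 100))
long_clip = np.ones((128, 300))
print(fix_frames(short_clip, 174).shape, fix_frames(long_clip, 174).shape)
```

With every spectrogram brought to the same (n_mels, max_len) shape, the samples can be stacked into a single array for training. librosa also ships a `librosa.util.fix_length` utility that covers a similar need.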

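How do I get the correct dimensions for num_rows, num_columns and num_channels from the training and test data?

When building the CNN model, how do I determine the correct input shape?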

model = Sequential()
model.add(Conv2D(filters=16, kernel_size=2, input_shape=(num_rows, num_columns, num_channels), activation='relu'))
model.add(MaxPooling2D(pool_size=2))
model.add(Dropout(0.2))

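【Comments】:

You are using the Keras Sequential format... what if you don't specify any input shape?
@MarcoCerliani, I don't understand your reply. Could you elaborate?
model.add(Conv2D(filters=16, kernel_size=2)) instead of model.add(Conv2D(filters=16, kernel_size=2, input_shape=(num_rows, num_columns, num_channels)))
Thanks for the reply! With the solution you mention here, does the network adapt automatically? ... meaning, will the correct dimensions be inferred automatically?
Yes... you only need to pass data that all has the same dimensions

Tags: machine-learning keras conv-neural-network librosa spectrogram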

【Solution 1】:

I don't know if this is exactly your problem, but I also had to use mel spectrograms as input to a CNN.

Short answer:

input_shape = (x_train.shape[1], x_train.shape[2], 1)
x_train = x_train.reshape(x_train.shape[0], x_train.shape[1], x_train.shape[2], 1)

or

x_train = x_train.reshape(x_train.shape[0], x_train.shape[1], x_train.shape[2], 1)
input_shape = x_train.shape[1:]

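Long answer

In my case, I have a DataFrame with speaker_id and mel spectrograms (computed beforehand with librosa).

Keras CNN models are built for images, which have a height, a width and color channels (1 for grayscale, 3 for RGB).

The mel spectrograms returned by librosa are image-like 2D arrays with only a height (mel bands) and a width (frames), so you need to reshape them to add the channel dimension.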
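
As a small illustration of that reshape, the sketch below shows three equivalent NumPy ways to append a trailing channel axis of size 1. The array `x` is a made-up batch standing in for real spectrograms.

```python
import numpy as np

# Illustrative batch: 4 spectrograms of 128 mel bands x 24 frames.
x = np.zeros((4, 128, 24), dtype=np.float32)

# Three equivalent ways to append a trailing channel axis of size 1.
a = x.reshape(x.shape[0], x.shape[1], x.shape[2], 1)
b = x[..., np.newaxis]
c = np.expand_dims(x, axis=-1)

print(a.shape, b.shape, c.shape)
```

All three produce an array of shape (samples, rows, columns, 1), which is the channels-last layout that Conv2D expects by default.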

Define the input and the expected output

# It looks clumsy, but this is how I converted the pandas Series of 2D arrays into one np.array
x = np.array(list(df.mel)) 
y = df.speaker_id
print('X shape:', x.shape)

X shape: (2204, 128, 24)
That is 2204 mel spectrograms, each 128x24
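
The conversion to a single 3D array only works when every spectrogram has the same shape. As a hedged sketch, `np.stack` is an alternative that fails loudly when the shapes differ; the list `mels` below is a made-up stand-in for the DataFrame column.

```python
import numpy as np

# Stand-in for the column of spectrograms: three arrays of identical shape.
mels = [np.zeros((128, 24)) for _ in range(3)]
x = np.stack(mels)  # new leading axis indexes the samples
print('X shape:', x.shape)

# One spectrogram with a different number of frames breaks the stacking.
ragged = mels + [np.zeros((128, 30))]
try:
    np.stack(ragged)
    stacked_ok = True
except ValueError:
    stacked_ok = False
print('Ragged input stacked:', stacked_ok)
```

If the second case applies to your data, the spectrograms need to be padded or truncated to a common number of frames before stacking.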

Split into training and test sets

x_train, x_test, y_train, y_test = train_test_split(x, y)
print(f'Train: {len(x_train)}', f'Test: {len(x_test)}')

Train: 1653 Test: 551

Reshape to add the extra dimension

x_train = x_train.reshape(x_train.shape[0], x_train.shape[1], x_train.shape[2], 1)
x_test = x_test.reshape(x_test.shape[0], x_test.shape[1], x_test.shape[2], 1)
print('Shapes:', x_train.shape, x_test.shape)

Shapes: (1653, 128, 24, 1) (551, 128, 24, 1)

Set the input shape

# The input shape is independent of the number of samples
input_shape = x_train.shape[1:]
print('Input shape:', input_shape)

Input shape: (128, 24, 1)

Put it into the model

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=input_shape))
model.add(MaxPooling2D())
# More layers...
model.compile(optimizer='adam',loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),metrics=['accuracy'])
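
To see how the input shape propagates through the first two layers, here is a rough sketch of the shape arithmetic in plain Python. It assumes the Keras defaults used in the model above (padding='valid', stride 1 for Conv2D, and a 2x2 pool with matching stride for MaxPooling2D); `conv2d_valid` and `max_pool_valid` are hypothetical helper names, not Keras functions.

```python
def conv2d_valid(shape, kernel, filters):
    """Output shape of a stride-1 convolution with 'valid' padding."""
    rows, cols, _ = shape
    return (rows - kernel[0] + 1, cols - kernel[1] + 1, filters)

def max_pool_valid(shape, pool=(2, 2)):
    """Output shape of max pooling with 'valid' padding and stride equal to the pool size."""
    rows, cols, channels = shape
    return (rows // pool[0], cols // pool[1], channels)

input_shape = (128, 24, 1)
after_conv = conv2d_valid(input_shape, kernel=(3, 3), filters=32)
after_pool = max_pool_valid(after_conv)
print(after_conv, after_pool)  # (126, 22, 32) (63, 11, 32)
```

Calling model.summary() on the real model reports the same per-layer shapes, with an extra leading None for the batch dimension.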

Run the model

model.fit(x_train, y_train, epochs=20, validation_data=(x_test, y_test))
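
One caveat about the labels: SparseCategoricalCrossentropy expects integer class indices in the range 0 to num_classes - 1. If the speaker_id values are arbitrary identifiers (an assumption here, since the data is not shown), they need to be mapped to contiguous indices first. A minimal sketch with made-up ids:

```python
import numpy as np

# Made-up speaker ids standing in for the speaker_id column.
speaker_ids = np.array([1034, 27, 1034, 500, 27])

# classes holds the sorted unique ids; y_encoded holds each sample's index into classes.
classes, y_encoded = np.unique(speaker_ids, return_inverse=True)
num_classes = len(classes)

print(classes, y_encoded, num_classes)
```

The encoded labels would then replace y before the train/test split, and num_classes would be the number of units in the final softmax layer.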

Hope this helps
