Deep Convolutional Autoencoder for movie similarity: answers

【Question title】: Deep Convolutional Autoencoder for movie similarity
【Posted】: 2022-06-22 17:14:24
【Question description】:

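I'm new to Python. I have a dataset of movie descriptions, and I'm trying to build a model that computes movie similarity from those descriptions. I first converted each movie description into Word2Vec vectors, where each word has size 100; since the longest description in my dataset has 213 words, every description becomes a vector of size 21300. My next step is to reduce the dimensionality of these vectors with a convolutional autoencoder. It was suggested that I reshape each 21300-element vector into a 150 x 142 matrix, so I did. My goal is to compress these matrices from 150 x 142 down to 5 x 5, then flatten them and compute the cosine similarity between the compressed movie vectors. Here is my code so far, which raises an error: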

import tensorflow as tf
from tensorflow import keras

# Encoder (comments give the per-sample output shape of each layer)
encoder_input = keras.Input(shape=(21300,), name='sum')
encoded = tf.keras.layers.Reshape((150, 142))(encoder_input)                   # (150, 142)
x = tf.keras.layers.Conv1D(32, 3, activation="relu", padding="same")(encoded)  # (150, 32)
x = tf.keras.layers.MaxPooling1D(2, padding="same")(x)                         # (75, 32)
x = tf.keras.layers.Conv1D(32, 3, activation="relu", padding="same")(x)        # (75, 32)
x = tf.keras.layers.MaxPooling1D(2, padding="same")(x)                         # (38, 32)
x = tf.keras.layers.Conv1D(16, 3, activation="relu", padding="same")(x)        # (38, 16)
x = tf.keras.layers.MaxPooling1D(2, padding="same")(x)                         # (19, 16)
x = tf.keras.layers.Conv1D(16, 3, activation="relu", padding="same")(x)        # (19, 16)
x = tf.keras.layers.MaxPooling1D(2, padding="same")(x)                         # (10, 16)
x = tf.keras.layers.Conv1D(8, 3, activation="relu", padding="same")(x)         # (10, 8)
x = tf.keras.layers.MaxPooling1D(2, padding="same")(x)                         # (5, 8)
x = tf.keras.layers.Flatten()(x)                                               # (40,)
encoder_output = keras.layers.Dense(units=25, activation='relu', name='encoder')(x)  # (25,)
x = tf.keras.layers.Reshape((5, 5))(encoder_output)                            # (5, 5)

# Decoder (comments give the per-sample output shape of each layer)

decoder_input = tf.keras.layers.Conv1D(8, 3, activation='relu', padding='same')(x)  # (5, 8)
x = tf.keras.layers.UpSampling1D(2)(decoder_input)           # (10, 8)
x = tf.keras.layers.Conv1D(16, 3, activation='relu')(x)      # (8, 16)
x = tf.keras.layers.UpSampling1D(2)(x)                       # (16, 16)
x = tf.keras.layers.Conv1D(16, 3, activation='relu')(x)      # (14, 16)
x = tf.keras.layers.UpSampling1D(2)(x)                       # (28, 16)
x = tf.keras.layers.Conv1D(32, 3, activation='relu')(x)      # (26, 32)
x = tf.keras.layers.UpSampling1D(2)(x)                       # (52, 32)
x = tf.keras.layers.Conv1D(32, 3, activation='relu')(x)      # (50, 32)
x = tf.keras.layers.UpSampling1D(2)(x)                       # (100, 32)
#x=tf.keras.layers.Flatten()(x)
decoder_output = keras.layers.Conv1D(1, 3, activation='relu', padding='same')(x)  # (100, 1)

opt = tf.keras.optimizers.Adam(learning_rate=0.001, decay=1e-6)

autoencoder = keras.Model(encoder_input, decoder_output, name='autoencoder')

autoencoder.compile(opt, loss='mse')
autoencoder.summary()

history = autoencoder.fit(
    movies_vector,
    movies_vector,
    epochs=25,
)

print("ENCODER READY")
# Use the middle (bottleneck) layer as the encoder output
encoder = keras.Model(inputs=autoencoder.input,
                      outputs=autoencoder.get_layer('encoder').output)

Running this code produces the following error:

ValueError: Dimensions must be equal, but are 100 and 21300 for '{{node mean_squared_error/SquaredDifference}} = SquaredDifference[T=DT_FLOAT](mean_squared_error/remove_squeezable_dimensions/Squeeze, IteratorGetNext:1)' with input shapes: [?,100], [?,21300].

How can I fix this autoencoder?
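
For reference, the preprocessing and similarity steps described in this question can be sketched with NumPy. This is only an illustration under assumptions: shorter descriptions are assumed to be zero-padded to 213 words (the question does not say how padding is done), random arrays stand in for real Word2Vec vectors and for encoder outputs, and `description_to_vector` and `cosine_similarity` are hypothetical helper names.

```python
import numpy as np

EMBED_DIM = 100   # Word2Vec vector size per word (from the question)
MAX_WORDS = 213   # longest description in the dataset (from the question)


def description_to_vector(word_vectors):
    """Zero-pad a (n_words, EMBED_DIM) array to MAX_WORDS rows and flatten it.

    Zero-padding is an assumption; the question does not state the padding scheme.
    """
    word_vectors = np.asarray(word_vectors, dtype=np.float32)[:MAX_WORDS]
    padded = np.zeros((MAX_WORDS, EMBED_DIM), dtype=np.float32)
    padded[: len(word_vectors)] = word_vectors
    return padded.reshape(-1)  # shape (21300,)


def cosine_similarity(a, b, eps=1e-12):
    """Cosine similarity of two vectors; eps guards against a zero norm."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


# Random stand-ins for Word2Vec output (hypothetical data, not real embeddings).
rng = np.random.default_rng(0)
movie_a = description_to_vector(rng.normal(size=(40, EMBED_DIM)))
movie_b = description_to_vector(rng.normal(size=(MAX_WORDS, EMBED_DIM)))

print(movie_a.shape)                     # (21300,)
print(movie_a.reshape(150, 142).shape)   # (150, 142), the layout fed to the encoder

# Stand-ins for two 5 x 5 encoder outputs, flattened to 25 values each.
code_a = rng.normal(size=(5, 5)).ravel()
code_b = rng.normal(size=(5, 5)).ravel()
print(cosine_similarity(code_a, code_a))  # approximately 1.0
print(cosine_similarity(code_a, code_b))  # some value in [-1, 1]
```

With a trained model, `code_a` and `code_b` would instead come from the bottleneck layer's predictions for two movies.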

【Comments on the question】:

Tags: tensorflow keras deep-learning word2vec autoencoder

【Solution 1】:

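I was able to reproduce the error with dummy data. The original decoder ends with Conv1D(1, ...), so its output has shape (batch, 100, 1), while the target has shape (batch, 21300); that is the 100 vs. 21300 mismatch in the error message. Changing the decoder as follows helps: the last convolution uses 213 filters, and Flatten turns the resulting (100, 213) output into 100 x 213 = 21300 values.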

decoder_input = tf.keras.layers.Conv1D(8, 3, activation='relu', padding='same')(x)  # (5, 8)
x = tf.keras.layers.UpSampling1D(2)(decoder_input)           # (10, 8)
x = tf.keras.layers.Conv1D(16, 3, activation='relu')(x)      # (8, 16)
x = tf.keras.layers.UpSampling1D(2)(x)                       # (16, 16)
x = tf.keras.layers.Conv1D(16, 3, activation='relu')(x)      # (14, 16)
x = tf.keras.layers.UpSampling1D(2)(x)                       # (28, 16)
x = tf.keras.layers.Conv1D(32, 3, activation='relu')(x)      # (26, 32)
x = tf.keras.layers.UpSampling1D(2)(x)                       # (52, 32)
x = tf.keras.layers.Conv1D(32, 3, activation='relu')(x)      # (50, 32)
x = tf.keras.layers.UpSampling1D(2)(x)                       # (100, 32)
x = tf.keras.layers.Conv1D(213, 3, activation='relu', padding='same')(x)  # (100, 213)
decoder_output = tf.keras.layers.Flatten()(x)                # (21300,), matches the input
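
The shape mismatch can also be checked by hand with a little arithmetic. The sketch below is plain Python, not Keras; the helper names are made up for illustration, and it assumes stride-1 convolutions, the default `padding='valid'` where no padding is given, and pooling with the default stride equal to the pool size.

```python
import math


def conv1d_len(steps, kernel, padding):
    """Output length of a stride-1 Conv1D: unchanged for 'same', shorter for 'valid'."""
    return steps if padding == "same" else steps - kernel + 1


def pool_same_len(steps, pool):
    """Output length of MaxPooling1D(pool, padding='same') with stride equal to pool."""
    return math.ceil(steps / pool)


def trace_encoder(steps=150):
    """Five Conv1D('same') + MaxPooling1D(2, 'same') pairs, as in the question."""
    for _ in range(5):
        steps = pool_same_len(conv1d_len(steps, 3, "same"), 2)
    return steps


def trace_decoder(steps=5):
    """Length of the time axis just before the decoder's last layer."""
    steps = conv1d_len(steps, 3, "same")       # first decoder Conv1D keeps the length
    steps *= 2                                 # UpSampling1D(2)
    for _ in range(4):
        steps = conv1d_len(steps, 3, "valid")  # each unpadded Conv1D drops 2 steps
        steps *= 2                             # UpSampling1D(2)
    return conv1d_len(steps, 3, "same")        # last Conv1D uses padding='same'


print(trace_encoder())        # 5
print(trace_decoder())        # 100
print(trace_decoder() * 1)    # 100   values per sample with Conv1D(1, ...)
print(trace_decoder() * 213)  # 21300 values per sample with Conv1D(213, ...) + Flatten
```

The time axis ends at 100 steps because the four unpadded convolutions each trim two steps before the next upsampling, so the only way to reach 21300 values without changing those layers is to give the last layer 213 channels and flatten.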

Please find the gist here. Thank you.

【Comments】: