【Posted】: 2021-05-16 01:15:07
【Question】:
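I am working on a Python project that uses Word2Vec to recommend products. The code works fine on a dataset of size 19401, but whenever I pass a product ID I get this error: "keyerror : word '1077' not in words". I don't know how to fix this because I know very little about it and am still learning. Please help me solve this!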
from gensim.models import Word2Vec
from tqdm import tqdm

# one list of review texts per product in the training set
purchases_train = []
for i in tqdm(product_train):
    temp = train_df[train_df["Clothing ID"] == i]["Review Text"].tolist()
    purchases_train.append(temp)
# same structure for the validation set
purchases_val = []
for i in tqdm(validation_df['Clothing ID'].unique()):
    temp = validation_df[validation_df["Clothing ID"] == i]["Review Text"].tolist()
    purchases_val.append(temp)
model = Word2Vec(window=10, sg=1, hs=0,
                 negative=10,  # for negative sampling
                 alpha=0.03, min_count=1, min_alpha=0.0007,
                 seed=14)
model.build_vocab(purchases_train, progress_per=200)
model.train(purchases_train, total_examples=model.corpus_count,
            epochs=10, report_delay=1)
# save word2vec model
model.save("word2vec_2.model")
model.init_sims(replace=True)
# extract all vectors
X = model[model.wv.vocab]
products = train_df[["Clothing ID", "Review Text"]]
# remove duplicates
products.drop_duplicates(inplace=True, subset='Clothing ID', keep="last")
# create product-ID and product-description dictionary
products_dict = products.groupby('Clothing ID')['Review Text'].apply(list).to_dict()
def similar_products(v, n=6):
    # extract most similar products for the input vector
    ms = model.similar_by_vector(v, topn=n+1)[1:]
    # extract name and similarity score of the similar products
    new_ms = []
    for j in ms:
        pair = (products_dict[j[0]][0], j[1])
        new_ms.append(pair)
    return new_ms
similar_products(model['1077'])
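One way to see where the KeyError may come from is to check which tokens actually end up in the vocabulary. The plain-Python sketch below does not use gensim. It only mimics the general rule that a Word2Vec-style vocabulary is made of the individual items of each inner list that occur at least `min_count` times. The `build_vocab` helper and the toy `rows` are hypothetical and not part of the question's code.

```python
from collections import Counter


def build_vocab(sentences, min_count=1):
    """Hypothetical helper: collect every item of every inner list
    that occurs at least min_count times across all lists."""
    counts = Counter(token for sentence in sentences for token in sentence)
    return {token for token, c in counts.items() if c >= min_count}


# Hypothetical toy rows standing in for train_df (not data from the question).
rows = [
    {"Clothing ID": 1077, "Review Text": "Lovely dress"},
    {"Clothing ID": 1077, "Review Text": "Runs small"},
    {"Clothing ID": 862, "Review Text": "Great fit"},
]

# Same shape as purchases_train in the question:
# one list of review texts per product.
by_product = {}
for row in rows:
    by_product.setdefault(row["Clothing ID"], []).append(row["Review Text"])
purchases_train = list(by_product.values())

vocab = build_vocab(purchases_train)
print("1077" in vocab)          # False: the tokens are review texts, not IDs
print("Lovely dress" in vocab)  # True

# For contrast, a hypothetical corpus whose tokens are product IDs as strings.
id_vocab = build_vocab([["1077", "862"]])
print("1077" in id_vocab)       # True
```

If this reading is right, a lookup by '1077' can only succeed when the string '1077' itself appears as a token in the training lists.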
【Discussion】:
- Please post the full traceback of the error, along with the sample data you are working with.
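A minimal sketch of how the full traceback could be captured as text for posting, using only the standard library. The `vectors` dictionary is a hypothetical stand-in for the trained model's lookup and is not from the question.

```python
import traceback

# Hypothetical stand-in for the model's key-to-vector lookup.
vectors = {"Lovely dress": [0.1, 0.2]}

tb_text = ""
try:
    vectors["1077"]  # raises KeyError because the key is missing
except KeyError:
    # format_exc() returns the complete traceback as a single string
    tb_text = traceback.format_exc()
    print(tb_text)
```

Pasting the whole printed text, rather than only its last line, shows which call raised the error.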