【Posted】: 2021-12-20 06:47:03
【Question】:
I get the following warning when computing the cosine similarity of embeddings generated by fastText:
/home/kgarg8/anaconda3/envs/CiteKP/lib/python3.6/site-packages/scipy/spatial/distance.py:721: RuntimeWarning: invalid value encountered in float_scalars
  dist = 1.0 - uv / np.sqrt(uu * vv)
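For context, the expression quoted in the warning becomes 0/0 as soon as either input vector is all zeros. The following is a minimal NumPy sketch of that arithmetic, not SciPy's actual implementation: the names `uv`, `uu` and `vv` are taken from the quoted line, while `cosine_distance_raw` and the sample vectors are made up for illustration.

```python
import warnings
import numpy as np

def cosine_distance_raw(u, v):
    """Same arithmetic as the line quoted in the warning, with no zero-norm check."""
    u = np.asarray(u, dtype=np.float64)
    v = np.asarray(v, dtype=np.float64)
    uv = np.dot(u, v)  # dot product of the two vectors
    uu = np.dot(u, u)  # squared norm of u
    vv = np.dot(v, v)  # squared norm of v
    return 1.0 - uv / np.sqrt(uu * vv)

# An all-zero vector gives uu == 0, so the division is 0 / 0.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)  # silence the warning for the demo
    zero_case = cosine_distance_raw([0.0, 0.0, 0.0], [1.0, 2.0, 3.0])

print(np.isnan(zero_case))
```

The result is NaN rather than a usable distance, which is why the rounded percentage computed from it is NaN as well.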
Relevant code snippets:
import fasttext
import numpy as np
from scipy import spatial

# fastText supervised training
model = fasttext.train_supervised('merged_data_labels_prepended.txt')
model.save_model('fasttext_supervised.bin')

# Model loading
model = fasttext.load_model("fasttext_supervised.bin")

# Cosine similarity (as a percentage) between the mean word embeddings of two token lists
def cosine_distance_wordembedding_method(s1, s2):
    vec1 = np.mean([model[word] for word in s1], axis=0)
    vec2 = np.mean([model[word] for word in s2], axis=0)
    cosine = spatial.distance.cosine(vec1, vec2)  # cosine distance, i.e. 1 - similarity
    return round((1 - cosine) * 100, 2)

cosine_distance_wordembedding_method(pred.split(), label.split())  # function call
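Initial analysis:
fastText seems to return all-zero embeddings for words that are not in the vocabulary, so vec1 or vec2 is sometimes a zero vector and the denominator np.sqrt(uu * vv) becomes zero. How should these OOV words be handled so that they get non-zero embeddings?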
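One way to sidestep the warning, independent of how the OOV vectors are produced, is to drop all-zero word vectors before averaging and to return no score when nothing usable remains. The sketch below is only an illustration under stated assumptions: `mean_embedding`, `cosine_similarity_percent`, the `lookup` parameter and the toy vocabulary are hypothetical names and data, not part of fastText or SciPy.

```python
import numpy as np

def mean_embedding(tokens, lookup):
    """Average the non-zero vectors returned by lookup(token); None if none remain."""
    vectors = [np.asarray(lookup(tok), dtype=np.float64) for tok in tokens]
    vectors = [v for v in vectors if np.any(v)]  # discard all-zero (OOV) vectors
    if not vectors:
        return None
    return np.mean(vectors, axis=0)

def cosine_similarity_percent(s1, s2, lookup):
    """Cosine similarity of the two mean embeddings as a percentage, or None."""
    vec1 = mean_embedding(s1, lookup)
    vec2 = mean_embedding(s2, lookup)
    if vec1 is None or vec2 is None:
        return None
    denom = np.linalg.norm(vec1) * np.linalg.norm(vec2)
    if denom == 0.0:  # the averaged vectors can still cancel out to zero
        return None
    return round(float(np.dot(vec1, vec2) / denom) * 100, 2)

# Toy stand-in for a model (hypothetical data): unknown words map to a zero vector.
toy_vectors = {"deep": np.array([1.0, 0.0]), "learning": np.array([0.0, 1.0])}

def toy_lookup(word):
    return toy_vectors.get(word, np.zeros(2))

known = cosine_similarity_percent(["deep", "learning"], ["deep"], toy_lookup)
all_oov = cosine_similarity_percent(["unknown"], ["deep"], toy_lookup)
print(known, all_oov)
```

With the model from the question, `lookup` could be `model.get_word_vector`. Whether OOV words receive non-zero vectors in the first place may depend on the subword n-gram settings used at training time (the `minn` and `maxn` arguments of `train_supervised`); that is an assumption worth checking against the fastText documentation rather than something this sketch establishes.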
【Discussion】:
Tags: python spatial word-embedding fasttext