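【Posted】: 2020-07-20 17:55:41
【Question】:
I have collected newspaper articles about a particular disease (unlabeled, just the raw articles). I also have three manually selected groups of disease-related keywords, e.g. phase-1, phase-2, and so on, as shown below.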
phase_1 = ["symptoms","signs","fever","ache","vomit","blood","headache","fatigue","breath"]
phase_2 = ["pathogen","flavivirus","swamp","virus","contagious","mosquito bite","virus","agent","host"]
Is there a way to compute the similarity between a group of keywords and a news article using Python?
【Discussion】:
-
Googling "cosine similarity python", the top result is sklearn's cosine_similarity() function. Maybe that is a good starting point to explore?
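As a rough sketch of that starting point (not a definitive solution): one option is to treat each keyword group as a short "document", vectorize it together with the articles using TF-IDF, and compare them with cosine_similarity(). The two entries in `articles` below are hypothetical stand-ins for the real newspaper texts, and TF-IDF is an assumed choice of weighting; plain word counts would work with the same code.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

phase_1 = ["symptoms", "signs", "fever", "ache", "vomit", "blood", "headache", "fatigue", "breath"]
phase_2 = ["pathogen", "flavivirus", "swamp", "virus", "contagious", "mosquito bite", "virus", "agent", "host"]

# Hypothetical stand-ins for the real (unlabeled) newspaper articles.
articles = [
    "Patients reported fever, headache and fatigue as the first signs.",
    "The virus spreads through a mosquito bite; the pathogen is a flavivirus.",
]

# Treat each keyword group as one short "document".
keyword_docs = [" ".join(phase_1), " ".join(phase_2)]

# Fit a single vocabulary over articles and keyword groups so that
# all vectors share the same columns.
vectorizer = TfidfVectorizer(lowercase=True)
matrix = vectorizer.fit_transform(articles + keyword_docs)

article_vecs = matrix[: len(articles)]
keyword_vecs = matrix[len(articles):]

# scores[i, j] is the cosine similarity between article i and keyword group j.
scores = cosine_similarity(article_vecs, keyword_vecs)

for i, row in enumerate(scores):
    print(f"article {i}: phase_1={row[0]:.3f}, phase_2={row[1]:.3f}")
```

Note that this matches exact word forms only, so "symptom" in an article would not match the keyword "symptoms" unless the text is stemmed or lemmatized first.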
-
Yes, I am currently trying to adapt it to my needs.
Tags: python nlp cosine-similarity sentence-similarity