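Are you trying to build a keyword dictionary, or to retrieve the issue a sentence describes?
Either way, you need to associate issues with keywords.
A basic way to get keywords is to split each sentence into words (using s.split()) and update a list of the most commonly used keywords...
difflib can help here.
Since we don't know the schema of the given file, I assume it is just a list of sentences and that you provide the keywords/issues elsewhere (a situations dictionary).
For example: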
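As a rough sketch of that idea: the snippet below splits sentences into words, maps each word to the closest known keyword with difflib.get_close_matches, and tallies the hits with collections.Counter. The keyword list, the sample sentences, the 0.8 cutoff and the helper names (normalize_word, count_keywords) are all assumptions made for illustration, not part of the original question.

```python
import difflib
from collections import Counter

# Hypothetical keyword list, for illustration only.
KNOWN_KEYWORDS = ["screen", "display", "battery", "phone", "blank"]

def normalize_word(word, keywords=KNOWN_KEYWORDS, cutoff=0.8):
    """Return the closest known keyword for word, or None if nothing is close enough."""
    matches = difflib.get_close_matches(word.lower(), keywords, n=1, cutoff=cutoff)
    return matches[0] if matches else None

def count_keywords(sentences):
    """Count how often each known keyword (or a near-miss spelling) appears."""
    counts = Counter()
    for sentence in sentences:
        for word in sentence.split():
            keyword = normalize_word(word)
            if keyword is not None:
                counts[keyword] += 1
    return counts

# Hypothetical sample data; "screeen" is a deliberate misspelling.
sample = ["My screeen is blank", "The Display flickers", "phone battery dies"]
keyword_counts = count_keywords(sample)
print(keyword_counts.most_common())
```

The cutoff is a judgment call: a lower value tolerates more typos but also produces more false matches, so it is worth tuning against real sentences.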
# Sample sentences, standing in for the rows read from the CSV file.
csv_ret = ["I can't turn on my phone", "The screen won't turn on", "phone The display is blank"]

# Each situation (issue) maps to the keywords that suggest it.
situations = {
    "screen": ["turn on", "blank", "display", "screen"],
    "battery": ["turn on", "phone"]
}

def get_situation_from_sentence(sentence):
    occurrences = {}
    # Note: split() yields single words, so a multi-word keyword such as "turn on" never matches.
    for word in sentence.split():
        for key, value in situations.items():
            if word in value:
                if occurrences.get(key) is None:
                    occurrences[key] = [word]
                elif word not in occurrences.get(key):
                    occurrences[key].append(word)
    # Percentage of each situation's keywords found in the sentence.
    averages = {k: ((len(v) * 100) / len(situations[k])) for k, v in occurrences.items()}
    return "{}, {}".format(averages, sentence)

for sentence in csv_ret:
    print(get_situation_from_sentence(sentence))
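Result (the key order within each dict may differ depending on the Python version):
{'battery': 50.0}, I can't turn on my phone
{'screen': 25.0}, The screen won't turn on
{'screen': 50.0, 'battery': 50.0}, phone The display is blank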
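This code scores each sentence against each issue as the percentage of that issue's keywords found in the sentence.
Again, this is a very basic solution, and you may need something more robust (a lexer/parser, machine learning...), but sometimes simpler is better :)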
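One limitation of word-by-word matching is that a multi-word keyword such as "turn on" can never equal a single word. A possible variation, sketched below, searches the whole sentence for each keyword with a word-boundary regular expression. The function name score_sentence is hypothetical, and this approach yields different percentages from the results shown above, because "turn on" now counts.

```python
import re

situations = {
    "screen": ["turn on", "blank", "display", "screen"],
    "battery": ["turn on", "phone"],
}

def score_sentence(sentence, situations=situations):
    """Return {situation: percentage of its keywords found in the sentence}."""
    text = sentence.lower()
    scores = {}
    for name, keywords in situations.items():
        # \b keeps a keyword from matching inside a longer word;
        # re.escape guards against regex metacharacters in keywords.
        found = [
            kw for kw in keywords
            if re.search(r"\b" + re.escape(kw.lower()) + r"\b", text)
        ]
        if found:
            scores[name] = len(found) * 100 / len(keywords)
    return scores

print(score_sentence("The screen won't turn on"))
```

Matching is case-insensitive here because both the sentence and the keywords are lowercased first; drop the lower() calls if case should matter.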