Finding the index of words in a list of words: answers

【Question title】: Finding index of words in a list of words
【Posted】: 2021-02-15 07:29:35
【Question description】:

For a BIO tagging problem, I am looking for a way to find the index of specific words in a list of strings.

For example:

text = "Britain has reduced its carbon emissions more than any rich country"
word = 'rich'
print(text.split())
['Britain', 'has', 'reduced', 'its', 'carbon', 'emissions', 'more', 'than', 'any', 'rich', 'country']

text.split(' ').index(word) # returns 9

text.split(' ').index('rich country') # raises ValueError, as expected

The answer I want is:

[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

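I think I could use a loop to find the index of the first word and the index of the last word, and then set the positions in that range to 1 and all others to 0.

My problem, however, is what to do when the text list looks like this: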

['Britain', 'has', 'reduced', 'its', 'carbon', 'emissions', 'more', 'than', 'any', 'rich', 'count', '_ry']

or perhaps like this:

['Britain', 'has', 'reduced', 'its', 'carbon', 'emissions', 'more', 'than', 'any', 'richcountry']

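I believe I could solve this with a messy for loop, but I suspect there is a cleaner and simpler way to do it.

I would appreciate any suggestions on this problem.

Thanks in advance!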

【Question comments】:

Tags: python list nlp python-re

【Solution 1】:

To answer your first question:

text = "Britain has reduced its carbon emissions more than any rich country"
words = 'rich country'.split(" ")
split_text = text.split()
[1 if x in words else 0 for x in split_text]

Output:

[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
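
One caveat about the membership test above: it marks every token equal to 'rich' or 'country' wherever it appears, not only where the two words occur together. If only the contiguous phrase should be tagged, a sliding-window comparison is one option. The following is a minimal sketch, not part of the original answer; the helper name tag_phrase is hypothetical.

```python
def tag_phrase(tokens, phrase_tokens):
    """Return a 0/1 mask marking each contiguous occurrence of phrase_tokens in tokens."""
    mask = [0] * len(tokens)
    n = len(phrase_tokens)
    if n == 0:
        return mask
    # Slide a window of length n over the tokens and compare it to the phrase.
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == phrase_tokens:
            mask[start:start + n] = [1] * n
    return mask


text = "Britain has reduced its carbon emissions more than any rich country"
print(tag_phrase(text.split(), "rich country".split()))
# [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
```

With this version, a stray 'rich' elsewhere in the sentence stays 0 unless it is immediately followed by 'country'.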

The second question needs fuzzy matching, which can be done with fuzzywuzzy:

from fuzzywuzzy import process
words = 'rich country'.split(" ")
split_text = ['Britain', 'has', 'reduced', 'its', 'carbon', 'emissions', 'more', 'than', 'any', 'richcountry']
[1 if process.extractBests(x, words, score_cutoff = 60) else 0 for x in split_text]

Output:

[0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

And for:

split_text = ['Britain', 'has', 'reduced', 'its', 'carbon', 'emissions', 'more', 'than', 'any', 'rich', 'count', '_ry']
[1 if process.extractBests(x, words, score_cutoff = 60) else 0 for x in split_text]

Output:

[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]

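Note that score_cutoff sets the minimum match score (on a 0 to 100 scale), so you can raise it for stricter matching or lower it for looser matching.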
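
If the split tokens are exact pieces of the original text (as in 'count', '_ry'), an alternative that avoids a similarity threshold is to align on character offsets: concatenate the tokens without spaces, locate the space-free phrase in that string, and tag every token whose character span overlaps the match. This is only a sketch under assumptions that are not in the original answer: the function name tag_by_char_span is hypothetical, and a leading '_' is assumed to be a continuation marker that is not part of the word.

```python
def tag_by_char_span(tokens, phrase, marker="_"):
    """Tag tokens whose characters overlap an occurrence of phrase (spaces ignored)."""
    # Assumption: a leading marker such as '_' only signals a word continuation.
    pieces = [t.lstrip(marker) for t in tokens]

    # Record the (start, end) character span of each token in the joined text.
    spans, pos = [], 0
    for piece in pieces:
        spans.append((pos, pos + len(piece)))
        pos += len(piece)

    joined = "".join(pieces)
    target = "".join(phrase.split())
    mask = [0] * len(tokens)
    if not target:
        return mask

    start = joined.find(target)
    while start != -1:
        end = start + len(target)
        for i, (s, e) in enumerate(spans):
            if s < end and e > start:  # token span overlaps the match
                mask[i] = 1
        start = joined.find(target, start + 1)
    return mask


tokens = ['Britain', 'has', 'reduced', 'its', 'carbon', 'emissions',
          'more', 'than', 'any', 'rich', 'count', '_ry']
print(tag_by_char_span(tokens, "rich country"))
# [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
```

Because spaces are ignored, this can also match across unrelated word boundaries (for example, 'enrich' followed by 'country' would both be tagged), so it trades fuzzy-score tuning for stricter assumptions about the tokenization.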

【Comments】: