With df as your dataframe, you could do the following:
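from collections import defaultdict
# Map each word to the list of documents that contain it
inv_index = defaultdict(list)
for doc, words in zip(
    df.document,
    # Extract the words of each row and deduplicate them per document
    df.content.str.findall(r"\w+").map(set)
):
    for word in words:
        inv_index[word].append(doc)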
The result, inv_index, for
df =
   document       content
0  Ancient Egypt  Ancient Egypt was a civilization of ancient North Africa ,...
1  Nile River     The Nile is a major north flowing river in northeastern Africa
is
{
'Africa': ['Ancient Egypt', 'Nile River'],
'Ancient': ['Ancient Egypt'],
'Egypt': ['Ancient Egypt'],
'Nile': ['Nile River'],
'North': ['Ancient Egypt'],
'The': ['Nile River'],
'a': ['Ancient Egypt', 'Nile River'],
'ancient': ['Ancient Egypt'],
'civilization': ['Ancient Egypt'],
'flowing': ['Nile River'],
'in': ['Nile River'],
'is': ['Nile River'],
'major': ['Nile River'],
'north': ['Nile River'],
'northeastern': ['Nile River'],
'of': ['Ancient Egypt'],
'river': ['Nile River'],
'was': ['Ancient Egypt']
}
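
Once the index exists, looking documents up is a dictionary access. The following is a minimal, self-contained sketch of that. It rebuilds the two sample rows (the first row's text stops where the output above is truncated) and adds a lookup helper. The name `search` and its "all terms must match" behavior are assumptions for illustration, not part of the answer above. Note that the index is case-sensitive, so 'North' and 'north' are separate keys.

```python
from collections import defaultdict

import pandas as pd

# Sample data modeled on the two rows shown above; the first row's text
# ends where the displayed output is truncated.
df = pd.DataFrame({
    "document": ["Ancient Egypt", "Nile River"],
    "content": [
        "Ancient Egypt was a civilization of ancient North Africa",
        "The Nile is a major north flowing river in northeastern Africa",
    ],
})

# Build the inverted index exactly as in the snippet above.
inv_index = defaultdict(list)
for doc, words in zip(df.document, df.content.str.findall(r"\w+").map(set)):
    for word in words:
        inv_index[word].append(doc)


def search(index, *terms):
    """Hypothetical helper: return the documents that contain every term."""
    if not terms:
        return []
    # Use .get so that looking up a missing word does not insert an
    # empty entry into the defaultdict.
    postings = [set(index.get(term, ())) for term in terms]
    return sorted(set.intersection(*postings))


print(search(inv_index, "Africa"))         # ['Ancient Egypt', 'Nile River']
print(search(inv_index, "Nile", "river"))  # ['Nile River']
print(search(inv_index, "pyramid"))        # []
```

If case-insensitive matching is wanted, one option is to lowercase the text before extracting words (for example with `df.content.str.lower()`) and to lowercase the query terms as well.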