【Question title】: Inverted index using a dataframe in Python
【Posted】: 2021-10-28 20:27:33
【Question】:

I have a dataframe like the following:

document        content
Ancient Egypt   Ancient Egypt was a civilization of ancient North Africa,...
Nile River      The Nile is a major north flowing river in northeastern Africa...

I need to build an inverted index that maps each word to the names of the documents it appears in.

For example:

{'a': ['Ancient Egypt'],
 'Egypt': ['Ancient Egypt'],
 'is': ['Ancient Egypt', 'Nile River']}
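To make the target structure concrete, here is a minimal plain-Python sketch. It assumes the documents are held in a dict of name to text, and that a "word" is a run of `\w` characters. The function name `build_inverted_index` and its `casefold` flag are hypothetical, added only for illustration.

```python
import re
from collections import defaultdict


def build_inverted_index(docs, casefold=False):
    """Map each word to the list of document names that contain it.

    docs: dict of document name -> text (hypothetical input shape).
    casefold: if True, lowercase the text first so 'Ancient' and
    'ancient' share one entry.
    """
    index = defaultdict(list)
    for name, text in docs.items():
        if casefold:
            text = text.lower()
        # dict.fromkeys drops repeated words within one document
        # while keeping first-seen order, so each name is added once
        for word in dict.fromkeys(re.findall(r"\w+", text)):
            index[word].append(name)
    return dict(index)


docs = {
    "Ancient Egypt": "Ancient Egypt was a civilization of ancient North Africa,...",
    "Nile River": "The Nile is a major north flowing river in northeastern Africa...",
}
index = build_inverted_index(docs)
print(index["a"])  # ['Ancient Egypt', 'Nile River']
print(build_inverted_index(docs, casefold=True)["north"])  # ['Ancient Egypt', 'Nile River']
```

By default the index is case-sensitive, which matches the example above where 'Egypt' keeps its capital letter.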

【Question discussion】:

    Tags: python dataframe inverted-index


    【Solution 1】:

    With df as your dataframe, you can do the following:

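    from collections import defaultdict
    
    # word -> list of document names containing that word
    inv_index = defaultdict(list)
    for doc, words in zip(
            df.document,
            # split each text into \w+ tokens; set() drops repeats within a document
            df.content.str.findall(r"\w+").map(set)
        ):
        for word in words:
            inv_index[word].append(doc)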
    

    The result, inv_index, for

    df =
            document                                                         content
    0  Ancient Egypt  Ancient Egypt was a civilization of ancient North Africa ,...
    1     Nile River  The Nile is a major north flowing river in northeastern Africa
    

    {
        'Africa': ['Ancient Egypt', 'Nile River'],
        'Ancient': ['Ancient Egypt'],
        'Egypt': ['Ancient Egypt'],
        'Nile': ['Nile River'],
        'North': ['Ancient Egypt'],
        'The': ['Nile River'],
        'a': ['Ancient Egypt', 'Nile River'],
        'ancient': ['Ancient Egypt'],
        'civilization': ['Ancient Egypt'],
        'flowing': ['Nile River'],
        'in': ['Nile River'],
        'is': ['Nile River'],
        'major': ['Nile River'],
        'north': ['Nile River'],
        'northeastern': ['Nile River'],
        'of': ['Ancient Egypt'],
        'river': ['Nile River'],
        'was': ['Ancient Egypt']
    }
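A pandas-only variant of the same idea can be sketched with `Series.str.findall`, `DataFrame.explode` and `groupby`. This is an alternative, not the answer's code; it assumes pandas 0.25 or later (when `explode` was added) and rebuilds the sample `df` so it runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    "document": ["Ancient Egypt", "Nile River"],
    "content": [
        "Ancient Egypt was a civilization of ancient North Africa,...",
        "The Nile is a major north flowing river in northeastern Africa",
    ],
})

# One row per (document, word) pair; drop_duplicates removes repeated
# words inside the same document (the role .map(set) plays above)
pairs = (
    df.assign(word=df["content"].str.findall(r"\w+"))
      .explode("word")
      .drop_duplicates(subset=["document", "word"])
)

# Collect the documents for each word into a list
inv_index = pairs.groupby("word")["document"].agg(list).to_dict()
print(inv_index["Africa"])  # ['Ancient Egypt', 'Nile River']
```

`groupby` sorts the words, and within each word the documents keep their original row order, so this should give the same mapping as the loop above, returned as a plain dict rather than a defaultdict.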
    

    【Discussion】:
