【Question Title】: BERT Summarization for a column of texts
【Posted】: 2021-07-01 15:24:33
【Question】:

I am trying to summarize 100 product descriptions from a dataset. For this, I am simply trying out the summarizers package:

!pip install summarizers -q

from summarizers import Summarizers

import pandas as pd

summ = Summarizers()

It works well on one text at a time.

textpanasonic="The NN-CS89L offers next-level cooking convenience. Its four distinct cooking methods - steaming, baking, grilling and microwaving ensure your meals are cooked or reheated to perfection. Its multi-function capabilities can be combined to save time without compromising taste, texture or nutritional value. It’s the all-in-one kitchen companion designed for people with a busy lifestyle."



summ(textpanasonic)

The NN-CS89L offers next-level cooking convenience.

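But do you know whether it is possible to create a new column with a summary for each description? When I try, I get:

ValueError: text input must of type str (single example), List[str] (batch or single pretokenized example) or List[List[str]] (batch of pretokenized examples).

Thanks in advance ^^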

【Question Comments】:

    Tags: python nlp bert-language-model summarize


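    【Answer 1】:

    You can simply apply summ to the column that holds the texts. Series.apply calls summ once per row, so each call receives a single string, which is an input type summ accepts. Here is a multi-row example: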

    import pandas as pd
    from summarizers import Summarizers
    summ = Summarizers()
    
    data = ["The NN-CS89L offers next-level cooking convenience. Its four distinct cooking methods - steaming, baking, grilling and microwaving ensure your meals are cooked or reheated to perfection. Its multi-function capabilities can be combined to save time without compromising taste, texture or nutritional value. It’s the all-in-one kitchen companion designed for people with a busy lifestyle.", "These slim and stylish bodies are packed with high performance. The attractive compact designs and energy-saving functions help Panasonic Blu-ray products consume as little power as possible. You can experience great movie quality with this ultra-fast booting DMP-BD89 Full HD Blu-ray disc player. After starting the player, the time it takes from launching the menu to playing a disc is much shorter than in conventional models. The BD89 also allows for smart home networking (DLNA) and provides access to video on demand, so that home entertainment is more intuitive, more comfortable, and lots more fun."]
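    # The 'summaries' column holds the full texts to be summarized
    df = pd.DataFrame(data, columns=['summaries'])
    # Call summ on each row and store the result in a new column
    df['abstracts'] = df['summaries'].apply(summ)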
    
    Output:

      | summaries | abstracts
    0 | The NN-CS89L offers next-level cooking convenience. Its four distinct cooking methods - steaming, baking, grilling and microwaving ensure your meals are cooked or reheated to perfection. Its multi-function capabilities can be combined to save time without compromising taste, texture or nutritional value. It’s the all-in-one kitchen companion designed for people with a busy lifestyle. | The NN-CS89L offers next-level cooking convenience.
    1 | These slim and stylish bodies are packed with high performance. The attractive compact designs and energy-saving functions help Panasonic Blu-ray products consume as little power as possible. You can experience great movie quality with this ultra-fast booting DMP-BD89 Full HD Blu-ray disc player. After starting the player, the time it takes from launching the menu to playing a disc is much shorter than in conventional models. The BD89 also allows for smart home networking (DLNA) and provides access to video on demand, so that home entertainment is more intuitive, more comfortable, and lots more fun. | Panasonic DMP-BD89 Full HD Blu-ray disc player.
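
To make the row-by-row behaviour concrete without downloading a model, here is a minimal sketch that swaps summ for a hypothetical stand-in, fake_summ, which only accepts a single string and returns its first sentence. The stand-in and the sample texts are assumptions for illustration, not part of the summarizers package. The sketch also shows that double brackets select a DataFrame rather than a Series, which a string-only function rejects.

```python
import pandas as pd


def fake_summ(text):
    """Hypothetical stand-in for summ: returns the first sentence of a string."""
    if not isinstance(text, str):
        # Mimics a summarizer that rejects anything other than a single string
        raise ValueError(f"expected str, got {type(text).__name__}")
    return text.split(". ")[0].rstrip(".") + "."


df = pd.DataFrame({"summaries": ["First point. Second point.", "Only one sentence."]})

# Single brackets select a Series; apply calls fake_summ once per row with a str
df["abstracts"] = df["summaries"].apply(fake_summ)

# Double brackets select a one-column DataFrame, which the function rejects
try:
    fake_summ(df[["summaries"]])
    error_message = None
except ValueError as err:
    error_message = str(err)

print(df["abstracts"].tolist())
print(error_message)
```

With the real summ in place of fake_summ, the apply line is the same as in the answer; only the function being called changes.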

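    【Comments】:

    • Hi! Thank you very much! Also, I think I am not very good with pandas yet, because I am using a csv file and the result of summ(df2[['IDPTexteEn']]) is "ValueError: text input must of type str (single example), List[str] (batch or single pretokenized example) or List[List[str]] (batch of pretokenized examples)." (IDPTexteEn is the name of the comments column, and df2['IDPTexteEn'] = df2['IDPTexteEn'].astype('str') does not change anything ^^" )
    • You cannot feed a DataFrame to the function summ. As mentioned in my answer, try applying summ to the Series (i.e. the column): df2['IDPTexteEn'].apply(summ)
    • Ah yes, you are right! Thank you so much T^T
    • Hiii, do you know whether I can check the similarity between two columns? ^^"
    • Do you mean the similarity between the abstracts and the summaries? There are several modules, like this one, that can do semantic similarity analysis on sentence pairs. By the way, if my answer helped you, please accept it.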
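
For the follow-up about comparing two columns, a row-wise comparison could be sketched as below. The scoring function here is a plain token-overlap (Jaccard) measure, chosen only so the sketch runs without a model download. It measures lexical overlap, not semantic similarity, and it is not the module linked in the comment above. The helper name jaccard, the sample rows, and the new column name are all assumptions for illustration.

```python
import pandas as pd


def jaccard(a, b):
    """Token-overlap similarity in [0, 1]; a lexical stand-in, not a semantic score."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty texts are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


df = pd.DataFrame({
    "summaries": ["the oven steams and bakes", "a fast blu-ray player"],
    "abstracts": ["the oven bakes", "a slow dvd recorder"],
})

# axis=1 hands each row to the lambda, so both columns are available per row
df["similarity"] = df.apply(
    lambda row: jaccard(row["summaries"], row["abstracts"]), axis=1
)

print(df["similarity"].tolist())
```

To get a semantic score instead, the same df.apply(..., axis=1) pattern should work with any function that takes two strings and returns a number.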