【Title】: How to find common words, sentences, or paragraphs across multiple paragraphs [closed]
【Posted】: 2021-08-30 08:44:38
【Question】:

I have the following sample paragraphs:

para1 = "this is para one. I am cat. I am 10 years old. I like fish"
para2 = "this is para two. I am dog. my age is 12. I can swim"
para3 = "this is para three. I am cat. I am 9 years. I like rat"
para4 = "this is para four. I am rat. my age is secret. I hate cat"
para5 = "this is para five. I am dog. I am 10 years old. I like fish"

I need the following result:

this is para

I am

I 

I tried Python's set data type, but the results were not what I wanted.
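The question does not show the set-based attempt, so the following is only a guess at what it may have looked like. It intersects the word sets of all paragraphs; the variable names are made up for illustration.

```python
paragraphs = [
    "this is para one. I am cat. I am 10 years old. I like fish",
    "this is para two. I am dog. my age is 12. I can swim",
    "this is para three. I am cat. I am 9 years. I like rat",
    "this is para four. I am rat. my age is secret. I hate cat",
    "this is para five. I am dog. I am 10 years old. I like fish",
]

# One set of words per paragraph (full stops removed so "cat." equals "cat").
word_sets = [set(p.replace(".", "").split()) for p in paragraphs]

# Words that occur in every paragraph.
common_words = set.intersection(*word_sets)

print(sorted(common_words))  # ['I', 'am', 'is', 'para', 'this']
```

A set keeps neither word order nor adjacency, so it can report that "this", "is" and "para" are shared but not that they form the phrase "this is para". That is probably why the result was unsatisfying.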

Is there a binary executable that would let me build a command line to accomplish this task?

【Comments on the question】:

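  • Please don't add random tags to your post. That is just spam.
  • Although this question is not described very well, it is not asking for a recommendation or for a book, so what was the argument for closing it? It simply asks for Python code that identifies common prefixes.
  • Here is a one-line solution using os.path.commonprefix: first import os, then [os.path.commonprefix(sentences) for sentences in zip(*[p.split('.') for p in [para1,para2,para3,para4,para5]])]. It returns the list ['this is para ', ' I am ', ' ', ' I ']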
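The os.path.commonprefix one-liner from the last comment can be spelled out as a small script. This is a sketch of that comment's idea; the names sentence_columns, prefixes and cleaned are introduced here for readability and are not from the original.

```python
import os

paragraphs = [
    "this is para one. I am cat. I am 10 years old. I like fish",
    "this is para two. I am dog. my age is 12. I can swim",
    "this is para three. I am cat. I am 9 years. I like rat",
    "this is para four. I am rat. my age is secret. I hate cat",
    "this is para five. I am dog. I am 10 years old. I like fish",
]

# Split every paragraph into sentences, then group the first sentences
# together, the second sentences together, and so on.
sentence_columns = zip(*[p.split(".") for p in paragraphs])

# commonprefix compares the strings character by character.
prefixes = [os.path.commonprefix(list(column)) for column in sentence_columns]
print(prefixes)  # ['this is para ', ' I am ', ' ', ' I ']

# Drop surrounding whitespace and prefixes that are empty after stripping.
cleaned = [p.strip() for p in prefixes if p.strip()]
print(cleaned)  # ['this is para', 'I am', 'I']
```

Two caveats apply. zip stops at the paragraph with the fewest sentences, and commonprefix works on characters rather than words, so with other data it can cut a prefix in the middle of a word.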

Tags: python algorithm nlp


【Solution 1】:

You can do the following:

paragraph_lst = [
    "this is para one. I am cat. I am 10 years old. I like fish",
    "this is para two. I am dog. my age is 12. I can swim",
    "this is para three. I am cat. I am 9 years. I like rat",
    "this is para four. I am rat. my age is secret. I hate cat",
    "this is para five. I am dog. I am 10 years old. I like fish",
]

# Collects every shared word sequence found by get_combinations().
word_combinations = set()


def get_combinations(line1, line2, first=0, last=1, prvs_wrd=""):
    """Grow a window of words over line1 and record the runs found in line2."""
    line_lst = line1.split(" ")
    if last > len(line_lst):
        return
    # Candidate phrase: words first .. last-1 of line1.
    wrd = " ".join(line_lst[first:last])
    if wrd in line2:  # plain substring test against the whole paragraph
        # Still matching: remember the phrase and try one more word.
        get_combinations(line1, line2, first, last + 1, wrd)
    else:
        # Match broke: store the last phrase that matched, then restart
        # the window after the word that failed.
        word_combinations.add(prvs_wrd)
        get_combinations(line1, line2, last, last + 1, prvs_wrd)


if __name__ == '__main__':
    # Compare each paragraph with the one that follows it.
    for n in range(len(paragraph_lst) - 1):
        get_combinations(paragraph_lst[n], paragraph_lst[n + 1])
    print(word_combinations)

The set word_combinations will then contain the following (element order may vary when printed):

{'I', 'I am', 'is', 'this is para'}
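The code in this answer compares each paragraph only with the next one and tests phrases with a plain substring check. As a different take, not the answerer's method, the sketch below collects word n-grams per sentence and keeps those present in every paragraph. The function names are hypothetical, and splitting sentences on "." is an assumption that fits the sample data only.

```python
def ngrams(words):
    """All contiguous word sequences of a sentence, as tuples."""
    return {tuple(words[i:j])
            for i in range(len(words))
            for j in range(i + 1, len(words) + 1)}


def paragraph_ngrams(paragraph):
    """N-grams of every sentence in the paragraph (sentences split on '.')."""
    grams = set()
    for sentence in paragraph.split("."):
        grams |= ngrams(sentence.split())
    return grams


def contains(long, short):
    """True if `short` occurs as a contiguous run inside `long`."""
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def common_phrases(paragraphs):
    """Longest word sequences that appear in every paragraph."""
    if not paragraphs:
        return []
    common = set.intersection(*(paragraph_ngrams(p) for p in paragraphs))
    # Keep only phrases that are not part of a longer common phrase.
    maximal = [g for g in common
               if not any(g != h and contains(h, g) for h in common)]
    return sorted(" ".join(g) for g in maximal)


paragraphs = [
    "this is para one. I am cat. I am 10 years old. I like fish",
    "this is para two. I am dog. my age is 12. I can swim",
    "this is para three. I am cat. I am 9 years. I like rat",
    "this is para four. I am rat. my age is secret. I hate cat",
    "this is para five. I am dog. I am 10 years old. I like fish",
]
print(common_phrases(paragraphs))  # ['I am', 'this is para']
```

Because only maximal phrases are kept, "I" on its own is absorbed into "I am", which differs from the result requested in the question. Returning `common` without the maximal filter would list every shared n-gram instead, including "I".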
