String search by coincidence? Answers

【Question title】: String search by coincidence?
【Posted】: 2021-09-04 08:25:08
【Question description】:

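I was wondering whether there is a simple way in Python to search one string for coincidences with another, or whether anyone knows how it could be done.

To make this clear, here is an example.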

text_sample = "baguette is a french word"
words_to_match = ("baguete","wrd")

letters_to_match = ('b','a','g','u','t','e','w','r','d')   #   With just one 'e'
coincidences = sum(text_sample.count(x) for x in letters_to_match)

#    coincidences = 14 Current output
#    coincidences = 10 Expected output

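My current approach breaks words_to_match up into single characters, as in letters_to_match, but then every occurrence of each letter is counted. In "baguette is a french word" that means all eight letters of "baguette", the standalone "a", the "r" and "e" of "french", and the "w", "r", "d" of "word" (coincidences = 14).

What I want instead is (coincidences = 10), where only the seven letters of "baguette" that spell "baguete" and the "w", "r", "d" of "word" are counted as coincidences, by checking the similarity between words_to_match and the words in text_sample.

How do I get my expected output?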

【Question comments】:

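So you only want the count to include the first occurrence of each character? But in your output, 'e' is the only character that is counted twice. I don't understand the logic here.
No. If text_sample were "a baguette is a french word", the first 'a' would be matched as the first occurrence, which is not what I want. I want it done by checking the similarity between words_to_match and the words in text_sample.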
This sounds important to me too. Are you heading in the direction of edit distance?
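I'm sure you can find a Python implementation of a function that computes the Levenshtein distance, or one of the other similarity measures, somewhere (or implement one yourself).
@Pomodor0 you might also want to take a look at difflib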
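One possible reading of the difflib suggestion, as a rough sketch rather than a definitive answer: pair each target with its closest word in the text using difflib.get_close_matches, then count the characters that difflib.SequenceMatcher reports as matching. The helper name count_word_coincidences and the cutoff value are assumptions made for illustration; they do not come from the thread.

```python
import difflib


def count_word_coincidences(text, targets, cutoff=0.6):
    """Hypothetical helper: count matching characters between each target
    and the most similar word in the text."""
    words = text.split()
    total = 0
    for target in targets:
        # Pick the single closest word; skip the target if nothing is similar enough.
        close = difflib.get_close_matches(target, words, n=1, cutoff=cutoff)
        if not close:
            continue
        matcher = difflib.SequenceMatcher(None, target, close[0])
        # The final block returned is a zero-size sentinel, so summing sizes is safe.
        total += sum(block.size for block in matcher.get_matching_blocks())
    return total


text_sample = "baguette is a french word"
words_to_match = ("baguete", "wrd")
print(count_word_coincidences(text_sample, words_to_match))  # 10
```

Note that the matching blocks found by SequenceMatcher are not guaranteed to be a longest common subsequence in general, so the count can come out lower on other inputs.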

Tags: python string find-occurrences string-search multiple-occurrence

【Solution 1】:

First, split words_to_match into individual letters:

    # Concatenate all the words, then split the result into single characters
    letters = tuple(''.join(words_to_match))

Then check whether those letters appear in the text, in order:

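    x = 0            # index of the next letter we are looking for
    coincidence = 0  # number of letters matched so far
    for ch in text_sample:
        # Stop comparing once every letter has been matched
        if x < len(letters) and letters[x] == ch:
            x += 1
            coincidence += 1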

If the order does not matter:

    coincidence = 0
    for ch in text_sample:
        if ch in letters:
            coincidence += 1

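(Note that the body of an if statement can also be written on its own indented line, which is the form PEP 8 recommends over a one-line compound statement.)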
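For comparison, here is a self-contained sketch of both variants applied to the question's data. The function names count_in_order and count_any_order are made up for this illustration. The last call shows a limitation of the in-order scan: it stalls as soon as one expected letter is missing from the text.

```python
def count_in_order(text, words):
    """Greedy scan: advance through the target letters only when the next
    expected letter shows up in the text."""
    letters = "".join(words)
    pos = 0  # index of the next letter we are waiting for
    for ch in text:
        if pos < len(letters) and ch == letters[pos]:
            pos += 1
    return pos


def count_any_order(text, words):
    """Count every character of the text that occurs anywhere in the targets."""
    letters = set("".join(words))
    return sum(1 for ch in text if ch in letters)


text_sample = "baguette is a french word"
print(count_in_order(text_sample, ("baguete", "wrd")))   # 10
print(count_any_order(text_sample, ("baguete", "wrd")))  # 14
print(count_in_order(text_sample, ("bxguete", "wrd")))   # 1, stuck waiting for 'x'
```

The in-order scan only produces the expected 10 because every target letter appears in the text in the right order; it is not a general similarity measure.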

【Discussion】:

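【Solution 2】:

It looks like you need the length of the longest common subsequence (LCS). See the algorithm in the Wikipedia article for how to compute it. You can also find a C extension that computes it quickly; for example, this search has many results, including pylcs. After installing it (pip install pylcs):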

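import pylcs
text_sample = "baguette is a french word"
words_to_match = ("baguete","wrd")
# Assumption: pylcs.lcs returns the LCS (subsequence) length; lcs2 is the
# substring variant, and newer releases may rename these functions.
# Join without a separator so that the space is not counted as a match.
print(pylcs.lcs(text_sample, ''.join(words_to_match)))  # 10 expected if the assumption holds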
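If installing a C extension is not an option, the textbook dynamic-programming approach to LCS length can be sketched in pure Python. This is a minimal illustration, not the pylcs implementation; the function name lcs_length is an assumption, and it keeps only two rows of the table to save memory.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b (O(len(a) * len(b)))."""
    prev = [0] * (len(b) + 1)  # table row for the previous character of a
    for ch_a in a:
        curr = [0]  # column 0: the empty prefix of b has LCS length 0
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the best subsequence that ended before both characters
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise drop one character from either string
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


text_sample = "baguette is a french word"
words_to_match = ("baguete", "wrd")
print(lcs_length(text_sample, "".join(words_to_match)))   # 10
print(lcs_length(text_sample, " ".join(words_to_match)))  # 11, the space matches too
```

Joining the targets with an empty string keeps separators out of the count, which is what yields the 10 the question asks for.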

【Discussion】: