【Question title】: Match multiple strings by similarity or by dissimilarity (python)
【Posted】: 2022-01-14 00:41:30
【Question】:

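Suppose you have a list of strings that all have the same length. You want to match each string with the 1 or 2 other strings that are most similar to it (share the same character at the same position) or least similar to it (share no character at the same position).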
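To make the two criteria concrete: under this definition, the similarity of two equal-length strings is the number of positions where they hold the same character, and its complement (the number of positions where they differ) is the Hamming distance. "Least similar" therefore means fewest positional matches, or equivalently largest Hamming distance. A minimal sketch; the function names `positional_matches` and `hamming_distance` are hypothetical, not from any library:

```python
def positional_matches(a: str, b: str) -> int:
    """Count positions where a and b hold the same character."""
    # The definition only makes sense for strings of equal length
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x == y for x, y in zip(a, b))


def hamming_distance(a: str, b: str) -> int:
    """Count positions where a and b differ (complement of positional_matches)."""
    return len(a) - positional_matches(a, b)


print(positional_matches("abcde", "abcdf"))  # 4
print(hamming_distance("abcde", "xeeee"))    # 4
```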

【Comments】:

    Tags: python string comparison matching


    【Solution 1】:

    Not the most efficient approach, but you can get the values that appear in both lists like this:

    >>> list_1 = ["hello", "world", "today is a good day", "have a nice day"]
    >>> list_2 = ["cats", "dogs", "today is a good day", "have a nice day"]
    >>> set(list_1) & set(list_2)
    {'today is a good day', 'have a nice day'}
    

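    If order matters, you can use a list comprehension with zip instead (note that this only keeps items that are equal at the same index in both lists):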

    >>> list_1 = ["hello", "world", "today is a good day", "have a nice day"]
    >>> list_2 = ["cats", "dogs", "today is a good day", "have a nice day"]
    >>> print([i for i, j in zip(list_1, list_2) if i == j])
    ['today is a good day', 'have a nice day']
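The set intersection ignores position but loses order (and duplicates), while the zip comprehension keeps order but only matches items at the same index. If you want the items common to both lists in `list_1`'s order, regardless of where they sit in `list_2`, one sketch is to test membership against a set built once. The reordered `list_2` below is made up purely for illustration:

```python
list_1 = ["hello", "world", "today is a good day", "have a nice day"]
list_2 = ["have a nice day", "cats", "today is a good day", "dogs"]

# Build the lookup set once so each membership test is O(1) on average
lookup = set(list_2)

# Keep items of list_1, in list_1's order, that occur anywhere in list_2
common_in_order = [item for item in list_1 if item in lookup]
print(common_in_order)  # ['today is a good day', 'have a nice day']

# For comparison, zip only matches items sitting at the same index
same_index = [i for i, j in zip(list_1, list_2) if i == j]
print(same_index)  # ['today is a good day']
```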
    

    【Comments】:

      【Solution 2】:

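      It depends on what you mean by "similar". I would say that two strings such as 'abcdefg' and 'gabcdef' are very similar, yet by your definition they are completely different, because they share no character at the same position.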
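To illustrate the point: the positional score for 'abcdefg' vs 'gabcdef' is 0, while a sequence-based measure rates them as close. The sketch below uses the standard library's `difflib.SequenceMatcher.ratio()` as one possible alternative measure; it is swapped in for comparison only and is not the definition the question uses:

```python
from difflib import SequenceMatcher

a, b = "abcdefg", "gabcdef"

# Positional similarity as defined in the question: same character at the same index
positional = sum(x == y for x, y in zip(a, b))

# Sequence-based similarity: 2 * (matched characters) / (total characters in both strings)
ratio = SequenceMatcher(None, a, b).ratio()

print(positional)       # 0
print(round(ratio, 3))  # 0.857
```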

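      Here is code that implements your definition.

      The function most_similar_index returns the indices of the n strings in the list that are most similar to a given string.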

      import numpy as np
      
      def similarity(str1, str2):
          # Number of positions where the two strings hold the same character
          return sum(c1 == c2 for c1, c2 in zip(str1, str2))
      
      def most_similar_index(list_string, s, n):
          """
          list_string : list of strings, all of the same length
          s : string of the same length as those in list_string
          n : number of indices to return
      
          returns the indices of the n strings in list_string most similar to s
          """
          # Similarity score of s against every string in the list
          scores = np.array([similarity(s, string) for string in list_string])
      
          # argsort sorts ascending, so walk it backwards to take the n highest scores;
          # kind="stable" makes the order of tied scores deterministic
          return np.argsort(scores, kind="stable")[-1:-n-1:-1]
      

      Result (indices 2 and 4 both share 2 characters with s, so the third slot is a tie):

      >>> list_string = ['abcde', 'abcdf', 'xbcde', 'xeeee', 'aeeef']
      >>> s = 'abcff'
      >>> most_similar_index(list_string, s, 3)
      array([1, 0, 4], dtype=int64)
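The question asks for both the most and the least similar partners of every string, not just of one query string. A sketch under the same positional definition, assuming all strings have equal length: it compares all pairs at once with NumPy broadcasting, excludes each string's match with itself, and returns the k most and k least similar partner indices. The function name `match_all` and the parameter `k` are hypothetical:

```python
import numpy as np


def match_all(strings, k=2):
    """For each string, return the indices of its k most and k least similar
    other strings (similarity = number of positions with the same character).
    Assumes every string has the same length."""
    # Shape (n, L) array of single characters
    chars = np.array([list(s) for s in strings])
    # sim[i, j] = number of positions where strings[i] and strings[j] agree
    sim = (chars[:, None, :] == chars[None, :, :]).sum(axis=2)

    n = len(strings)
    most, least = [], []
    for i in range(n):
        others = np.delete(np.arange(n), i)  # exclude the self-match
        scores = sim[i, others]
        # Stable sorts: ties are broken by the lower index in both lists
        most.append(others[np.argsort(-scores, kind="stable")][:k].tolist())
        least.append(others[np.argsort(scores, kind="stable")][:k].tolist())
    return sim, most, least


strings = ['abcde', 'abcdf', 'xbcde', 'xeeee', 'aeeef']
sim, most, least = match_all(strings, k=2)
print(most)   # [[1, 2], [0, 2], [0, 1], [4, 2], [3, 1]]
print(least)  # [[3, 4], [3, 4], [4, 3], [1, 0], [2, 0]]
```

The broadcast comparison builds an n × n × L boolean array, so memory grows quadratically with the number of strings; for large lists, computing one row at a time (as most_similar_index does) is the safer choice.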
      

      【Comments】:
