【问题标题】:Can anyone improve on the below Fuzzyfind function for VBA?任何人都可以改进 VBA 的以下 Fuzzyfind 功能吗?
【发布时间】:2015-06-17 21:04:00
【问题描述】:

此功能可让您从某个范围内查找相似的字符串,而无需进行精确搜索。

公式如下所示:=FuzzyFind(A1,B$1:B$20) 假设您要搜索的字符串在 A1 中 并且您的参考或选项表是 B1:B20

代码在这里:

Function FuzzyFind(lookup_value As String, tbl_array As Range) As String
Dim i As Integer, str As String, Value As String
Dim a As Integer, b As Integer, cell As Variant
For Each cell In tbl_array
  str = cell
  For i = 1 To Len(lookup_value)
    If InStr(cell, Mid(lookup_value, i, 1)) > 0 Then
      a = a + 1
      cell = Mid(cell, 1, InStr(cell, Mid(lookup_value, i, 1)) - 1) & Mid(cell, InStr(cell, Mid(lookup_value, i, 1)) + 1, 9999)
    End If
  Next i
  a = a - Len(cell)
  If a > b Then
    b = a
    Value = str
  End If
  a = 0
Next cell
FuzzyFind = Value
End Function

这个函数的结果是命中注定的。任何人都可以提高这个算法的智能吗?

谢谢你:)

【问题讨论】:

  • 正确地说,目标是找到包含最多特定值的单元格?示例 Cat,Battle ...当寻找 t 时,battle 有更多 ts,所以返回 Battle ?一切都在同一个案例中吗?还是我们需要担心这个
  • 一切都是大写的。你的例子是正确的假设我们要搜索的词是“apple”假设我们表中的字符串是袋子产品、洋葱、巨型桃子、电子产品和可下载应用程序,由于字符串“app”,它可能会返回电子产品和可下载应用程序连续匹配次数最多的
  • 我不明白这一行:a = a - Len(cell)
  • 我也对此感到困惑。我认为它计算我们的字符串和单元格中的值之间的匹配,然后修剪多余的。

标签: algorithm vba function find fuzzy-search


【解决方案1】:

我不确定“FuzzyFind”究竟意味着什么,但这是一个使用Levenshtein distance 查找类似数据的 VLOOKUP。

Levenshtein 距离让您可以选择一个可以指定的“百分比匹配”,而不是来自普通 VLOOKUP 的典型 TRUEFALSE

用法为:DTVLookup(A1,$C$1:$C$100,1,90),其中 90 是 Levenshtein 距离。

DTVLookup(Value To Find, Range to Search, Column to Return, [Percentage Match])

我通常在比较来自不同数据库的名称时使用它,例如:

Correct Name    Example Lookup  Percentage Match    Other Report
John S Smith    John Smith      83                  John Smith
Barb Jones      Barbara Jones   77                  Barbara Jones
Jeffrey Bridge  Jeff Bridge     79                  Jeff Bridge
Joseph Park     Joseph P. Park  79                  Joseph P. Park
Jefrey Jones    jefre jon       75                  jefre jon
Peter Bridge    peter f. bridge 80                  peter f. bridge

代码如下:

Function DTVLookup(TheValue As Variant, TheRange As Range, TheColumn As Long, Optional PercentageMatch As Double = 100) As Variant
If TheColumn < 1 Then
    DTVLookup = CVErr(xlErrValue)
    Exit Function
End If
If TheColumn > TheRange.Columns.Count Then
    DTVLookup = CVErr(xlErrRef)
    Exit Function
End If
Dim c As Range
For Each c In TheRange.Columns(1).Cells
    If UCase(TheValue) = UCase(c) Then
        DTVLookup = c.Offset(0, TheColumn - 1)
        Exit Function
    ElseIf PercentageMatch <> 100 Then
        If Levenshtein3(UCase(TheValue), UCase(c)) >= PercentageMatch Then
            DTVLookup = c.Offset(0, TheColumn - 1)
            Exit Function
        End If
    End If
Next c
DTVLookup = CVErr(xlErrNA)
End Function

Function Levenshtein3(ByVal string1 As String, ByVal string2 As String) As Long

Dim i As Long, j As Long, string1_length As Long, string2_length As Long
Dim distance(0 To 60, 0 To 50) As Long, smStr1(1 To 60) As Long, smStr2(1 To 50) As Long
Dim min1 As Long, min2 As Long, min3 As Long, minmin As Long, MaxL As Long

string1_length = Len(string1):  string2_length = Len(string2)

distance(0, 0) = 0
For i = 1 To string1_length:    distance(i, 0) = i: smStr1(i) = Asc(LCase(Mid$(string1, i, 1))): Next
For j = 1 To string2_length:    distance(0, j) = j: smStr2(j) = Asc(LCase(Mid$(string2, j, 1))): Next
For i = 1 To string1_length
    For j = 1 To string2_length
        If smStr1(i) = smStr2(j) Then
            distance(i, j) = distance(i - 1, j - 1)
        Else
            min1 = distance(i - 1, j) + 1
            min2 = distance(i, j - 1) + 1
            min3 = distance(i - 1, j - 1) + 1
            If min2 < min1 Then
                If min2 < min3 Then minmin = min2 Else minmin = min3
            Else
                If min1 < min3 Then minmin = min1 Else minmin = min3
            End If
            distance(i, j) = minmin
        End If
    Next
Next

' Levenshtein3 will properly return a percent match (100%=exact) based on similarities and Lengths etc...
MaxL = string1_length: If string2_length > MaxL Then MaxL = string2_length
Levenshtein3 = 100 - CLng((distance(string1_length, string2_length) * 100) / MaxL)

End Function

【讨论】:

    【解决方案2】:

    试试这个,我想它会找到最好的匹配

    Function FuzzyFind2(lookup_value As String, tbl_array As Range) As String
    Dim i As Integer, str As String, Value As String
    Dim a As Integer, b As Integer, cell As Variant
    Dim Found As Boolean
    
    b = 0
    For Each cell In tbl_array
      str = cell
      i = 1
      Found = True
      Do While Found = True
        Found = False
        If InStr(i, str, lookup_value) > 0 Then
            a = a + 1
            Found = True
            i = InStr(i, str, lookup_value) + 1
        End If
      Loop
    
      If a > b Then
        b = a
        Value = str
      End If
      a = 0
    Next cell
    FuzzyFind2 = Value
    End Function
    

    【讨论】:

    • 在原版的基础上有何改进?你能简单介绍一下吗?
    • @JohnsonJason 当我使用原来的那个时,它几乎总是带着它找到的第一个回来。这一个与最多同意的那个一起回来,我认为上面的答案可能更好
    • @ Holmes IV 我将其切换为最佳答案,因为它返回的结果比 Levenshtein 方法更好。尽管对于他上面给出的示例,比较数据库中的名称,但我认为 Levenshtein 方法是最好的。但是,在我的情况下,您的模糊查找 2 摇滚。谢谢你:)
    猜你喜欢
    • 2020-12-01
    • 1970-01-01
    • 2015-05-24
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    相关资源
    最近更新 更多