【问题标题】:Remove Duplicate value in single cell in Excel删除Excel中单个单元格中的重复值
【发布时间】:2021-01-09 06:04:51
【问题描述】:

我在一个 Excel 单元格中有以下文本字符串:

AucklandAucklandarea
or 
WellingtonWellingtonarea

很明显,单词词组或字符串包含三个没有任何分隔符(空格、逗号、精原细胞)的单词,如Auckland, Auckland, area;只需大写字母和小写字母就可以识别出单词的含义。

我的问题是如何从单个单元格中删除重复的单词或保持单词的唯一性。我有一列显示如上所述的数据。我已经检查了一些关于 Excel VBA 的解决方案,但是这些解决方案提供了很好的想法,但不适合我的情况。 否则,我必须逐项替换它们。

【问题讨论】:

  • It is clear to see that the word include three words without any delimiter 你很清楚,但 AI 不明白。即使我们匹配重复的单词,False Positives 也会有问题,例如NorThisOrThatOrTh 在这里重复。同样NorThisOrBlah。这里Or 重复了..
  • 不仅如此,它还依赖于一个对语言有足够好的感觉的人来识别单词。当您实际上可以将单个单词分成其他有效的单独单词时会发生什么?闻起来有点 x,y 问题。是否为您提供了非平面文件?
  • 可以多于三个字吗?一个字符串可以出现两次以上吗?结果应该是什么样子,例如AucklandareaAuckland area?
  • @VBasic2008 应该是奥克兰地区。不过谢谢,你的回答很好。

标签: excel vba


【解决方案1】:

删除大写重复

  • 在字符串中,删除以大写字母开头的重复子字符串。
  • 您可以在Excel 中将CapsNoDupes 用作UDF,例如=CapsNoDupes(A1)
Option Explicit

' Select some cells (ranges) and run this to apply the changes.
' To use a non-contiguous range, e.g. select 'A1:A5' then press and hold 'CTRL'
' and select 'C2' and select 'D4:E6' etc.
Sub TESTremoveCapDupesSelection()
    If TypeName(Selection) = "Range" Then
        Dim rng As Range: Set rng = Selection
        removeCapDupesInRange rng
    End If
End Sub

' A reminder that the range can be non-contiguous.
Sub TESTremoveCapDupesInRange()
    Dim rng As Range: Set rng = Range("A1,B2,C4")
    removeCapDupesInRange rng
End Sub

'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
' Purpose:      Utilizes the 'CapsNoDupes' function in a range.
' Remarks:      The range can be non-contiguous. For a contiguous range,
'               it can be written using an array which would increase
'               efficiency.
'               The capFirst argument determines if the first character
'               of the initial string is to be capitalized. Default is 'True'.
' FLow:         It loops through each cell of each area of the specified range.
'               It checks if the current cell does not contain an error value
'               and if it is of type string. If so, it calls the 'CapsNoDupes'
'               function which will possibly modify the cell value (in place).
' Precedents:   'CapsNoDupes', 'UniqueCapsToArray', 'capString'
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
Sub removeCapDupesInRange( _
        rng As Range, _
        Optional ByVal Delimiter As String = " ", _
        Optional ByVal capFirst As Boolean = True)
    If Not rng Is Nothing Then
        Dim aRng As Range
        Dim cel As Range
        Dim Curr As String
        For Each aRng In rng.Areas
            For Each cel In aRng.Cells
                If Not IsError(cel) And VarType(cel) = vbString Then
                    cel.Value = CapsNoDupes(cel.Value, Delimiter, capFirst)
                End If
            Next cel
        Next aRng
    End If
End Sub


Sub TESTCapsNoDupes()
    
    Const s As String = "aucklandAucklandWhatEverAucklandarea"
    
    Debug.Print CapsNoDupes(s, False, ",")
    ' Result: 'auckland,Auckland,What,Ever,area'
    
    Debug.Print CapsNoDupes(s, True, ",")
    ' Result: 'Auckland,What,Ever,area'

End Sub

''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
' Purpose:      Returns the unique sub strings, denoted by capital letters
'               in a specified string, in a new delimited string removing
'               all repeating capitalized sub strings.
' Remarks:      The capFirst argument determines if the first character
'               of the initial string is to be capitalized. Default is 'True'.
' FLow:         Checks each element in the resulting "UniqueCapsToArray"'s
'               array against each other and determines which has more
'               characters. Then it tries to replace the string with less
'               characters in the string with more characters.
'               This is only done if the first character of the two comparing
'               strings is upper case. Finally it concatenates ('joins')
'               the elements of the array into a delimited string (the result).
' Precedents:   'UniqueCapsToArray', 'capString'
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
Function CapsNoDupes( _
    ByVal s As String, _
    Optional ByVal Delimiter As String = " ", _
    Optional ByVal capFirst As Boolean = True) _
As String
    Dim Caps As Variant: Caps = UniqueCapsToArray(s, capFirst)
    If Not IsEmpty(Caps) Then
        Dim i As Long
        Dim j As Long
        For i = 0 To UBound(Caps) - 1
            For j = 1 To UBound(Caps)
                If Left(Caps(i), 1) Like "[A-Z]" _
                        And Left(Caps(j), 1) Like "[A-Z]" Then
                    If Len(Caps(i)) = Len(Caps(j)) Then
                    ElseIf Len(Caps(i)) > Len(Caps(j)) Then
                        Caps(i) = Replace(Caps(i), Caps(j), _
                            "", , , vbBinaryCompare)
                    Else
                        Caps(j) = Replace(Caps(j), Caps(i), _
                            "", , , vbBinaryCompare)
                    End If
                End If
            Next j
        Next i
        CapsNoDupes = Join(Caps, Delimiter)
    End If
End Function


Sub testUniqueCapsToArray()
    
    Const s As String = "aucklandAucklandWhatEverAucklandarea"
    Dim arr As Variant
    
    arr = UniqueCapsToArray(s, False)
    Debug.Print Join(arr, ",")
    ' Result: 'auckland,Auckland,What,Ever,Aucklandarea'
    
    arr = UniqueCapsToArray(s, True)
    Debug.Print Join(arr, ",")
    ' Result: 'Auckland,What,Ever,Aucklandarea'

End Sub

''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
' Purpose:      Returns the unique sub strings, denoted by capital letters
'               in a specified string, in a 1D (zero-based) array.
' Remarks:      The capFirst argument determines if the first character
'               of the initial string is to be capitalized. Default is 'True'.
' Flow:         In the main part of the code ('Case Else'), loops through
'               the characters of the specified string to find an upper case
'               character. If found, writes the string ('word') containing
'               the previous characters (that had not already been written)
'               to the dictionary, which automatically discards any duplicates.
'               It continues looping and finally writes the keys
'               of the dictionary to the resulting 1D array.
' Precedents:   'capString'
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
Function UniqueCapsToArray( _
    ByVal CapDenotedString As String, _
    Optional ByVal capFirst As Boolean = True) _
As Variant
    Select Case Len(CapDenotedString)
        Case 0
            UniqueCapsToArray = VBA.Array("")
        Case 1
            If capFirst Then
                UniqueCapsToArray = VBA.Array(UCase(CapDenotedString))
            Else
                UniqueCapsToArray = VBA.Array(CapDenotedString)
            End If
        Case Else
            Dim s As String
            If capFirst Then
                s = capString(CapDenotedString)
            Else
                s = CapDenotedString
            End If
            Dim cStart As Long: cStart = 1
            Dim i As Long
            With CreateObject("Scripting.Dictionary")
                .CompareMode = vbBinaryCompare
                For i = 2 To Len(s)
                    If Mid(s, i, 1) Like "[A-Z]" Then
                        .Item(Mid(s, cStart, i - cStart)) = Empty
                        cStart = i
                    End If
                Next i
                .Item(Right(s, Len(s) - cStart + 1)) = Empty
                UniqueCapsToArray = .Keys
            End With
    End Select
End Function


Sub TESTcapString()
    Debug.Print capString("aaAB") ' Returns "AaAB".
    Debug.Print capString("AaAB") ' No change.
    Debug.Print capString("2aB") ' No change.
End Sub

''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
' Purpose:      Capitalizes a string i.e. replaces only the first character
'               with the same upper case character.
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
Function capString( _
    ByVal s As String) _
As String
    capString = Replace(s, Left(s, 1), UCase(Left(s, 1)), , 1)
End Function

【讨论】:

  • 嗨,@VBasic2008,谢谢。有用!!!如果你能分享你的代码解释那就太好了。
  • 我已经修改了代码并添加了一些 cmets。请随时进一步询问,但请具体描述您遇到的问题。
  • 非常感谢!你的解释对我这个学习BV编码的人很有帮助,甚至我不是。
【解决方案2】:

如果我遇到这个问题,我会考虑使用其他工具:代码编辑器和正则表达式。

算法如下:

  1. 将所有单元格复制到代码编辑器。
  2. 在正则表达式和区分大小写模式下使用“查找和替换”:
    1. 要查找的文本:[A-Z][a-z]+([A-Z][a-z]+)(area)
    2. 要放置的文本:\1 \2
  3. 全部替换。
  4. 复制结果并将其粘贴回 Excel。

这是我准备展示它在我身边的样子的 GIF。

【讨论】:

  • 嗨,@Terbiy,谢谢,您的想法非常有用且易于理解。我已经向你学习了。好吧,如果数据集在电子表格中显示数千或数百万条记录,则可能会导致一些错误。但它很棒,我真的很感激。
  • 是的,@UD.Cole,我同意,我的想法应用的成功取决于数据集的大小。? 它非常适合一次性平均数据使用量。
【解决方案3】:

此函数将识别第二个大写字母之前的子字符串,它会在主数据集中将其加粗并将子字符串粘贴到下一列以供进一步检查。

试试看,希望有用..

Sub cleanData()

Dim position As Byte
Dim upper_case(1 To 26) As String
Dim counter As Byte
Dim db_row_start, db_row_ends, db_col As Integer
Dim use_row As Integer

'CONFIG
'don't forget to change this values before you try it
'--------------------
db_row_start = 3       'in what row does your data starts?
db_row_ends = 20     'in what row does your data ends?
db_col = 2                 'in what column is your data?
'--------------------

'fill the array upper_case with all upper case characters
For counter = 1 To 26
    upper_case(counter) = Chr(counter + 64)
Next counter

For use_row = db_row_start To db_row_ends
    For counter = 1 To 26
        
        position = _
        InStr(2, Cells(use_row, db_col), upper_case(counter), _
        vbBinaryCompare)
        
        'if a second appercase is found enter the function
        If position > 0 Then
            
            'move substring to the next column
            Cells(use_row, db_col + 1) = _
            Left(Cells(use_row, db_col), position - 1)
            
            'bold the substring copied to the next column
            Cells(use_row, db_col).Characters(1, position - 1). _
            Font.Bold = True
            
            counter = 26
        End If
        
  Next counter
Next use_row

End Sub

【讨论】:

  • 嗨,@Gassz,感谢您分享您的好主意。根据您的回答与您提出相同的问题。您能否让我知道如何在 Excel 中使用此功能。我知道我可以打开 VB 面板,插入一个新模块,然后保存。然后呢?
  • 嗨@UD.Cole,有两种方法。您可以按 Visual Basic 窗口顶部的播放按钮或创建一个按钮并将宏分配给它。此代码要求您的数据旁边有一个空列,以便它可以复制/粘贴。运行前记得在 CONFIG 中配置代码,必须指定数据开始/结束的行和列号。
【解决方案4】:

在 VBA 中使用 RegExp。

Sub setRegexPattern()
    Dim regEx As Object 'New RegExp
    Dim strPattern As String
    Dim strInput As String
    Dim Myrange As Range

    
    Set regEx = CreateObject("VBScript.RegExp")
    
    Set Myrange = ActiveSheet.Range("A1", Range("a" & Rows.Count).End(xlUp))
    
    For Each C In Myrange
        
        strPattern = "([A-Z][a-z]+[^A-Z])([A-z]+)(area)"
        
        If strPattern <> "" Then
            strInput = C.Value
            
            With regEx
                .Global = True
                .MultiLine = True
                .IgnoreCase = False
                .Pattern = strPattern
            End With
            
            If regEx.test(strInput) Then
                C.Offset(0, 1) = regEx.Replace(strInput, "$1 $3")
            Else
                C.Offset(0, 1) = "(Not matched)"
            End If
        End If
    Next
    
End Sub

【讨论】:

  • 嗨,@Dy.Lee 感谢您分享您的好主意。您能否让我知道如何在 Excel 中使用此功能。我知道我可以打开 VB 面板,插入一个新模块,然后保存。然后怎样呢?我没有找到它的函数名称。是 setRegexPattern() 吗?
  • @UD.Cole,是的。根据您的情况设置 myRange 的范围。
猜你喜欢
  • 2014-11-11
  • 1970-01-01
  • 1970-01-01
  • 2015-03-11
  • 2018-10-06
  • 1970-01-01
  • 1970-01-01
  • 2014-10-13
  • 1970-01-01
相关资源
最近更新 更多