【发布时间】:2016-03-10 06:39:29
【问题描述】:
除了几年前在 VB 中完成的一些简短的小事情外,我还没有真正用 VBA 或类似的东西编写过代码。这是我尝试编写一些代码来搜索客户帐户的 excel 表数据库并搜索可能的重复帐户。可悲的是,在我需要运行它的机器上,它只能处理大约 3,500 个条目而不会导致 Excel 崩溃。我将这归因于我的代码非常未优化以及机器运行缓慢。
可以做些什么来优化以下代码,以及将来我应该使用哪些 VBA 最佳实践?
'Essentially, this loops through each row in the sheet
'For each row, it loops through every row after it, searching for duplicates of itself (skipping over a rows that have previously been marked as duplicates)
'Duplicates are defined by entries that meet a 'threshhold' of similarity
'The threshhold is defined as the number '5', first and last names are each two points, address and email address are one point
'That means that in order for an entry to meet the thresshold, the first and last name must be the same, and it must also have either the same address or email
'When duplicates are found, the duplicate column is marked as 'Yes' for that row, and the first occurence column is marked with a number defining the row number where the account first appeared
Sub Main():
Dim lNameCol, fNameCol, addressCol, emailCol, duplicateCol, fOccurenceCol As String
'Defines the column letters for the various data fields
lNameCol = "A"
fNameCol = "B"
addressCol = "C"
emailCol = "D"
duplicateCol = "E" 'The column where a entry/row will be marked as being a duplicate
fOccurenceCol = "F" 'The column that contains the row number where a duplicate accounts first occurence was found
Call Duplicates(lNameCol, fNameCol, addressCol, emailCol, duplicateCol, fOccurenceCol)
End Sub
'Gets number of rows in currently active sheet
Function RowCount():
Application.ActiveSheet.UsedRange
RowCount = Worksheets("Sheet1").UsedRange.Rows.Count
End Function
'Finds and labels duplicates
Sub Duplicates(ByVal lNameCol As String, ByVal fNameCol As String, ByVal addressCol As String, ByVal emailCol As String, ByVal duplicateCol As String, ByVal fOccurenceCol As String)
Dim lRowCount As Integer
lRowCount = RowCount()
'Loops through each row in the sheet
For i = 1 To lRowCount
Dim duplicate, lastName, firstName, email, address As String
'Sets these variables' values corresponding cell value in row 'i'
'UCase capitilizes things to make entries case-insensitive
duplicate = UCase(Range(duplicateCol & i).Value)
lastName = UCase(Range(lNameCol & i).Value)
firstName = UCase(Range(fNameCol & i).Value)
email = UCase(Range(emailCol & i).Value)
address = UCase(Range(addressCol & i).Value)
'Checks to make sure row has not already been marked a duplicate, if it hasn't it continues
If (StrComp(duplicate = "YES", vbTextCompare) = 1) Then
'Loops through every row after the current row (row 'i')
For n = (i + 1) To lRowCount
'duplicateThreshold is an integer that defines the threshhold of similarity that rows need to have in order to be labeled a duplicate
Dim duplicateThreshhold As Integer
Dim lastName2, firstName2, email2, address2 As String
duplicateThreshhold = 0
'These are the entry variables for account entry at row 'n' being compared to the account entry at row 'i'
lastName2 = UCase(Range(lNameCol & n).Value)
firstName2 = UCase(Range(fNameCol & n).Value)
email2 = UCase(Range(emailCol & n).Value)
address2 = UCase(Range(addressCol & n).Value)
'Adds 2 points to threshhold if first name is the same
If lastName = lastName2 Then
duplicateThreshhold = duplicateThreshhold + 2
End If
'Adds 2 points to threshold if last name is the same
If firstName = firstName2 Then
duplicateThreshhold = duplicateThreshhold + 2
End If
'The remaining two fields give 1 point each to the thresshold
'As long as the sum of the points given by first and last name is always greater than half of the threshhold, first and last name will always be required
If email = email2 Or address = address2 Then
duplicateThreshhold = duplicateThreshhold + 1
End If
If duplicateThreshhold > 4 Then
'Labels duplicate entries as duplicates
Range(duplicateCol & i).Value = "Yes"
Range(duplicateCol & n).Value = "Yes"
'Labels duplicate entries with the first occurence of that entry
Range(fOccurenceCol & i).Value = i 'Labels first occurence account's row number
Range(fOccurenceCol & n).Value = i
End If
Next
End If
Next
End Sub
【问题讨论】:
-
为什么不在 Excel 中使用删除重复项功能,而不使用 VBA?在 Excel 2013 中:突出显示您的列,导航到 Data->Remove Duplicates
-
我发现 this site 上的提示有助于让我的代码运行得更快,最好的第一步可能是看看那里。
-
好吧,听起来您正在尝试构建自己的“模糊逻辑”风格(MS 已经相当出色地完成了这一点)。看看Set Auto Filtering multiple wildcards、Can Advanced Filter criteria be in the VBA rather than a range? 和Can AutoFilter take both inclusive and non-inclusive wildcards from Dictionary keys?。
-
我不想删除重复项,我想标记可能的重复项(名字和姓氏相同,电子邮件或地址相同的东西)以供以后检查。我也在为那些技术文盲,只想能够点击“运行”并让它做它应该做的事情的人写这篇文章。此外,我将在未来实现一些无法从高级过滤中获得的附加功能,因此我将继续使用自定义宏。 @eirikdaude,我会看看,谢谢。
-
我会说您需要将数据读入数组,然后对数组上的重复项进行处理。您甚至可以将重复的“是”读入新列的数组中,然后输出整个内容。使用电子表格最慢的部分通常是访问工作表本身,因此使用上述方法您只能访问工作表两次,一次读取一次写入。