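【Posted】: 2015-11-19 04:32:01
【Question】:
I wrote a function that searches a string for the given tags and removes all of them, together with their content, except the first occurrence of each: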
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module Module1
    Sub Main()
        Dim fileAsString = "<div>myFirstDiv</div>" &
                           "<Div></dIV>" &
                           "<city>NY</city>" &
                           "<city></city>" &
                           "<div></div>" &
                           "<span></span>"

        ' Removes these tags and their content from fileAsString, except the
        ' first appearance of each
        Dim forbiddenNodeslist As New List(Of String)
        forbiddenNodeslist.Add("div")
        forbiddenNodeslist.Add("city")

        ' Go over all the forbidden tags
        For Each node In forbiddenNodeslist
            Dim re As New Regex("<" & node & "[^>]*>(.*?)</" & node & ">", RegexOptions.IgnoreCase)
            Dim matches = re.Matches(fileAsString)

            ' Count the characters that were replaced by an empty string, in order
            ' to update the start index of the remaining matches
            Dim removedCharacters = 0

            ' Go over all the matches, except the first one
            For index = 1 To matches.Count - 1
                Dim match = matches(index)
                ' Match.Index refers to the string before any removal, so shift it
                ' left by the number of characters removed so far
                Dim startIndex = match.Index - removedCharacters
                Dim matchCharactersCount = match.Length
                ' Add to the running total of removed characters
                removedCharacters += matchCharactersCount
                ' Remove the match from the string
                fileAsString = fileAsString.Remove(startIndex, matchCharactersCount)
            Next
        Next

        ' Prints <div>myFirstDiv</div><city>NY</city><span></span>
        Console.WriteLine(fileAsString)
    End Sub
End Module
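But it is inefficient: I first search for the matches (one pass over the string), and then loop over the string again and again, copying it every time I replace a match with an empty string.
How can I make it more efficient?
Any help is appreciated!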
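One way to avoid the repeated `String.Remove` calls (each of which copies the whole string) is to build the result in a single left-to-right pass over the matches, copying only the text that survives. The sketch below assumes the same regex as in the question; the module name `KeepFirstWithBuilder` and the helper `RemoveAllButFirst` are hypothetical.

```vbnet
Imports System
Imports System.Text
Imports System.Text.RegularExpressions

Module KeepFirstWithBuilder
    ' Hypothetical helper: removes every match of `pattern` except the first,
    ' copying the untouched text into a StringBuilder in one left-to-right pass.
    Function RemoveAllButFirst(input As String, pattern As Regex) As String
        Dim matches = pattern.Matches(input)
        If matches.Count <= 1 Then Return input

        Dim sb As New StringBuilder(input.Length)
        ' Position in `input` up to which text has already been copied
        Dim copiedUpTo = 0
        For index = 1 To matches.Count - 1
            Dim m = matches(index)
            ' Copy the text between the previous removed match and this one
            ' (the first match is kept because it lies inside this copied span)
            sb.Append(input, copiedUpTo, m.Index - copiedUpTo)
            ' Skip over the match itself
            copiedUpTo = m.Index + m.Length
        Next
        ' Copy whatever follows the last removed match
        sb.Append(input, copiedUpTo, input.Length - copiedUpTo)
        Return sb.ToString()
    End Function

    Sub Main()
        Dim html = "<div>myFirstDiv</div><Div></dIV><city>NY</city>" &
                   "<city></city><div></div><span></span>"
        For Each node In {"div", "city"}
            Dim re As New Regex("<" & node & "[^>]*>(.*?)</" & node & ">", RegexOptions.IgnoreCase)
            html = RemoveAllButFirst(html, re)
        Next
        ' Prints <div>myFirstDiv</div><city>NY</city><span></span>
        Console.WriteLine(html)
    End Sub
End Module
```

Each forbidden tag still gets its own pass, so the string is rebuilt once per tag rather than once per removed match, and no start-index bookkeeping is needed because `Match.Index` is always read against the unmodified input.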
【Discussion】:
-
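Is there a reason you store the removed characters and the positions of the removed tags? If not, that is just extra overhead. Loop over your list of offending tags and remove/replace all occurrences with a single statement. stackoverflow.com/questions/6025560/…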
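A single `Regex.Replace` call per tag can still keep the first occurrence if it is given a `MatchEvaluator` that returns the first match unchanged and an empty string for every later one. This is a sketch of that idea, not necessarily what the linked answer does; `KeepFirstWithReplace` and `RemoveAllButFirst` are hypothetical names.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module KeepFirstWithReplace
    ' Hypothetical helper: one Regex.Replace call per tag. The evaluator keeps
    ' the first match and replaces every later match with an empty string.
    Function RemoveAllButFirst(input As String, tagName As String) As String
        Dim escaped = Regex.Escape(tagName)
        Dim re As New Regex("<" & escaped & "[^>]*>.*?</" & escaped & ">", RegexOptions.IgnoreCase)
        Dim seen = 0
        Return re.Replace(input,
                          Function(m As Match)
                              seen += 1
                              ' Keep the first occurrence, drop the rest
                              Return If(seen = 1, m.Value, String.Empty)
                          End Function)
    End Function

    Sub Main()
        Dim html = "<div>myFirstDiv</div><Div></dIV><city>NY</city>" &
                   "<city></city><div></div><span></span>"
        For Each node In {"div", "city"}
            html = RemoveAllButFirst(html, node)
        Next
        ' Prints <div>myFirstDiv</div><city>NY</city><span></span>
        Console.WriteLine(html)
    End Sub
End Module
```

The counter relies on `Regex.Replace` visiting matches from left to right, which holds unless `RegexOptions.RightToLeft` is set.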
-
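Yes, I store it because when I remove part of the string, the start index of the next match has to be updated. For example, in "<div></div><div></div><div></div>", the first div is at index 0, the second at 11 and the third at 22. After I remove the second div, the third div is at index 11 instead of 22.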
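The index arithmetic described in this comment can be checked directly: `Match.Index` always refers to the original string, so each later match must be shifted left by the *total* length of everything removed before it. The sketch below uses the three-div example; `AdjustedStartIndices` is a hypothetical helper.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module IndexShiftDemo
    ' Hypothetical helper: for every match except the first, returns the index at
    ' which it must be removed once all earlier matches have already been removed.
    Function AdjustedStartIndices(matches As MatchCollection) As List(Of Integer)
        Dim result As New List(Of Integer)
        Dim removedCharacters = 0
        For index = 1 To matches.Count - 1
            result.Add(matches(index).Index - removedCharacters)
            ' Cumulative: every earlier removal shifts this match further left
            removedCharacters += matches(index).Length
        Next
        Return result
    End Function

    Sub Main()
        Dim s = "<div></div><div></div><div></div>"
        Dim matches = Regex.Matches(s, "<div[^>]*>.*?</div>", RegexOptions.IgnoreCase)
        For Each m As Match In matches
            Console.Write(m.Index & " ")   ' original positions: 0 11 22
        Next
        Console.WriteLine()
        For Each i In AdjustedStartIndices(matches)
            Console.Write(i & " ")         ' positions at removal time: 11 11
        Next
        Console.WriteLine()
    End Sub
End Module
```

Both removals happen at index 11: once the second div is gone, the third one slides into its place.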
-
You could reverse the whole string, remove all occurrences except the last one, and then reverse it again to get the same result.
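A closely related approach that avoids reversing the text (which would also reverse the tags, e.g. `</div>` becoming `>vid/<`, so the pattern would need reversing too) is to remove the matches from last to first. Removing text only shifts the characters to its right, and those have already been processed, so each original `Match.Index` stays valid. This is a sketch with a hypothetical `RemoveAllButFirst` helper.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RemoveBackToFront
    ' Hypothetical helper: removes all matches except the first, starting with the last.
    ' Earlier matches are never shifted by a later removal, so no offset is tracked.
    Function RemoveAllButFirst(input As String, pattern As String) As String
        Dim matches = Regex.Matches(input, pattern, RegexOptions.IgnoreCase)
        Dim result = input
        ' Stops before index 0, so the first match is kept
        For index = matches.Count - 1 To 1 Step -1
            result = result.Remove(matches(index).Index, matches(index).Length)
        Next
        Return result
    End Function

    Sub Main()
        Dim s = "<div></div><div></div><div></div>"
        ' Prints <div></div>
        Console.WriteLine(RemoveAllButFirst(s, "<div[^>]*>.*?</div>"))
    End Sub
End Module
```

This removes the bookkeeping, but each `String.Remove` still copies the string, so it does not address the copying cost the question asks about.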