You can parse the markup easily with LINQ to XML (Linq2Xml):
XElement doc = XElement.Parse(...)
Then use a best-match algorithm to correct the misspelled attributes against an in-memory dictionary of valid attributes.
Edit: I wrote and tested this simplified best-match algorithm (sorry, it's VB):
Dim validTags() As String =
{
    "width",
    "height",
    "img"
}
(This is simplified: you should build a more structured dictionary that holds the tags and the possible attributes for each tag.)
' source is the (possibly misspelled) name to correct, e.g. "widt".
Dim maxMatch As Integer = 0
Dim matchedTag As String = Nothing
' Score every valid tag against source and remember the highest score.
For Each Tag As String In validTags
    Dim match As Integer = checkMatch(Tag, source)
    If match > maxMatch Then
        maxMatch = match
        matchedTag = Tag
    End If
Next
Debug.WriteLine("matched tag {0} matched % {1}", matchedTag, maxMatch)
The code above calls a method that determines, as a percentage, how closely the source string matches each valid tag.
Private Function checkMatch(ByVal tag As String, ByVal source As String) As Integer
    ' An exact match scores 100; an empty source cannot match anything.
    If tag = source Then Return 100
    If String.IsNullOrEmpty(source) Then Return 0
    Dim maxPercentage As Integer = 0
    ' Align the start of source with every offset in tag and keep the best score.
    For index As Integer = 0 To tag.Length - 1
        Dim tIndex As Integer = index
        Dim sIndex As Integer = 0
        Dim matchCounter As Integer = 0
        ' Count the positions where both strings hold the same character.
        While True
            If tag(tIndex) = source(sIndex) Then
                matchCounter += 1
            End If
            tIndex += 1
            sIndex += 1
            If tIndex + 1 > tag.Length OrElse sIndex + 1 > source.Length Then
                Exit While
            End If
        End While
        Dim percentage As Integer = CInt(matchCounter * 100 / Math.Max(tag.Length, source.Length))
        If percentage > maxPercentage Then maxPercentage = percentage
    Next
    Return maxPercentage
End Function
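Given a source string and a tag, the method above finds the best match percentage by comparing the two character by character.
Given "widt" as input, it finds "width" as the best match with a score of 80%: four characters match, and the longer string has five, so 4 * 100 / 5 = 80.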
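
To go one step further, here is a rough sketch of the more structured dictionary mentioned earlier, applied to an element parsed with LINQ to XML. Everything named here is my own assumption for illustration: the validAttributes table, the Score helper (a condensed version of the same offset-comparison idea as checkMatch, using integer division instead of CInt), the FixAttributes routine, and the threshold of 60. Treat it as a starting point rather than a finished implementation.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Xml.Linq

Module AttributeFixer
    ' Hypothetical lookup table: each tag maps to the attribute names it accepts.
    Private ReadOnly validAttributes As New Dictionary(Of String, String()) From {
        {"img", New String() {"width", "height"}}
    }

    ' Condensed scoring: slide source along candidate and count equal characters.
    ' Note: uses integer division (truncates), unlike CInt in checkMatch (rounds).
    Private Function Score(candidate As String, source As String) As Integer
        If candidate = source Then Return 100
        If candidate.Length = 0 OrElse source.Length = 0 Then Return 0
        Dim best As Integer = 0
        For offset As Integer = 0 To candidate.Length - 1
            Dim hits As Integer = 0
            Dim i As Integer = 0
            While offset + i < candidate.Length AndAlso i < source.Length
                If candidate(offset + i) = source(i) Then hits += 1
                i += 1
            End While
            best = Math.Max(best, (hits * 100) \ Math.Max(candidate.Length, source.Length))
        Next
        Return best
    End Function

    ' Renames unknown attributes to their best-scoring valid name.
    ' The threshold is an arbitrary assumption; tune it for your data.
    Sub FixAttributes(element As XElement, Optional threshold As Integer = 60)
        Dim allowed As String() = Nothing
        If Not validAttributes.TryGetValue(element.Name.LocalName, allowed) Then Return
        ' Iterate over a copy, because the loop adds and removes attributes.
        For Each attr As XAttribute In element.Attributes().ToList()
            Dim name As String = attr.Name.LocalName
            If allowed.Contains(name) Then Continue For
            Dim bestName As String = allowed.OrderByDescending(Function(a) Score(a, name)).First()
            If Score(bestName, name) >= threshold Then
                ' Overwrites an existing attribute with the same valid name, if any.
                element.SetAttributeValue(bestName, attr.Value)
                attr.Remove()
            End If
        Next
    End Sub

    Sub Main()
        Dim img As XElement = XElement.Parse("<img widt=""100"" heigth=""50"" />")
        FixAttributes(img)
        Console.WriteLine(img)
    End Sub
End Module
```

Keeping the valid names per tag means an attribute is only ever compared against names that make sense for its element, and the threshold stops a very poor match from being applied blindly.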