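【Posted】: 2015-02-24 16:42:12
【Question】:
I am parsing some HTML to convert it to an OpenXML xlsx file, and I cannot extract the style attribute. I could force this with a custom parser, but I would like to use mshtml as much as possible. The source HTML may contain some non-standard formatting. Details follow.
(Below: input, code, and debug output)
Input string: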
<div id="GLGV" class="GLVG1">
<div class="GLGVOuterRow" ID="GLGV_PRTS_0" style="height:20px;">
<span id="ExtID01_0000" title="Note - N0001" class="ExtID01Label">N0001</span>
<span id="Note01" class="Note01" style="display:inline-block;width:70px;">Area Name</span>
<span id="Main01" class="MainTextAll" style="display:inline-block;height:16px;width:250px;">My new area</span>
<span id="OTLID_0" class="GRPL_Hidden">8270</span>
<span id="OTLParID_0" class="GRPL_Hidden">8269</span>
<span id="PrtTyp_0" class="GRPL_Hidden">NOTE</span>
<span class="FloatClear"></span>
</div>
ASP.NET code:
Public Sub TestSample()
    Dim wrkListString As String = C.AC("List")
    Dim wrkDocument As IHTMLDocument2 = New HTMLDocumentClass()
    wrkDocument.write(wrkListString)
    wrkDocument.close()
    Dim wrkAllElements As IHTMLElementCollection = wrkDocument.body.all
    Dim ws As String = ""
    Dim wrkType As String = ""
    Dim wrkStyle As String = ""
    Dim wrkId As String = ""
    Dim wrkClass As String = ""
    For Each wrkElem In wrkAllElements
        wrkType = wrkElem.GetType().ToString
        wrkId = wrkElem.id
        wrkClass = wrkElem.className
        wrkStyle = wrkElem.Style.ToString
        ws = wrkType & " , " & wrkId & " , " & wrkClass & " , " & wrkStyle & " , "
        Debug.Print(ws)
    Next
End Sub
Debug output:
mshtml.HTMLDivElementClass , GLGV , GLVG1 , System.__ComObject ,
mshtml.HTMLDivElementClass , GLGV_PRTS_0 , GLGVOuterRow , System.__ComObject ,
mshtml.HTMLSpanElementClass , ExtID01_0000 , ExtID01Label , System.__ComObject ,
mshtml.HTMLSpanElementClass , Note01 , Note01 , System.__ComObject ,
mshtml.HTMLSpanElementClass , Main01 , MainTextAll , System.__ComObject ,
mshtml.HTMLSpanElementClass , OTLID_0 , GRPL_Hidden , System.__ComObject ,
mshtml.HTMLSpanElementClass , OTLParID_0 , GRPL_Hidden , System.__ComObject ,
mshtml.HTMLSpanElementClass , PrtTyp_0 , GRPL_Hidden , System.__ComObject ,
mshtml.HTMLSpanElementClass , , FloatClear , System.__ComObject ,
For span id="Main01" I cannot see the detailed style, only "System.__ComObject".
Any help on how to get the detailed inline style string would be greatly appreciated. Thanks!
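For context, a likely explanation of the output above: the element's style property returns an IHTMLStyle COM object, and calling ToString on that wrapper yields only the type name System.__ComObject. The inline declarations are exposed through IHTMLStyle's cssText property. The following is a minimal sketch, assuming a project reference to the Microsoft.mshtml interop assembly; the module name InlineStyleSketch and the function GetInlineStyles are hypothetical and not part of the original code.

```vbnet
Imports System.Collections.Generic
Imports mshtml

Module InlineStyleSketch
    ' Hypothetical helper: returns "id , inline style" for each element in the fragment.
    Public Function GetInlineStyles(html As String) As List(Of String)
        Dim doc As IHTMLDocument2 = New HTMLDocumentClass()
        doc.write(html)
        doc.close()

        Dim elements As IHTMLElementCollection = CType(doc.body.all, IHTMLElementCollection)
        Dim result As New List(Of String)
        For Each elem As IHTMLElement In elements
            ' style is an IHTMLStyle object; cssText holds the inline declarations as text.
            Dim css As String = elem.style.cssText
            ' cssText may be Nothing when the element has no style attribute.
            result.Add(elem.id & " , " & If(css, ""))
        Next
        Return result
    End Function
End Module
```

In the loop from the question, the equivalent change would be to read wrkElem.style.cssText instead of wrkElem.Style.ToString. Note that mshtml may normalize the text it returns (for example, property-name casing and spacing), so the string is not guaranteed to match the source attribute character for character.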
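【Comments】:
-
Did the answer below work for you?
-
Yes, it worked smoothly. It is great to be able to see the details. Sorry for the delay; other threads were calling. I needed that, very helpful, thanks.
Tags: html asp.net .net vb.net mshtml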