【Posted】: 2020-08-24 07:23:43
【Question】:
I am extracting data from the NSE site. The URL is: https://www1.nseindia.com/live_market/dynaContent/live_watch/get_quote/GetQuoteFO.jsp?underlying=VOLTAS&instrument=FUTSTK&type=-&strike=-&expiry=28MAY2020#
I can extract the item successfully with Internet Explorer, but that approach is slow, so I switched to MSXML2.XMLHTTP60. That method, however, returns an empty string.
Here is my code:
Method 1: works fine
Sub OI_Slow_Method()
    ' Requires references to "Microsoft Internet Controls" and "Microsoft HTML Object Library"
    Dim ie As InternetExplorer
    Dim doc As HTMLDocument
    Dim Link As String

    Set ie = New InternetExplorer
    Link = ActiveSheet.Range("C4").Value
    ie.Visible = False
    ie.navigate Link

    ' Wait until the page has finished loading
    Do
        DoEvents
    Loop Until ie.readyState = READYSTATE_COMPLETE

    Set doc = ie.document
    doc.Focus
    ActiveSheet.Cells(1, 1).Value = doc.getElementById("openInterest").innerText 'Open Interest value

    ' Close the browser; do not touch ie after Quit
    ie.Quit
    Set doc = Nothing
    Set ie = Nothing
End Sub
'--------------------------
Method 2: this is the one I need help with
Sub OI_Fast_Method()
    ' Requires references to "Microsoft XML, v6.0" and "Microsoft HTML Object Library"
    Dim xhr As MSXML2.XMLHTTP60, html As MSHTML.HTMLDocument
    Set xhr = New MSXML2.XMLHTTP60
    Set html = New MSHTML.HTMLDocument
    With xhr
        .Open "GET", "https://www1.nseindia.com/live_market/dynaContent/live_watch/get_quote/GetQuoteFO.jsp?underlying=VOLTAS&instrument=FUTSTK&type=-&strike=-&expiry=30APR2020#", False
        .send
        html.body.innerHTML = StrConv(.responseBody, vbUnicode)
    End With
    Debug.Print html.getElementById("openInterest").innerText
    'The output of this is "<SPAN id=openInterest>??</SPAN>": only question marks are returned inside the SPAN
End Sub
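【Discussion】:
- When you navigate to a page with IE (or any browser), the page may contain scripts that add further content to it, either by building elements from data embedded in the page or by requesting additional data from the server. None of that happens when you use XmlHttp: all you get is the raw page source exactly as the server delivered it, and nothing else. No images are loaded and no scripts are run.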
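
One way to check this for yourself is to look at the raw source that XmlHttp receives and see what actually surrounds the `openInterest` element before any script has run. The following is only a rough diagnostic sketch, not a fix: the procedure name `Inspect_Raw_Source` is made up for illustration, and it assumes the URL is in cell C4 as in Method 1 and that a reference to "Microsoft XML, v6.0" is set.

```vba
' Hypothetical helper: prints the raw HTML around every occurrence of
' "openInterest" so you can see what the server sends before scripts run.
' Assumes a reference to "Microsoft XML, v6.0" and the URL in cell C4.
Sub Inspect_Raw_Source()
    Dim xhr As MSXML2.XMLHTTP60
    Dim src As String
    Dim pos As Long

    Set xhr = New MSXML2.XMLHTTP60
    xhr.Open "GET", ActiveSheet.Range("C4").Value, False
    xhr.send

    ' Stop early if the server did not return the page
    If xhr.Status <> 200 Then
        Debug.Print "Request failed: " & xhr.Status & " " & xhr.statusText
        Exit Sub
    End If

    src = xhr.responseText

    ' Print up to 300 characters of context around each match
    pos = InStr(1, src, "openInterest", vbTextCompare)
    Do While pos > 0
        Debug.Print Mid$(src, IIf(pos > 100, pos - 100, 1), 300)
        Debug.Print String$(40, "-")
        pos = InStr(pos + 1, src, "openInterest", vbTextCompare)
    Loop
End Sub
```

If the span only holds a placeholder in this output, the value is being filled in by a script. In that case you would need to find where the script takes its data from (somewhere else in the same source, or a separate request) and read it from there instead of from the span.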
Tags: html excel vba web-scraping