【问题标题】:Web Scraping IE NAVIGATE method Works Vs MSXML2.XMLHTTP60 not WorkingWeb Scraping IE NAVIGATE 方法 Works Vs MSXML2.XMLHTTP60 不工作
【发布时间】:2020-08-24 07:23:43
【问题描述】:

我正在从 NSE 站点提取数据, 网址是:https://www1.nseindia.com/live_market/dynaContent/live_watch/get_quote/GetQuoteFO.jsp?underlying=VOLTAS&instrument=FUTSTK&type=-&strike=-&expiry=28MAY2020#

我使用 Internet Explorer 成功提取项目,但是这种方法很慢, 所以我转移到 MSXML2.XMLHTTP60 方法,但是这个方法返回空字符串

请找到我的代码

Method 1:Works fine
Sub OI_Slow_Method()
Dim ie As New InternetExplorer
Set ie = CreateObject("InternetExplorer.Application")

Dim Link As String
Link = ActiveSheet.Range("C4").Value

ie.Visible = False
ie.navigate Link
Do

DoEvents
Loop Until ie.readyState = READYSTATE_COMPLETE

Dim doc As HTMLDocument
Set doc = ie.document

Dim objElement As HTMLObjectElement
Dim sDD As String

doc.Focus

ActiveSheet.Cells(1, 1).Value = doc.getElementById("openInterest").innerText 'Open Interest Value


ie.Quit
ie.Visible = True
Set doc = Nothing
Set ie = Nothing
End Sub
'--------------------------

Method 2:Help required in this method only
Sub OI_Fast_Method()
    Dim xhr As MSXML2.XMLHTTP60, html As MSHTML.HTMLDocument

    Set xhr = New MSXML2.XMLHTTP60
    Set html = New MSHTML.HTMLDocument

    With xhr
        .Open "GET", "https://www1.nseindia.com/live_market/dynaContent/live_watch/get_quote/GetQuoteFO.jsp?underlying=VOLTAS&instrument=FUTSTK&type=-&strike=-&expiry=30APR2020#", False
        .send
         html.body.innerHTML = StrConv(.responseBody, vbUnicode)
    End With

  Debug.Print html.getElementById("openInterest").Innertext 
  'The output of this is "<SPAN id=openInterest>??</SPAN>" only question mark returned inside the SPAN
End Sub

【问题讨论】:

  • 当您使用 IE(或任何浏览器)导航到页面时,该页面可能包含进一步向页面添加内容的脚本(通过从页面中嵌入的数据构建元素,或通过请求附加来自服务器的数据)。当你使用 XmlHttp 时不会发生这种情况——你得到的只是服务器提供的原始页面源:没有别的——没有图像、脚本等。

标签: html excel vba web-scraping


【解决方案1】:

我认为蒂姆一如既往地一针见血。你得到了一些原始的 XML,而你想要的东西不在那个 XML 中。您可以进行数据转储并得到您想要的。

Sub DumpData()

Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True

URL = "https://www1.nseindia.com/live_market/dynaContent/live_watch/get_quote/GetQuoteFO.jsp?underlying=VOLTAS&instrument=FUTSTK&type=-&strike=-&expiry=28MAY2020#"

'Wait for site to fully load
ie.Navigate2 URL
Do While ie.Busy = True
   DoEvents
Loop

RowCount = 1

With Sheets("Sheet1")
   .Cells.ClearContents
   RowCount = 1
   For Each itm In ie.Document.all
      .Range("B" & RowCount) = Left(itm.innerText, 1024)
   RowCount = RowCount + 1
   Next itm
End With
End Sub

然后你将不得不解析文本。这并不难,但会有点额外的劳动。

另一种选择可能是下载网站的全部内容,将其保存为文本文件,导入数据,然后解析该数据。

Sub Sample()
    Dim ie As Object
    Dim retStr As String

    Set ie = CreateObject("internetexplorer.application")

    With ie
        .Navigate "https://www1.nseindia.com/live_market/dynaContent/live_watch/get_quote/GetQuoteFO.jsp?underlying=VOLTAS&instrument=FUTSTK&type=-&strike=-&expiry=28MAY2020#"
        .Visible = True
    End With

    Do While ie.readystate <> 4: Wait 5: Loop

    DoEvents

    retStr = ie.document.body.innerText

    '~> Write the above to a text file
    Dim filesize As Integer
    Dim FlName As String

    '~~> Change this to the relevant path
    FlName = "C:\Users\ryans\OneDrive\Desktop\Sample.Txt"

    filesize = FreeFile()

    Open FlName For Output As #filesize

    Print #filesize, retStr
    Close #filesize
End Sub

Private Sub Wait(ByVal nSec As Long)
    nSec = nSec + Timer
    While nSec > Timer
        DoEvents
    Wend
End Sub

我无法让您的任何一个代码示例在我的机器上运行。

【讨论】:

    猜你喜欢
    • 1970-01-01
    • 2021-12-04
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 2019-11-20
    • 2021-08-22
    相关资源
    最近更新 更多