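【Posted】: 2019-01-13 13:32:01
【Question】:
For dynamic web pages that display a table of retrieved data, I have found that neither MSXML2.XMLHTTP nor the Internet Explorer object can usually access that data. A good example is https://www.tiff.net/tiff/films.html. Neither technique retrieves any of the film data, only the surrounding web page. The code I tried is below: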
Function getHTTP(ByVal sReq As String) As Variant
    On Error GoTo onErr
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", sReq, False              ' False = synchronous request
        .send
        getHTTP = StrConv(.responseBody, 64)  ' 64 = vbUnicode: byte array to VBA string
    End With
    Exit Function
onErr:
    MsgBox "Error " & Err.Number & ": " & Err.Description, 49, "Error opening site"
End Function
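Function GetHTML(ByVal strURL As String) As Variant
    ' Requires references to Microsoft Internet Controls and Microsoft HTML Object Library
    Dim oIE As InternetExplorer
    Dim hElm As IHTMLElement
    Set oIE = New InternetExplorer
    oIE.Navigate strURL
    ' Wait until the initial page load finishes
    Do While (oIE.Busy Or oIE.ReadyState <> READYSTATE_COMPLETE)
        DoEvents
    Loop
    Set hElm = oIE.Document.all.tags("html").Item(0)
    GetHTML = hElm.outerHTML
    oIE.Quit    ' close the hidden IE instance; Set ... = Nothing alone leaves it running
    Set oIE = Nothing
    Set hElm = Nothing
End Function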
There is, however, a way to retrieve the film data manually. In Microsoft Edge or Internet Explorer, follow these steps:
1. Right-click on one of the movies.
2. Choose "Inspect element". The DevTools console opens.
3. At the bottom-left of the screen, click on the "html" tab.
4. Right-click the tab and choose "Copy".
5. Open Notepad and paste what you have copied.
You now have the film data, which you can save to a file for parsing. My question: is there a way to get this data programmatically?
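The manual steps above amount to copying the live DOM after the page's scripts have run, rather than the HTML as first delivered. A minimal sketch of that idea in VBA follows. It assumes the film entries eventually appear in the Internet Explorer document and can be detected with a CSS selector. The helper names WaitForSelector and GetRenderedHTML, the 15-second timeout, and any selector passed in are hypothetical and not taken from the site.

```vb
' Hypothetical helper: polls the live DOM until sSelector matches an element
' or the timeout expires. Returns True if a match was found.
Function WaitForSelector(ByVal oDoc As Object, ByVal sSelector As String, _
                         Optional ByVal lTimeoutSecs As Long = 15) As Boolean
    Dim dtEnd As Date
    dtEnd = DateAdd("s", lTimeoutSecs, Now)
    Do
        DoEvents
        ' querySelector returns Nothing when no element matches
        If Not oDoc.querySelector(sSelector) Is Nothing Then
            WaitForSelector = True
            Exit Function
        End If
    Loop While Now < dtEnd
End Function

' Hypothetical wrapper: loads the page, waits for script-rendered content,
' then returns the rendered HTML (empty string if the wait timed out).
Function GetRenderedHTML(ByVal strURL As String, ByVal sSelector As String) As String
    Dim oIE As Object
    Set oIE = CreateObject("InternetExplorer.Application")
    oIE.Navigate strURL
    Do While oIE.Busy Or oIE.ReadyState <> 4    ' 4 = READYSTATE_COMPLETE
        DoEvents
    Loop
    ' The initial load is done, but scripts may still be filling in the data
    If WaitForSelector(oIE.Document, sSelector) Then
        GetRenderedHTML = oIE.Document.documentElement.outerHTML
    End If
    oIE.Quit
    Set oIE = Nothing
End Function
```

It would be called as GetRenderedHTML("https://www.tiff.net/tiff/films.html", sel), where sel is whatever selector identifies a film entry in the DevTools view. Whether the data ever reaches the IE document this way is exactly the open question, so an empty return value would mean the assumption does not hold for this page.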
【Comments】:
- Interesting question, +1
Tags: html vba ms-access web-scraping