【发布时间】:2020-03-07 13:29:08
【问题描述】:
此话题与Loop through links and download PDF's相关
我正在尝试将我当前的 VBA 代码转换为 VBScript。我已经明白我有 删除变量类型(作为 ... Dim 语句的一部分)并使用 CreatObject 来获取这些对象,但否则一切都应该按原样移植。 DoEvents 也必须替换为 Wscript.sleep 之类的内容。
我遇到了一些问题。目前在运行 VBS 文件时,我收到一条错误消息,提示“需要对象:'MSHTML'”。指向第 65 行,我有Set hDoc = MSHTML.HTMLDocument。我试图在谷歌上搜索,但对这个没有任何帮助。
我应该如何处理这个?
DownloadFiles("https://www.nordicwater.com/products/waste-water/")
Sub DownloadFiles(p_sURL)
Set xHttp = CreateObject("Microsoft.XMLHTTP")
Dim xHttp
Dim hDoc
Dim Anchors
Dim Anchor
Dim sPath
Dim wholeURL
Dim internet
Dim internetdata
Dim internetlink
Dim internetinnerlink
Dim arrLinks
Dim sLink
Dim iLinkCount
Dim iCounter
Dim sLinks
Set internet = CreateObject("InternetExplorer.Application")
internet.Visible = False
internet.navigate (p_sURL)
Do Until internet.ReadyState = 4
Wscript.Sleep 100
Loop
Set internetdata = internet.document
Set internetlink = internetdata.getElementsByTagName("a")
i = 1
For Each internetinnerlink In internetlink
If Left(internetinnerlink, 36) = "https://www.nordicwater.com/product/" Then
If sLinks <> "" Then sLinks = sLinks & vbCrLf
sLinks = sLinks & internetinnerlink.href
i = i + 1
Else
End If
Next
wholeURL = "https://www.nordicwater.com/"
sPath = "C:\temp\"
arrLinks = Split(sLinks, vbCrLf)
iLinkCount = UBound(arrLinks) + 1
For iCounter = 1 To iLinkCount
sLink = arrLinks(iCounter - 1)
'Get the directory listing
xHttp.Open "GET", sLink
xHttp.send
'Wait for the page to load
Do Until xHttp.ReadyState = 4
Wscript.Sleep 100
Loop
'Put the page in an HTML document
Set hDoc = MSHTML.HTMLDocument
hDoc.body.innerHTML = xHttp.responseText
'Loop through the hyperlinks on the directory listing
Set Anchors = hDoc.getElementsByTagName("a")
For Each Anchor In Anchors
'test the pathname to see if it matches your pattern
If Anchor.pathname Like "*.pdf" Then
xHttp.Open "GET", wholeURL & Anchor.pathname, False
xHttp.send
With CreateObject("Adodb.Stream")
.Type = 1
.Open
.write xHttp.responseBody
.SaveToFile sPath & getName(wholeURL & Anchor.pathname), 2 '//overwrite
End With
End If
Next
Next
End Sub
功能:
Function getName(pf)
getName = Split(pf, "/")(UBound(Split(pf, "/")))
End Function
【问题讨论】:
-
请不要更改原始发布的代码,因为它会更改问题的上下文和任何相关的答案。最初的问题是
Set hDoc = MSHTML.HTMLDocument更改将使提供的答案无效。已将问题回滚。
标签: web-scraping vbscript web-crawler