【发布时间】:2022-01-15 18:06:45
【问题描述】:
我编写了一个宏来从website 的三页中抓取一些字段。我使用Array()来存储和写入结果,以使执行更快一点。
只要关注单个页面的内容,脚本就可以正常运行。但是,当我使用列表中的三个链接时出现问题。 To be specific, the script overwrites previous results。例如,我应该在执行后得到 150 个结果。相反,我从最后一个链接得到 50 个结果。
到目前为止我已经写了:
Public Sub FetchData()
Dim Xhr As Object, Html As HTMLDocument, Ws As Worksheet
Dim Link As Variant, Links As Variant, LeadInfo() As String
Dim I&, HtmlDoc As HTMLDocument, Listings As Object, Headers()
Dim URLS(), N As Variant
Links = Array( _
"https://stackoverflow.com/questions/tagged/web-scraping?tab=newest&page=1&pagesize=50", _
"https://stackoverflow.com/questions/tagged/web-scraping?tab=newest&page=2&pagesize=50", _
"https://stackoverflow.com/questions/tagged/web-scraping?tab=newest&page=3&pagesize=50" _
)
Set Ws = ThisWorkbook.Worksheets("Sheet1")
Set Xhr = CreateObject("MSXML2.XMLHTTP")
Set Html = New HTMLDocument
Set HtmlDoc = New HTMLDocument
Headers = Array("Title", "URL", "User", "Asked")
For Each Link In Links
With Xhr
.Open "GET", Link, False
.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36"
.send
Html.body.innerHTML = .responseText
End With
Set Listings = Html.querySelectorAll(".summary")
ReDim LeadInfo(1 To Listings.Length, 1 To 4)
On Error Resume Next
For I = 0 To Listings.Length - 1
HtmlDoc.body.innerHTML = Listings.item(I).innerHTML
LeadInfo(I + 1, 1) = HtmlDoc.querySelector(".question-hyperlink").innerText
LeadInfo(I + 1, 2) = HtmlDoc.querySelector(".question-hyperlink").getAttribute("href")
LeadInfo(I + 1, 3) = HtmlDoc.querySelector(".user-details > a").innerText
LeadInfo(I + 1, 4) = HtmlDoc.querySelector(".user-action-time > span.relativetime").innerText
Next I
On Error GoTo 0
If IsEmpty(Ws.Cells(1, 1).Value) Then Ws.Cells(1, 1).Resize(1, UBound(Headers) + 1) = Headers
Ws.Cells(2, 1).Resize(UBound(LeadInfo, 1), UBound(LeadInfo, 2)) = LeadInfo
Next Link
End Sub
我怎样才能写出三个链接的所有结果,而不是只写最后一个链接的结果?
【问题讨论】:
-
使用
Ws.Cells(2, 1)中的变量,即Ws.Cells(k+2, 1)。在UBound(LeadInfo)之后增加k。 -
请查看编辑@CDP1802。
-
我的评论看不到任何变化
-
你说的很对。你的建议确实解决了问题。感谢一万亿。
标签: arrays excel vba web-scraping