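[Posted]: 2019-03-01 20:45:04
[Question]:
I've written a VBA script that parses the post time (stored in postTime) and the title of different posts from a webpage. Although postTime is also available on each target page, I want to take it from the landing page and print it alongside the postTitle collected from the target page. The selectors defined in my script do capture the required content. However, my current attempt prints the postTime of one particular post over and over, whereas I want to print the postTime of each post.
How can I print, inside one loop, items that were collected in an earlier loop?
My script so far: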
Sub CollectData()
    Const baseUrl = "https://stackoverflow.com"
    Dim Http As New XMLHTTP60, Html As New HTMLDocument
    Dim post As Object, itemlist$, linklist As Variant
    Dim qualifiedLink$, nlink As Variant, postTime$, postTitle$

    With Http
        .Open "GET", "https://stackoverflow.com/questions/tagged/web-scraping", False
        .send
        Html.body.innerHTML = .responseText
    End With

    Set post = Html.querySelectorAll(".summary .question-hyperlink")

    For I = 0 To post.Length - 1
        postTime = Html.querySelector(".user-action-time").innerText
        qualifiedLink = baseUrl & Split(post(I).getAttribute("href"), "about:")(1)
        itemlist = itemlist & IIf(itemlist = "", "", " ") & qualifiedLink
    Next I

    linklist = Split(itemlist, " ")

    For Each nlink In linklist
        With Http
            .Open "GET", nlink, False
            .send
            Html.body.innerHTML = .responseText
        End With
        postTitle = Html.querySelector("h1[itemprop='name'] a").innerText
        ' the following line prints postTime derived from earlier loop
        Debug.Print postTime, postTitle
    Next nlink
End Sub
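For reference, here is a rough sketch of one possible way to keep each postTime paired with its link, so that the second loop can print the matching value. In the script above, postTime is a single string that is overwritten on every pass, and Html.querySelector always returns the first ".user-action-time" match, which would explain the repeated value. The sketch stores the times and links in two parallel arrays instead. It assumes that the two node lists on the landing page line up index by index, which I have not verified. CollectDataPaired, QualifyLink, links, postTimes and times are hypothetical names introduced only for illustration.

```vba
' Sketch only. Needs references to "Microsoft XML, v6.0" and
' "Microsoft HTML Object Library", as in the original script.

' Hypothetical helper: drops the "about:" prefix that the original
' script removes with Split, then prepends the site root.
Function QualifyLink(ByVal baseUrl As String, ByVal href As String) As String
    QualifyLink = baseUrl & Replace(href, "about:", "", 1, 1)
End Function

Sub CollectDataPaired()
    Const baseUrl = "https://stackoverflow.com"
    Dim Http As New XMLHTTP60, Html As New HTMLDocument
    Dim post As Object, times As Object
    Dim links() As String, postTimes() As String
    Dim I As Long, postTitle As String

    With Http
        .Open "GET", baseUrl & "/questions/tagged/web-scraping", False
        .send
        Html.body.innerHTML = .responseText
    End With

    Set post = Html.querySelectorAll(".summary .question-hyperlink")
    ' Assumption: one ".user-action-time" node per post, in the same order.
    Set times = Html.querySelectorAll(".user-action-time")
    If post.Length = 0 Then Exit Sub

    ReDim links(0 To post.Length - 1)
    ReDim postTimes(0 To post.Length - 1)

    ' First loop: remember link and time under the same index.
    For I = 0 To post.Length - 1
        links(I) = QualifyLink(baseUrl, post(I).getAttribute("href"))
        If I < times.Length Then postTimes(I) = times(I).innerText
    Next I

    ' Second loop: visit each link and print the time stored for it.
    For I = LBound(links) To UBound(links)
        With Http
            .Open "GET", links(I), False
            .send
            Html.body.innerHTML = .responseText
        End With
        postTitle = Html.querySelector("h1[itemprop='name'] a").innerText
        Debug.Print postTimes(I), postTitle
    Next I
End Sub
```

If the two node lists turn out not to line up, the same idea should still work with a Scripting.Dictionary keyed by link, provided the time can be read from the same parent element as the link.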
[Discussion]:
Tags: excel vba web-scraping