【Posted】: 2020-02-10 12:53:49
【Question】:
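I have a problem: different websites need a different type of selector method (i.e. getElementsByClassName, getElementsByTagName, querySelectorAll, etc.) inside the loop before they return any results.
As it stands, the code below retrieves the website information if I use a hard-coded FIXED SELECTOR on the line "Set list = html.querySelectorAll(ID)". It stops working when I try to make the selector a VARIABLE based on the worksheet row value (query.name) in the loop.
I'm not sure whether this is just a matter of assigning the correct variable type, but I don't know how to get this part working...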
Sub FETCHER()
Dim URL As String, tag As String, ID As String, LastRow As Integer, j As Long
Dim html As HTMLDocument, list As Object, i As Long
Dim xmlhttp As Object
Set xmlhttp = CreateObject("MSXML2.XMLHTTP.6.0")
'CreateObject("WINHTTP.WinHTTPRequest.5.1")
Worksheets("TEST").Activate
LastRow = Range("C" & Rows.Count).End(xlUp).Row
For j = 2 To LastRow
With ActiveSheet
URL = Range("S" & j).Text 'URL = https://stackoverflow.com/
tag = Range("T" & j).Text 'TAG = getElementsByClassName '<--- Where I want to assign the selector type (i.e. getelementsbyclassname, getelementsbytagname, etc.)
ID = Range("U" & j).Text 'Element ID = "CONTENT"
End With
Set xmlhttp = New MSXML2.XMLHTTP60
Set html = New HTMLDocument
With xmlhttp
.Open "GET", URL, False
.setRequestHeader "User-Agent", "Chrome/39.0.2171.95"
.Send
html.body.innerHTML = .responseText
End With
'Set list = html.querySelectorAll("CONTENT") <---Example..
Set list = html.querySelectorAll(ID) '<---This WORKS as it's HARD-CODED
Set list = html.TAG(ID) '<---This DOESN'T WORK in trying to make it VARIABLE
For i = 0 To 5
With ActiveSheet
.Cells(j, 22 + i) = list.Item(i).innerText
'.Cells(j + 1, 22 + 1) = list.Item(i).getAttribute("href")
End With
Next
Next
End Sub
Any help would be greatly appreciated!
【Discussion】:
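A minimal sketch of one way to make the selector type data-driven, assuming early binding to the Microsoft HTML Object Library (which the question's New HTMLDocument already relies on). The hypothetical helper GetElementsBy maps the method name read from column T onto the matching MSHTML call with Select Case; the function name and the list of supported method names are illustrative assumptions, not part of the original code.

```vba
' Hypothetical helper: chooses the DOM lookup method from the text in column T.
' Assumes a reference to "Microsoft HTML Object Library" (MSHTML).
Private Function GetElementsBy(ByVal html As HTMLDocument, _
                               ByVal methodName As String, _
                               ByVal selector As String) As Object
    ' Compare case-insensitively so "getElementsByClassName" and
    ' "GETELEMENTSBYCLASSNAME" in the sheet both match.
    Select Case LCase$(Trim$(methodName))
        Case "getelementsbyclassname"
            Set GetElementsBy = html.getElementsByClassName(selector)
        Case "getelementsbytagname"
            Set GetElementsBy = html.getElementsByTagName(selector)
        Case "getelementsbyname"
            Set GetElementsBy = html.getElementsByName(selector)
        Case "queryselectorall"
            Set GetElementsBy = html.querySelectorAll(selector)
        Case Else
            ' Fail loudly instead of returning Nothing for an unknown name.
            Err.Raise vbObjectError + 513, "GetElementsBy", _
                      "Unsupported selector type: " & methodName
    End Select
End Function
```

Inside the loop, the failing line would then become Set list = GetElementsBy(html, tag, ID). Keep in mind that querySelectorAll expects a CSS selector (for example .CONTENT for a class), whereas getElementsByClassName takes the bare class name, so the value in column U has to suit the method named in column T. Separately, For i = 0 To 5 raises an error whenever fewer than six elements match; bounding it, e.g. For i = 0 To Application.WorksheetFunction.Min(5, list.Length - 1), skips the loop entirely when nothing matches.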
-
Need some examples of what is in your S, T and U column cells. You can't do
html.TAG(ID), but without knowing what is in TAG it's hard to suggest anything. -
Hey, I've updated the post with an example of what one (i) row in the loop looks like. TAG holds the name of the "HTML DOM querySelector()"-style method to use.
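Since html.TAG(ID) makes VBA look for a member literally called TAG, one option worth trying is VBA's built-in CallByName, which invokes a method whose name is only known as a string at run time. A minimal sketch, assuming column T holds an exact MSHTML method name such as getElementsByClassName or querySelectorAll; SelectByMethodName is a hypothetical name used only for illustration.

```vba
' Hypothetical helper: CallByName resolves the method named in methodName
' at run time (late binding) and calls it on html with selector as the
' single argument.
Private Function SelectByMethodName(ByVal html As Object, _
                                    ByVal methodName As String, _
                                    ByVal selector As String) As Object
    Set SelectByMethodName = CallByName(html, methodName, VbMethod, selector)
End Function
```

In the loop this would read Set list = SelectByMethodName(html, tag, ID). Because the lookup is late-bound, a misspelled method name in column T only shows up as a run-time error, so an explicit Select Case mapping may be easier to debug.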
Tags: xml vba web-scraping queryselector