Web scraping by link text: answers

【Question title】: Web scraping by link text
【Posted】: 2019-05-15 08:46:26
【Question description】:

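I have some experience scraping by tagName or className. In this particular case, however, the className is not unique, and the link changes every time the page is visited, so I cannot use a direct link. The only unique combination is the class plus the link text, for example "Budget and Forecast updating" with a_1_610 and "Budget and Forecast updating" with a_1_611. What code would access such a link?

My code (edited following QHarr's answer):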

Sub GoToLiinosBot()

    ' Load a web page in Internet Explorer.
    ' Requires a reference to Microsoft Internet Controls.
    Dim ie As InternetExplorer
    Dim HWNDSrc As Long
    Dim elements As Object

    Set ie = New InternetExplorerMedium
    ie.Visible = True
    ie.Navigate "http://link.com"

    ' Wait until the page has finished loading.
    Do
        DoEvents
    Loop Until ie.ReadyState = READYSTATE_COMPLETE

    Application.Wait (Now + TimeValue("0:00:04"))

    ' This is the line the error discussed in the comments points to.
    ie.Document.querySelector(".data .a_1_611").innerText

    ' Release the reference to IE (this does not close the browser window).
    Set ie = Nothing
End Sub

Here is the page source:

【Comments on the question】:

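There are several ways to do this. You can get the collection of nodes that have the required class and loop over it, checking each node's innerText or innerHTML to find the one you want (using .getElementsByClassName(); see this answer). Another way, arguably the most efficient, is to target the node directly with .querySelector(), as in this question / w3schools link.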
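The loop described in this comment could look roughly like the sketch below. It is only an illustration: the function name FindByClassAndText is made up here, doc is assumed to be an already loaded document such as ie.Document, and getElementsByClassName is assumed to be available (IE needs to render the page in IE9 document mode or later).

```vba
' Hypothetical helper: returns the first element that has the given class
' and whose visible text equals linkText, or Nothing if there is no match.
' doc is assumed to be a loaded HTML document, e.g. ie.Document.
Public Function FindByClassAndText(ByVal doc As Object, _
                                   ByVal className As String, _
                                   ByVal linkText As String) As Object
    Dim nodes As Object
    Dim i As Long

    Set nodes = doc.getElementsByClassName(className)
    For i = 0 To nodes.Length - 1
        ' Trim$ guards against stray leading/trailing spaces in the link text.
        If Trim$(nodes.Item(i).innerText) = linkText Then
            Set FindByClassAndText = nodes.Item(i)
            Exit Function
        End If
    Next i
    ' No match: the function result stays Nothing.
End Function
```

With the class and text from the question, a call might be:

```vba
Dim link As Object
Set link = FindByClassAndText(ie.Document, "a_1_611", "Budget and Forecast updating")
If Not link Is Nothing Then link.Click
```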
Another resource for the selectors (not edited into the comment above because I hit the character limit), in particular: [attribute*=value], e.g. a[href*="w3schools"] selects every <a> element whose href attribute value contains the substring "w3schools".

Tags: html excel vba web-scraping
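As a sketch of how the attribute selector from this comment might be used from VBA: CSS has no standard selector for link text, so this only helps when some part of the href stays stable between visits, which the question suggests may not be the case here. The function name and the hrefPart parameter are hypothetical.

```vba
' Hypothetical helper: clicks the first <a> whose href contains hrefPart.
' Returns False if nothing matched. hrefPart must not contain a single
' quote, because it is embedded in a quoted CSS attribute value.
Public Function ClickLinkByHrefPart(ByVal doc As Object, _
                                    ByVal hrefPart As String) As Boolean
    Dim link As Object

    Set link = doc.querySelector("a[href*='" & hrefPart & "']")
    If Not link Is Nothing Then
        link.Click
        ClickLinkByHrefPart = True
    End If
End Function
```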

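【Solution 1】:

These are class names, not ids. If the ordering changes, you may need a loop that tests each node's innerText value; otherwise, for the example shown in the image, you want the first match for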

.data .a_1_611

which is

ie.document.querySelector(".data .a_1_611").click

There is also nth-of-type for selecting by fixed position, but it is more expensive than class selectors.

【Discussion】:
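If the selector matches nothing at the moment the line runs (for example because the content has not loaded yet), calling .click directly on the result raises a runtime error. A defensive sketch that stores the node first and polls for it, instead of relying on a fixed Application.Wait, might look like this. The name ClickWhenPresent and the timeout parameter are assumptions for illustration, not part of the answer.

```vba
' Hypothetical helper: waits up to timeoutSeconds for the first element
' matching selector to appear, then clicks it. Returns False on timeout.
' ie is assumed to be an InternetExplorer instance that is already navigating.
Public Function ClickWhenPresent(ByVal ie As Object, _
                                 ByVal selector As String, _
                                 Optional ByVal timeoutSeconds As Double = 10) As Boolean
    Dim node As Object
    Dim deadline As Date

    deadline = Now + timeoutSeconds / 86400#   ' 86400 seconds per day
    Do
        DoEvents
        ' 4 = READYSTATE_COMPLETE
        If Not ie.Busy And ie.readyState = 4 Then
            Set node = ie.document.querySelector(selector)
        End If
        If Not node Is Nothing Then
            node.Click
            ClickWhenPresent = True
            Exit Function
        End If
    Loop While Now < deadline
End Function
```

A call such as ClickWhenPresent(ie, ".data .a_1_611") would then replace both the fixed wait and the direct .click.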

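This gives error 5002, "Application-defined or object-defined error". Please also see the edited question for the rest of the page source, since it continues further. There are 5 variations of the class .a_1_611 inside .data.
But the first one is the one you need? Paste .data .a_1_611 into the search box in the browser to check. The error doesn't seem right. Does it occur on that line?
I have added the search results for .data .a_1_611 to my question. Yes, the first one should be selected, but it would be interesting to know how to select the second one. The error points to ie.document.querySelector(".data .a_1_611").innerText. I have edited the code in the question; is it correct, or should it be .Click, since it is a link?
If you want to click, then ie.document.querySelector(".data .a_1_611").click
For the second: .data .a_1_611 .data .a_1_611 is one way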
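A different way to reach the second (or any later) match, shown here only as a sketch and not as the commenter's method, is to collect all matches with querySelectorAll and index into the result. The function name NthMatch is hypothetical; n is 1-based for readability, while the node list itself is 0-based.

```vba
' Hypothetical helper: returns the n-th element (1-based) matching selector,
' or Nothing if there are fewer than n matches.
Public Function NthMatch(ByVal doc As Object, _
                         ByVal selector As String, _
                         ByVal n As Long) As Object
    Dim matches As Object

    Set matches = doc.querySelectorAll(selector)
    If n >= 1 And n <= matches.Length Then
        Set NthMatch = matches.Item(n - 1)
    End If
End Function
```

For example, NthMatch(ie.Document, ".data .a_1_611", 2) would return the second of the five matches mentioned above, if present.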