【Question title】: How to get a particular link from a specific class?
【Posted】: 2017-01-07 15:42:09
【Question】:

I want to extract this href from the row with that particular class:
<tr class="even">
    <td>
        <a href="/italy/serie-a-2015-2016/">Serie A 2015/2016</a>
    </td>

This is what I wrote:

Sub ExtractHrefClass()

    Dim ie As Object
    Dim doc As HTMLDocument
    Dim class As Object
    Dim href As Object

    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = True
    ie.navigate Range("D8")
    Do
        DoEvents
    Loop Until ie.readyState = READYSTATE_COMPLETE
    Set doc = ie.document
    Set class = doc.getElementsByClassName("even")
    Set href = class.getElementsByTagName("a")
    Range("E8").Value = href
    ie.Quit

End Sub

Unfortunately, I get the error "Object doesn't support this property or method" (Error 438) on this line:

    Set href = class.getElementsByTagName("a")

Update 1

I modified the code following @RyszardJędraszyk's answer, but it produces no output. What am I doing wrong?

Sub ExtractHrefClass()

    Dim ie As Object
    Dim doc As HTMLDocument
    Dim href As Object
    Dim htmlEle As Object

    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = True
    ie.navigate Range("D8")
    Do
        DoEvents
    Loop Until ie.readyState = READYSTATE_COMPLETE And ie.Busy = False
    Set doc = ie.document
    Set href = doc.getElementsByTagName("a")
    For Each htmlEle In href
        If htmlEle.className = "even" Then
            Range("E8").Value = htmlEle
        End If
    Next
    ie.Quit

End Sub

Update 2

As @dee requested in the comments, here is the markup from the web page http://www.soccer24.com/italy/serie-a/archive/:

<tbody>
    <tr>
        <td>
            <a href="/italy/serie-a/">Serie A 2016/2017</a>
        </td>
        <td></td>
    </tr>
    <tr class="even">
        <td>
            <a href="/italy/serie-a-2015-2016/">Serie A 2015/2016</a>
        </td>
        <td>
            <span class="team-logo" style="background-image: url(/res/image/data/UZbZIMhM-bsGsveSt.png)"></span><a href="/team/juventus/C06aJvIB/">Juventus</a>
        </td>
    </tr>
    <tr>
        <td>
            <a href="/italy/serie-a-2014-2015/">Serie A 2014/2015</a>
        </td>
        <td>
            <span class="team-logo" style="background-image: url(/res/image/data/UZbZIMhM-bsGsveSt.png)"></span><a href="/team/juventus/C06aJvIB/">Juventus</a>
        </td>
    </tr>

I only need to extract this one value: /italy/serie-a-2015-2016/

【Comments】:

  • Please do not post updates as answers; just edit the question.

Tags: vba excel internet-explorer web-scraping


【Answer 1】:

This works for me:

With CreateObject("MSXML2.XMLHTTP")
    .Open "GET", "http://www.soccer24.com/italy/serie-a/archive/", False
    .Send
    MsgBox Split(Split(Split(.ResponseText, "<tr class=""even"">", 2)(1), "<a href=""", 2)(1), """", 2)(0)
End With

The procedure you need might look like this:

Sub ExtractHrefClass()

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", Range("D8").Value, False
        .Send
        Range("E8").Value = Split(Split(Split(.ResponseText, "<tr class=""even"">", 2)(1), "<a href=""", 2)(1), """", 2)(0)
    End With

End Sub

【Comments】:

  • @ALIENATO If this solved your problem, please accept the answer.
【Answer 2】:

Try:

Dim href As HTMLObjectElement

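Make sure the correct library (Microsoft HTML Object Library) is checked under References.

Are you sure doc.getElementsByClassName("even") works? It is not listed as an available method here: https://msdn.microsoft.com/en-us/library/aa926433.aspx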

I always call getElementsByTagName first and then apply a condition such as If htmlEle.className = "even" Then
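
A minimal sketch of that tag-then-class filter, assuming a reference to the Microsoft HTML Object Library and an already loaded document. The function name FirstHrefInEvenRow is hypothetical. In the markup from the question the class sits on the tr, not on the a, so the sketch filters rows and then looks for the anchor inside the matching row. Because getElementsByTagName returns a collection, a single element has to be picked out with Item before its attributes can be read.

```vba
' Hypothetical helper (not from the original thread).
' Assumes a reference to the Microsoft HTML Object Library.
Function FirstHrefInEvenRow(ByVal doc As HTMLDocument) As String
    Dim row As Object
    Dim links As Object

    ' Walk every table row and keep only the one whose class is "even".
    For Each row In doc.getElementsByTagName("tr")
        If row.className = "even" Then
            ' getElementsByTagName returns a collection, so index into it.
            Set links = row.getElementsByTagName("a")
            If links.Length > 0 Then
                FirstHrefInEvenRow = links.Item(0).getAttribute("href")
                Exit Function
            End If
        End If
    Next row
    ' Falls through with an empty string when no matching row is found.
End Function
```

It could be called as Range("E8").Value = FirstHrefInEvenRow(ie.document). Depending on the document mode, Internet Explorer may return the resolved absolute URL rather than the relative path written in the markup.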

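Also add the following condition: ie.readyState = READYSTATE_COMPLETE And ie.Busy = False. Even so, on an AJAX-based site this may not be enough to be sure the page has fully loaded (judging by the link, it may be flashscore.com; you would need to watch for an element on the page that signals its load state).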
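
One way to watch for such an element is to poll for it with a timeout. The following is only a sketch: the function name WaitForElement, the CSS selector argument, and the 15-second default are assumptions, and querySelector is only available when the page renders in IE8 document mode or later.

```vba
' Hypothetical helper (not from the original thread).
' Returns True once an element matching the selector exists, False on timeout.
Function WaitForElement(ByVal ie As Object, ByVal selector As String, _
                        Optional ByVal timeoutSeconds As Double = 15) As Boolean
    Dim started As Double
    started = Timer   ' seconds since midnight; a run across midnight is not handled

    Do
        DoEvents
        ' 4 = READYSTATE_COMPLETE
        If Not ie.Busy And ie.readyState = 4 Then
            If Not ie.document.querySelector(selector) Is Nothing Then
                WaitForElement = True
                Exit Function
            End If
        End If
    Loop While Timer - started < timeoutSeconds
End Function
```

For the page in the question, a call such as If WaitForElement(ie, "tr.even a") Then ... would replace the plain readyState loop.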

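【Comments】:

  • Dim href As HTMLObjectElement does not work, even though I added the Microsoft HTML Object Library. I will try modifying the code now, please wait.
【Answer 3】:

querySelectorAll or querySelector can be used here to select the anchor elements inside a tr with the specific class, and getAttribute("href") then retrieves the href attribute. HTH.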

' Add reference to Microsoft Internet Controls (SHDocVw)
' Add reference to Microsoft HTML Object Library

Dim ie As Object
Dim name As String
Dim Doc As HTMLDocument

Set ie = New InternetExplorer
ie.Visible = 1

ie.navigate "<URL>"
While ie.Busy Or ie.readyState <> 4
    DoEvents
Wend
Set Doc = ie.document

Dim anchors As IHTMLDOMChildrenCollection
Dim anchor As IHTMLAnchorElement
Dim i As Integer

Set anchors = Doc.querySelectorAll("tr[class~='even'] a")

If Not anchors Is Nothing Then
    For i = 0 To anchors.Length - 1
        Set anchor = anchors.item(i)
        If anchor.getAttribute("href") = "/italy/serie-a-2015-2016/" Then
            Range("E8").Value = anchor.innerHTML
        End If
    Next
End If
ie.Quit
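
The code above writes the link text (innerHTML) of the matching anchor to the sheet. If only the first link in the row is needed and the href itself should be stored, a shorter variant with querySelector might look like the sketch below. The procedure name WriteEvenRowHref and its parameters are hypothetical, and it assumes the same two library references and an already loaded document.

```vba
' Hypothetical helper (not from the original thread).
' Assumes a reference to the Microsoft HTML Object Library.
Sub WriteEvenRowHref(ByVal doc As HTMLDocument, ByVal target As Range)
    Dim firstLink As Object

    ' querySelector returns only the first match, or Nothing when there is none.
    Set firstLink = doc.querySelector("tr.even a")

    If Not firstLink Is Nothing Then
        target.Value = firstLink.getAttribute("href")
    End If
End Sub
```

A call such as WriteEvenRowHref Doc, Range("E8") would go just before ie.Quit.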

【Comments】:
