Question title: How to extract data from HTML divs into Excel
Posted: 2020-01-21 03:57:56
Question:

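I am trying to extract the details on this webpage. They appear to sit under "div" elements with the classes "selection-left" and "selection-right", but I haven't found a way to pull them out successfully. Here is the URL - https://sports.ladbrokes.com/en-af/betting/golf/golf-all-golf/us-masters/2020-us-masters/228648232/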

This is an image of what I want to extract. I want to copy the competition name, plus each participant and their score.

I tried QHar's method from this link - How to extract values from nested divs using VBA. However, I get an error on this line - ReDim results(1 To countries.Length / 2, 1 To 4)

Here is the code I have been trying:

Option Explicit

Public Sub GetData()
Dim html As HTMLDocument, ws As Worksheet, countries As Object, scores As Object, results(), i As Long, r As Long

Set ws = ThisWorkbook.Worksheets("Sheet1"): Set html = New HTMLDocument: r = 1

With CreateObject("MSXML2.XMLHTTP")
    .Open "GET", "https://sports.ladbrokes.com/en-af/betting/golf/golf-all-golf/us-masters/2020-us-masters/228648232/", False
    .send
    html.body.innerHTML = .responseText
End With

Set participant = html.querySelectorAll(".market-content .selection-left"): Set scores = html.querySelectorAll("..market-content .selection-right")
ReDim results(1 To countries.Length / 2, 1 To 4)

For i = 0 To participant.Length - 1 Step 2
    results(r, 1) = participant.item(i).innerText: results(r, 2) = "'" & scores.item(i).innerText

    r = r + 1
Next
ws.Cells(1, 1).Resize(1, 4) = Array("Competition", "Participant", "Score")
ws.Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
End Sub

I need help getting this code to work.

Comments on the question:

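  • What if countries.Length isn't even? Or zero? "I get an error" isn't a very useful description of what you see when you run the code - what is the exact error message?
  • You seem to declare the variable "countries" but never set its value, so it is Nothing when you try to use its Length property.
  • Your best bet is Selenium Basic, since the content is rendered dynamically. You could try mimicking the POST request the page makes, but that isn't straightforward.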
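The first comment's point about an odd or zero Length can be handled with a guard before the ReDim. Below is a minimal sketch, assuming the nodes are still read two at a time (Step 2) as in the question's code; RowsForPairs and DemoGuard are hypothetical names for illustration, not part of any library. Note that "/" returns a Double in VBA, while "(n + 1) \ 2" is integer division that rounds an odd count up.

```vba
Option Explicit

' Hypothetical helper: rows needed when nodes are consumed in pairs (Step 2).
' Integer division that rounds up, so an odd count keeps its last node.
Public Function RowsForPairs(ByVal nodeCount As Long) As Long
    RowsForPairs = (nodeCount + 1) \ 2
End Function

' Hypothetical demo: skip the ReDim when there is nothing to write,
' because ReDim results(1 To 0, ...) raises a run-time error.
Public Sub DemoGuard(ByVal nodeCount As Long)
    Dim results() As Variant, n As Long
    n = RowsForPairs(nodeCount)
    If n = 0 Then
        Debug.Print "No matching nodes - the content is probably rendered by JavaScript."
        Exit Sub
    End If
    ReDim results(1 To n, 1 To 3)
    Debug.Print "Allocated " & UBound(results, 1) & " rows"
End Sub
```

For example, DemoGuard 5 prints "Allocated 3 rows", while DemoGuard 0 prints the warning instead of failing on the ReDim.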

Tags: html excel vba internet-explorer web-scraping


Solution 1:

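The content is added dynamically, so it isn't present in the response your current request receives. That is the cause of your error: your node list has a length of 0. You could try making the same POST request the page makes, but that doesn't look like quick or simple coding. If this is a small project, I would use browser automation so that the page's JavaScript can run and you can click the "show more" button. You need a wait condition to make sure the page has loaded properly; I use the presence of the "show more" button.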

Option Explicit

'Requires a VBA project reference to Microsoft Internet Controls (for the InternetExplorer type)
Public Sub GetOddsIE()
    Dim d As InternetExplorer, odds As Object, names As Object, i As Long
    Dim ws As Worksheet, results(), competition As String

    Set d = New InternetExplorer
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    Const URL = "https://sports.ladbrokes.com/en-af/betting/golf/golf-all-golf/us-masters/2020-us-masters/228648232/"

    With d
        .Visible = False
        .Navigate2 URL
        While .Busy Or .ReadyState <> 4: DoEvents: Wend   'wait for the initial page load
        With .Document.getElementsByClassName("expandable-below-container-button")
            Do
                DoEvents
            Loop While .Length = 0  'wait for the js-rendered show more button to be present
            .Item(0).Click 'click on show more
        End With

        Set names = .Document.getElementsByClassName("selection-left-selection-name")
        If names.Length = 0 Then   'nothing rendered: avoid ReDim results(1 To 0, ...) failing
            .Quit
            Exit Sub
        End If
        Set odds = .Document.getElementsByClassName("odds-convert")
        competition = .Document.getElementsByClassName("league")(0).innerText

        ReDim results(1 To names.Length, 1 To 3)

        For i = 0 To names.Length - 1
            results(i + 1, 1) = competition
            results(i + 1, 2) = names.Item(i).innerText
            results(i + 1, 3) = "'" & odds.Item(i).innerText   'leading ' stops Excel converting odds such as 5/1 to dates
        Next
        .Quit
    End With
    ws.Cells(1, 1).Resize(1, 3) = Array("Competition", "Participant", "Score")
    ws.Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
End Sub

Comments:

  • Great result as always. What if I don't want the browser to be shown?
  • d.visible = False