【问题标题】:Excel VBA Pulling Web Data From a List of HyperlinksExcel VBA 从超链接列表中提取 Web 数据
【发布时间】:2018-07-29 04:12:48
【问题描述】:

我在工作表 1 的 C 列中有一个超链接列表,我想从每个链接中提取数据,并将每个链接的数据放在已经创建的单独工作表中。所有的超链接都指向同一个网站……职业足球参考……但每个链接都针对不同的 NFL 球员。我想为每个玩家提取相同的数据表。我已经能够从第一个链接中提取数据并将其按原样放入表 2 中,但我对 VBA 非常陌生,无法弄清楚如何为列表中的每个链接创建一个循环来执行此操作和把它放在其他纸上。以下是我目前必须从第一个链接获取数据的代码:

Sub passingStats()
Dim x As Long, y As Long
Dim htm As Object

Set htm = CreateObject("htmlFile")

With CreateObject("msxml2.xmlhttp")
    .Open "GET", Range("C2"), False
    .send
    htm.body.innerhtml = .responsetext
End With

With htm.getelementbyid("passing")
    For x = 0 To .Rows.Length - 1
        For y = 0 To .Rows(x).Cells.Length - 1
            Sheets(2).Cells(x + 4, y + 1).Value = .Rows(x).Cells(y).innertext
        Next y
        Next x
End With

End Sub

任何帮助将不胜感激。

【问题讨论】:

  • 能否请您提供实际的 URL,以便我们自己运行代码并获得实际结果?

标签: excel vba web-scraping excel-web-query


【解决方案1】:

以下显示使用循环。

注意

  1. 您的表写入中有一个逻辑缺陷,我已经为此编写了一个补丁。
  2. 某些字符串在您的脚本中被错误地转换。我用' 前缀来阻止这种情况。

代码:

Option Explicit
Public Sub GetInfo()
    Dim html As New HTMLDocument, links(), link As Long, wsSourceSheet As Worksheet
    Dim hTable As HTMLTable, ws As Worksheet, playerName As String
    Set wsSourceSheet = ThisWorkbook.Worksheets("Sheet1") '<change to sheet containing links
    Application.ScreenUpdating = False
    With wsSourceSheet
        links = .Range("C2:C" & .Cells(.Rows.Count, "C").End(xlUp).Row).Value
    End With
    For link = LBound(links, 1) To UBound(links, 1)
        If InStr(links(link, 1), "https://") > 0 Then
            Set html = GetHTMLDoc(links(link, 1))
            Set hTable = html.getElementById("passing")
            If Not hTable Is Nothing Then
                playerName = GetNameAbbr(links(link, 1))
                Set ws = AddPlayerSheet(playerName)
                WriteTableToSheet hTable, ws
                FixTable ws
            End If
        End If
    Next
    Application.ScreenUpdating = True
End Sub

Public Function GetHTMLDoc(ByVal url As String) As HTMLDocument
    Dim sResponse As String, html As New HTMLDocument
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", url, False
        .send
        sResponse = StrConv(.responseBody, vbUnicode)
    End With
    sResponse = Mid$(sResponse, InStr(1, sResponse, "<!DOCTYPE "))
    html.body.innerHTML = sResponse
    Set GetHTMLDoc = html
End Function

Public Sub WriteTableToSheet(ByVal hTable As HTMLTable, ByVal ws As Worksheet)
    Dim x As Long, y As Long
    With hTable
        For x = 0 To .Rows.Length - 1
            For y = 0 To .Rows(x).Cells.Length - 1
                If y = 6 Or y = 7 Then
                    ws.Cells(x + 4, y + 1).Value = Chr$(39) & .Rows(x).Cells(y).innerText
                Else
                    ws.Cells(x + 4, y + 1).Value = .Rows(x).Cells(y).innerText
                End If
            Next y
        Next x
    End With
End Sub

Public Function GetNameAbbr(ByVal url As String) As String
    Dim tempArr() As String
    tempArr = Split(url, "/")
    GetNameAbbr = Left$(tempArr(UBound(tempArr)), 6)
End Function

Public Function AddPlayerSheet(ByVal playerName As String) As Worksheet
    Dim ws As Worksheet
    If SheetExists(playerName) Then
        Application.DisplayAlerts = False
        ThisWorkbook.Worksheets(playerName).Delete
        Application.DisplayAlerts = True
    End If
    Set ws = ThisWorkbook.Worksheets.Add
    ws.Name = playerName
    Set AddPlayerSheet = ws
End Function

Public Function SheetExists(ByVal playerName As String) As Boolean
    SheetExists = Evaluate("ISREF('" & playerName & "'!A1)")
End Function

Public Sub FixTable(ByVal ws As Worksheet)
    Dim found As Range, numSummaryRows As Long
    With ws
        Set found = .Columns("A").Find("Career")
        If found Is Nothing Then Exit Sub
        numSummaryRows = .Cells(.Rows.Count, "A").End(xlUp).Row - found.Row
        numSummaryRows = IIf(numSummaryRows = 0, 1, numSummaryRows + 1)
        Debug.Print found.Offset(, 1).Resize(numSummaryRows, 30).Address, ws.Name
        found.Offset(, 1).Resize(numSummaryRows, 30).Copy found.Offset(, 2)
        found.Offset(, 1).Resize(numSummaryRows, 1).ClearContents
    End With
End Sub

在 sheet1 中测试链接:


示例网页:


对应的代码写出来:

【讨论】:

  • 非常感谢。我遇到的一个问题是运行时错误 91 对象或未在 WriteTableToSheet 函数上定义的变量。在我到达列表中的第 18 个玩家后,我在 For x = 0 行上收到错误。
  • 我刚刚发现了这个问题...虽然我确实遇到了与 Lamar Jackson 相同的问题...pro-football-reference.com/players/J/JackLa00.htm...but 我认为这是因为他是一个菜鸟,并且在下面没有桌子id=通过标签。有没有办法忽略没有桌子的玩家并转到我表格上的下一个链接?另外,我比你知道的更感谢你的帮助,我对 VBA 的经验很少,这让我发疯了
  • 第一个菜鸟完美运行,第二个菜鸟出现同样的错误。任何的想法?不确定是否有帮助,但此链接是:pro-football-reference.com/players/M/MayfBa00.htm
  • 更新后的答案完美运行!非常感谢!我会投票给你的答案,但我没有 15 名声望哈哈。再次感谢。
  • 效果很好!非常感谢您的帮助。
猜你喜欢
  • 2016-12-11
  • 2016-08-26
  • 2022-11-22
  • 2021-02-01
  • 1970-01-01
  • 1970-01-01
  • 2015-11-20
  • 1970-01-01
  • 1970-01-01
相关资源
最近更新 更多