【Question title】: Unable to get the exact element class table when scraping data from web using VBA
【Posted】: 2022-01-20 08:55:53
【Question】:

I want to scrape the table below from the website. [image: the target table]

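Based on the page source, the table appears to belong to the element class etxtmed, so I wrote the VBA below. After running this code, I found that it only scrapes the following data: [image: the data actually scraped]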

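I assumed this was because ("etxtmed")(0) refers to the first "etxtmed" table, so I tried several other numbers in place of (0). VBA first reports "Element not exist" and then, at the line r = tbl.Rows.Length - 1, raises Run-time error '91': Object variable or With block variable not set. Is this because I am scraping the wrong class for the table?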

Sub CopyRateFromHKAB()

    Dim ie As Object, btnmore As Object, tbl As Object
    Dim rr As Object, cc As Object, r As Integer, c As Integer, i As Integer, j As Integer
    
    ThisWorkbook.Sheets("Sheet2").UsedRange.Clear
    
    Set ie = CreateObject("internetexplorer.application")
    With ie
        '.Visible = True
        .navigate "https://www.hkab.org.hk/DisplayInterestSettlementRatesAction.do?lang=en"
        
        Do
            DoEvents
        Loop While .readyState <> 4 Or .Busy
          
    
        Set tbl = .document.getElementsByClassName("etxtmed")(0)
        
        If tbl Is Nothing Then
            MsgBox "Element not exist"
        End If
            
    End With
    
    'get data from table
    r = tbl.Rows.Length - 1
    c = tbl.Rows(0).Cells.Length - 1
    
    ReDim arr(0 To r, 0 To c)
    
    Set rr = tbl.Rows
    For i = 0 To r
        Set cc = rr(i).Cells
        For j = 0 To c
            arr(i, j) = cc(j).innertext
        Next
    
    Next
    
    ie.Quit
  
    Application.ScreenUpdating = False
    ThisWorkbook.Sheets("Sheet2").Cells(1, 1).Resize(r + 1, c + 1) = arr
    
    With ThisWorkbook.Sheets("Sheet2")
        .UsedRange.WrapText = False
        .Columns.AutoFit
    End With
    
End Sub

【Comments】:

    Tags: html excel vba web-scraping


    【Solution 1】:

    The table you want is inside an IFRAME, so you need to request that page directly: <iframe src="/hibor/listRates.do?lang=en&Submit=Detail"

    Option Explicit
    
    Sub CopyRateFromHKAB()
        
        Const URL = "https://www.hkab.org.hk/hibor/listRates.do?lang=en&Submit=Detail"
        Dim HTMLDoc As Object, request As Object
        
        ' get web page
        Set HTMLDoc = CreateObject("HTMLfile")
        Set request = CreateObject("MSXML2.XMLHTTP")
        With request
            .Open "GET", URL, False
            .send
            HTMLDoc.body.innerHTML = .responseText
        End With
        
        ' parse html table
        Dim wb As Workbook, r As Long, c As Long, arr
        Dim tbl As Object, t As Object, tr As Object, td As Object
        
        Set wb = ThisWorkbook
        Set tbl = HTMLDoc.getElementsByClassName("etxtmed")
        
        If tbl Is Nothing Then
            MsgBox "No tables found", vbExclamation
            Exit Sub
        Else
            If tbl(2) Is Nothing Then
                MsgBox "Table not found", vbExclamation
                Exit Sub
            Else
                r = tbl(2).Rows.Length
                ReDim arr(1 To r, 1 To 3)
                r = 1
                For Each tr In tbl(2).Rows
                   c = 1
                   For Each td In tr.Cells
                       arr(r, c) = td.innerText
                       c = c + 1
                   Next
                   r = r + 1
                Next
            End If
                     
            'copy to sheet
            With wb.Sheets("Sheet2")
                .Cells(1, 1).Resize(UBound(arr), UBound(arr, 2)) = arr
                .UsedRange.WrapText = False
                .Columns.AutoFit
            End With
              
        End If
        MsgBox "Done", vbInformation
    End Sub
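
One way to find the IFRAME address without reading the page source by hand is to fetch the outer page and list the src of every iframe tag. The following is a minimal sketch and not part of the original answer: ExtractIframeSrcs and ListIframeSources are hypothetical names, the parsing is a plain string search that assumes double-quoted src attributes (it is not a general HTML parser), and it assumes the outer page from the question still embeds the table in an IFRAME.

```vba
Option Explicit

' Hypothetical helper: collects the src value of every <iframe ... src="..."> in html.
' Assumes src values are double-quoted; a tag without such an attribute is skipped.
Function ExtractIframeSrcs(ByVal html As String) As Collection
    Dim result As New Collection
    Dim pos As Long, tagEnd As Long, srcPos As Long
    Dim q1 As Long, q2 As Long
    Dim tagText As String

    pos = InStr(1, html, "<iframe", vbTextCompare)
    Do While pos > 0
        ' Isolate the opening tag so only its own attributes are searched
        tagEnd = InStr(pos, html, ">")
        If tagEnd = 0 Then tagEnd = Len(html)
        tagText = Mid$(html, pos, tagEnd - pos + 1)

        srcPos = InStr(1, tagText, "src=""", vbTextCompare)
        If srcPos > 0 Then
            q1 = srcPos + Len("src=""")      ' first character of the value
            q2 = InStr(q1, tagText, """")    ' closing quote of the value
            If q2 > 0 Then
                ' Raw HTML may encode & as &amp; inside attributes
                result.Add Replace(Mid$(tagText, q1, q2 - q1), "&amp;", "&")
            End If
        End If

        pos = InStr(tagEnd, html, "<iframe", vbTextCompare)
    Loop

    Set ExtractIframeSrcs = result
End Function

' Hypothetical usage: print the iframe sources of the page from the question.
Sub ListIframeSources()
    Const PAGE_URL As String = _
        "https://www.hkab.org.hk/DisplayInterestSettlementRatesAction.do?lang=en"
    Dim request As Object, src As Variant

    Set request = CreateObject("MSXML2.XMLHTTP")
    request.Open "GET", PAGE_URL, False
    request.send

    For Each src In ExtractIframeSrcs(request.responseText)
        Debug.Print src
    Next
End Sub
```

A relative path printed in the Immediate window has to be appended to https://www.hkab.org.hk before it can be requested, which is what the Const URL in the answer does.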
    

    【Comments】:

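    • Thanks for your help! I tested the code last night and it ran successfully. The website seems to be unavailable at the moment, so I will test it again later. I usually use the IE method and the XMLHTTP method is still new to me, so here is my understanding so far: you fetch all elements with the class etxtmed, and tbl(2) is the third etxtmed table. You then loop over all rows and columns of the table and store them in a dynamic array. As for arr(1 To r, 1 To 3), does this mean you limit the array's columns to 2? Could it be changed to arr(0 To r, 0 To c)?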
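    • @SunGuochen The page has nested tables that share the class etxtmed, so I iterated through them and found that tbl(2) is the one you want. You could size the array dynamically, but the table has merged cells in its first and last rows, so determining the maximum size is not straightforward.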
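
The two points raised in the comments, finding which etxtmed match is the wanted table and sizing the array when some rows contain merged cells, could be handled roughly as below. This is a sketch under stated assumptions and not the answerer's code: ListClassMatches, MaxCellCount and TableToArray are hypothetical names, doc is assumed to be an HTML document object like the HTMLDoc built in the answer, and tbl is assumed to be a table element.

```vba
Option Explicit

' Hypothetical helper: prints index, tag name and the start of the text of every
' element carrying className, so the wanted table can be identified by eye.
Sub ListClassMatches(ByVal doc As Object, ByVal className As String)
    Dim matches As Object, i As Long

    Set matches = doc.getElementsByClassName(className)
    For i = 0 To matches.Length - 1
        Debug.Print i, matches(i).tagName, Left$(matches(i).innerText, 40)
    Next
End Sub

' Hypothetical helper: largest number of cells found in any single row of tbl.
' Rows with merged cells report fewer cells, so the widest row sets the size.
Function MaxCellCount(ByVal tbl As Object) As Long
    Dim tr As Object, maxCells As Long

    For Each tr In tbl.Rows
        If tr.Cells.Length > maxCells Then maxCells = tr.Cells.Length
    Next
    MaxCellCount = maxCells
End Function

' Hypothetical helper: copies tbl into a 1-based 2D array sized from the table.
' Returns Empty when the table has no rows or no cells.
Function TableToArray(ByVal tbl As Object) As Variant
    Dim arr() As Variant, tr As Object, td As Object
    Dim r As Long, c As Long, cols As Long

    cols = MaxCellCount(tbl)
    If tbl.Rows.Length = 0 Or cols = 0 Then Exit Function

    ReDim arr(1 To tbl.Rows.Length, 1 To cols)
    For Each tr In tbl.Rows
        r = r + 1
        c = 0
        For Each td In tr.Cells
            c = c + 1
            arr(r, c) = td.innerText
        Next
    Next
    TableToArray = arr
End Function
```

With these helpers, the ReDim and the nested loops in the answer could be replaced by arr = TableToArray(tbl(2)); the existing Resize(UBound(arr), UBound(arr, 2)) line keeps working because the array stays 1-based. Positions in rows shorter than the widest row are left Empty and show up as blank cells on the sheet. A zero-based arr(0 To r, 0 To c) would also work, provided the Resize call is given the row and column counts rather than the upper bounds.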