【Question title】: Loop through page numbers when href contains doPostBack() in a webpage
【Posted】: 2021-06-30 14:37:55
【Question】:

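I need to scrape the data on every page of the webpage below by clicking through its page numbers.

I have linked a sample website whose HTML is similar to that of my own page.

The sample page is this Webpage

My code is as follows: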

Sub Test()
Dim IE As Object
Dim i As Long, strText As String
Dim y As Long, z As Long, wb As Excel.Workbook, ws As Excel.Worksheet
Dim myBtn As Object
Dim Table As Object, tbody As Object, datarow As Object, thlist As Object, trlist As Object

Set wb = Excel.ActiveWorkbook
Set ws = wb.Worksheets("Data")    ' write to the "Data" sheet explicitly, not whichever sheet is active
ws.Select

Set IE = CreateObject("InternetExplorer.Application")
my_url = "webpage.com"    ' placeholder: the URL must be a quoted string
With IE
    .Visible = True
    .navigate my_url
    Do Until Not IE.Busy And IE.readyState = 4
        DoEvents
    Loop
End With
Set doc = IE.document
y = 1
z = 1
Application.Wait Now + TimeValue("00:00:02")
Set tbody = IE.document.getElementsByTagName("table")(0).getElementsByTagName("tbody")(0)
Set thlist = tbody.getElementsByTagName("tr")(0).getElementsByTagName("th")
Dim ii As Integer
For ii = 0 To thlist.Length - 1
    ws.Cells(z, y).Value = thlist(ii).innerText
    y = y + 1
Next ii
Set datarow = tbody.getElementsByTagName("tr")
y = 1
z = 2
Dim jj As Integer
Dim datarowtdlist As Object
For jj = 1 To datarow.Length - 4
    Set datarowtdlist = datarow(jj).getElementsByTagName("td")
    Dim hh As Integer, x As Integer
    x = y
    For hh = 0 To datarowtdlist.Length - 1
        ws.Cells(z, x).Value = datarowtdlist(hh).innerText
        x = x + 1
    Next hh
    z = z + 1
Next jj
Set IE = Nothing
End Sub

If my question is unclear, I am happy to provide more details.

Thank you for your support.

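【Comments】:

  • @QHarr, we don't need to enter anything. Just open the site and extract the table.
  • The table contents need to be extracted page by page. (The page numbers are at the bottom right of the screen.) No search criteria have to be entered.
  • ie.document.parentwindow.execScript "javascript:__doPostBack('sb$grd','Page$10');" ?
  • QHarr, where in the code above should I include this line, and how do I keep clicking page numbers until the last page?
  • Could you update the code so that I can understand it?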

Tags: html excel vba web-scraping


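【Solution 1】:

The next page is retrieved by incrementing the __EVENTARGUMENT, e.g. from 1 to 2, 2 to 3 and so on, and then triggering __doPostBack with the new value. The last page has been reached when the final td node (in the pagination area) no longer has a child href containing the __EVENTTARGET (sb$grd). Using this logic you can loop, increment, and set an exit condition, as follows.

For more information on this ASP.NET feature, see my answer here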

Public Sub LoopPages()

    ' Early binding: requires a reference to "Microsoft Internet Controls" (Tools > References)
    Dim ie As SHDocVw.InternetExplorer

    Set ie = New SHDocVw.InternetExplorer

    With ie

        .Visible = True
        .Navigate2 "https://www.mfa.gov.tr/sub.ar.mfa?dcabec54-44b3-4aaa-a725-70d0caa8a0ae"
        
        While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend
        
        Dim i As Long

        i = 1

        Do
        
            Debug.Print i
            Debug.Print .document.querySelector(".sub_lstitm").innerText
        
            If .document.querySelectorAll("tr:nth-child(1) td:last-child [href*='sb$grd']").length = 0 Then Exit Do
        
            .document.parentWindow.execScript "__doPostBack('sb$grd','Page$" & i + 1 & "');"
        
            While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend
        
            'do something with new page
        
            i = i + 1
        
        Loop
  
        Stop                                     'stops at 185
        .Quit
    End With

End Sub
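
The comments ask where the table extraction from the question fits into this loop. The following is a minimal sketch, not a tested solution, that calls a row-writing helper at the top of each iteration, before the exit check, so the last page is also captured. The names ScrapeAllPages and WriteTableRows are hypothetical. It assumes, as the question's code does, that the data sits in the first table on the page and that output goes to a sheet named "Data". The URL, the selector and the __doPostBack arguments are taken from the answer above.

```vba
Option Explicit

' Requires a reference to "Microsoft Internet Controls" (SHDocVw).
Public Sub ScrapeAllPages()

    Dim ie As SHDocVw.InternetExplorer
    Dim ws As Worksheet
    Dim nextRow As Long, pageNo As Long

    Set ws = ThisWorkbook.Worksheets("Data")     ' assumption: output sheet from the question
    Set ie = New SHDocVw.InternetExplorer
    nextRow = 1
    pageNo = 1

    With ie
        .Visible = True
        .Navigate2 "https://www.mfa.gov.tr/sub.ar.mfa?dcabec54-44b3-4aaa-a725-70d0caa8a0ae"
        While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend

        Do
            ' Copy the current page first, so the final page is written before the loop exits
            nextRow = WriteTableRows(.document, ws, nextRow)

            ' Exit condition from the answer: no further pager link in the last pager cell
            If .document.querySelectorAll("tr:nth-child(1) td:last-child [href*='sb$grd']").Length = 0 Then Exit Do

            ' Request the next page and wait for the postback to finish
            .document.parentWindow.execScript "__doPostBack('sb$grd','Page$" & pageNo + 1 & "');"
            While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend

            pageNo = pageNo + 1
        Loop

        .Quit
    End With

End Sub

' Hypothetical helper: writes every row of the first table that contains td cells,
' starting at startRow, and returns the next free row number.
' Assumption: the first table holds the data. Pager rows also contain td cells,
' so they are written too unless they are filtered out.
Private Function WriteTableRows(ByVal doc As Object, ByVal ws As Worksheet, ByVal startRow As Long) As Long

    Dim trs As Object, tds As Object
    Dim r As Long, c As Long

    Set trs = doc.getElementsByTagName("table")(0).getElementsByTagName("tr")

    For r = 0 To trs.Length - 1
        Set tds = trs(r).getElementsByTagName("td")
        If tds.Length > 0 Then                   ' skip header rows that only contain th cells
            For c = 0 To tds.Length - 1
                ws.Cells(startRow, c + 1).Value = tds(c).innerText
            Next c
            startRow = startRow + 1
        End If
    Next r

    WriteTableRows = startRow

End Function
```

The question's own loop stops at datarow.Length - 4 to leave out trailing rows. A similar bound or a filter on the row contents would be needed here if the pager rows should not end up in the sheet.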

【Comments】:
