【问题标题】:Search a website using excel vba with excel data and extract the active state in flowchart of search result and mapping it into column使用带有excel数据的excel vba搜索网站,并在搜索结果的流程图中提取活动状态并将其映射到列中
【发布时间】:2023-03-11 05:19:01
【问题描述】:

我希望有人可以提供帮助....

我在一个 Excel 电子表格中有大约 7000 个值,我需要在网站中搜索这些值,然后从网站记录结果流程图的活动状态以输入回 Excel 电子表格。由于我是宏网络抓取的新手,我曾经为我想要提取信息的网站(https://nacionalidade.justica.gov.pt/)自动修改网络代码修改的输入 ID。我对如何应用 if 条件来获得在流程图中有七个类的活动状态有点困惑,这是流程图。

现在我有访问代码每个将在不同的阶段,我只想选择状态并将其放在访问代码前面的 E 列中(目前是手动执行)

我不清楚如何提取此类网络数据提取的新信息 - 任何帮助都会令人难以置信!

这是我的代码:(在此之后无法更改提到的网络)

objIE.document.getElementById("btnPesquisa").Click

代码:

'start a new subroutine called SearchBot
Sub SearchBot()
 
    'dimension (declare or set aside memory for) our variables
    Dim objIE As InternetExplorer 'special object variable representing the IE browser
    Dim aEle As HTMLLinkElement 'special object variable for an <a> (link) element
    Dim y As Integer 'integer variable we'll use as a counter
    Dim result As String 'string variable that will hold our result link
 
    'initiating a new instance of Internet Explorer and asigning it to objIE
    Set objIE = New InternetExplorer
 
    'make IE browser visible (False would allow IE to run in the background)
    objIE.Visible = True
 
    'navigate IE to this web page (a pretty neat search engine really)
    objIE.navigate "https://nacionalidade.justica.gov.pt/"
 
    'wait here a few seconds while the browser is busy
    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
 
    'in the search box put cell "A2" value, the word "in" and cell "C1" value
    objIE.document.getElementById("SenhaAcesso").Value = _
      Sheets("Guy Touti").Range("D2").Value
 
    'click the 'go' button
    objIE.document.getElementById("btnPesquisa").Click
 
    'wait again for the browser
    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
 
    'the first search result will go in row 2
    y = 2
 
    'for each <a> element in the collection of objects with class of 'result__a'...
    For Each aEle In objIE.document.getElementsByClassName("result__a")
 
        '...get the href link and print it to the sheet in col C, row y
        result = aEle
        Sheets("Guy Touti").Range("E" & y).Value = result
 
        '...get the text within the element and print it to the sheet in col D
        Sheets("Guy Touti").Range("D" & y).Value = aEle.innerText
        Debug.Print aEle.innerText
 
        'is it a yellowpages link?
        If InStr(result, "yellowpages.com") > 0 Or InStr(result, "yp.com") > 0 Then
            'make the result red
            Sheets("Guy Touti").Range("C" & y).Interior.ColorIndex = 3
            'place a 1 to the left
            Sheets("Guy Touti").Range("B" & y).Value = 1
        End If
 
        'increment our row counter, so the next result goes below
        y = y + 1
 
    'repeat times the # of ele's we have in the collection
    Next
 
    'add up the yellowpages listings
    Sheets("Guy Touti").Range("B1").Value = _
      Application.WorksheetFunction.Sum(Sheets("Guy Touti").Range("B2:B100"))
 
    'close the browser
    objIE.Quit
 
'exit our SearchBot subroutine
End Sub

我确实先尝试过,但过了一段时间开始寻找更好的方法。你能帮忙吗????

【问题讨论】:

    标签: html excel vba web-scraping


    【解决方案1】:

    您可以简化页面发出的 POST XHR 请求以获取数据并使用类名来限制具有active1active3 的节点。获取该节点列表中的最后一个节点并提取步骤编号并通过查找(如果需要)转换颜色。对于 7,000 个请求,在每 50 个或更少的请求中添加 1-2 秒的延迟可能会很周到。您可以i mod 50 在循环中确定这一点并使用Application.Wait Now + Timeserial(0,0,2)

    Option Explicit
    
    Public Sub GetStatus()
    
        Dim html As MSHTML.HTMLDocument, xhr As Object, colourLkup As Object
        Dim ws As Worksheet, senhas(), i As Long, results()
    
        Set ws = ThisWorkbook.Worksheets("Sheet1")
        senhas = Application.Transpose(ws.Range("D2:D" & ws.Cells(ws.Rows.Count, "D").End(xlUp).Row))
    
        ReDim results(1 To UBound(senhas))
    
        Set colourLkup = CreateObject("Scripting.Dictionary")
        colourLkup.Add "active1", "green"
        colourLkup.Add "active3", "orange"
    
        Set html = New MSHTML.HTMLDocument
        Set xhr = CreateObject("MSXML2.XMLHTTP")
    
        For i = LBound(senhas) To UBound(senhas)
            If senhas(i) <> vbNullString Then
                With xhr
                    .Open "POST", "https://nacionalidade.justica.gov.pt/Home/GetEstadoProcessoAjax", False
                    .setRequestHeader "User-Agent", "Mozilla/5.0"
                    .setRequestHeader "Content-Type", "application/x-www-form-urlencoded; charset=UTF-8"
                    .send "SenhaAcesso=" & senhas(i)
                    html.body.innerHTML = .responseText
                End With
    
                Dim nodes As Object, classinfo() As String
    
                Set nodes = html.querySelectorAll(".active1, .active3")
    
                classinfo = Split(nodes(nodes.Length - 1).className, Chr$(32))
                results(i) = Replace$(classinfo(1), "step", vbNullString) & "-" & colourLkup(classinfo(2))
            End If
            Set nodes = Nothing
        Next
        ws.Cells(2, 5).Resize(UBound(results), 1) = Application.Transpose(results)
    End Sub
    

    【讨论】:

    • 亲爱的得到错误对象变量或未在“classinfo = Split(nodes(nodes.Length - 1).className, Chr$(32))”行中设置块变量
    • 此时 senhas(i) 的值是多少 - 发生错误时?您的价值观是否如图所示?
    • 那么i的价值是什么?
    • I=122 和 senhas(i)=empty 但我的列 E 仍未从第 2 行填充,但我想它应该填充列 E 直到 122
    • 你有D123的值吗?用于查找的 D 列中不应有任何空值。
    猜你喜欢
    • 2012-11-20
    • 2020-03-19
    • 2019-06-13
    • 1970-01-01
    • 2013-10-21
    • 1970-01-01
    • 2012-07-31
    • 2017-01-27
    • 2019-12-20
    相关资源
    最近更新 更多