【Question title】: Scrape website data, insert into an Excel cell, then move on to next
【Posted】: 2019-08-08 08:51:43
【Question】:

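My project is to enter a car registration into the tax and MOT enquiry website, click through, wait for the page to load, and then read the dates.

One problem I'm having is extracting the data inside a strong li element. It holds the tax and MOT dates, which I need in two separate cells.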

Sub searchbot()

'dimension (declare or set aside memory for) our variables
    Dim objIE As InternetExplorer 'special object variable representing the IE browser

    Dim liEle As HTMLLIElement 'object variable for an <li> (list item) element (not used below)
    Dim pEle As HTMLAnchorElement 'object variable for an <a> (link) element (not used below)

    Dim y As Integer 'integer variable we'll use as a counter

'''''''''''''''''''''''''''''''''''''''''''
'open internet

    'initiate a new instance of Internet Explorer and assign it to objIE
    Set objIE = New InternetExplorer

    'make IE browser visible (False would allow IE to run in the background)
    objIE.Visible = True

'''''''''''''''''''''''''''''''''''''''''''
'open tax/mot page

    'navigate IE to the vehicle enquiry page
    objIE.Navigate "https://vehicleenquiry.service.gov.uk/"

    'wait here a few seconds while the browser is busy
    Do While objIE.Busy = True Or objIE.ReadyState <> 4: DoEvents: Loop

'''''''''''''''''''''''''''''''''''''''''''
'enter details in to page

    'put the registration from cell F3 into the search box (the input with id "Vrm")
    objIE.Document.getElementById("Vrm").Value = _
    Sheets("INPUT & DATA RESULTS").Range("F3").Value

    'wait here a few seconds while the browser is busy
    Do While objIE.Busy = True Or objIE.ReadyState <> 4: DoEvents: Loop

'''''''''''''''''''''''''''''''''''''''''''''''''''''

'click the 'Continue' button
objIE.Document.getElementsByClassName("button")(0).Click

'wait here a few seconds while the browser is busy
    Do While objIE.Busy = True Or objIE.ReadyState <> 4: DoEvents: Loop

'''''''''''''''''''''''''''''''''''''''''''''''''''''

'click the 'Yes' button
objIE.Document.getElementById("Correct_True").Click

'wait here a few seconds while the browser is busy
    Do While objIE.Busy = True Or objIE.ReadyState <> 4: DoEvents: Loop

'''''''''''''''''''''''''''''''''''''''''''''''''''''

'click the 'Continue' button
objIE.Document.getElementsByClassName("button")(0).Click

'wait here a few seconds while the browser is busy
    Do While objIE.Busy = True Or objIE.ReadyState <> 4: DoEvents: Loop

'above works
'''''''''''''''''''''''''''''''''''''''''''''''''''''
'''''''''''''''''''''''''''''''''''''''''''''''''''''
'''''''''''''''''''''''''''''''''''''''''''''''''''''
'''''''''''''''''''''''''''''''''''''''''''''''''''''
'''''''''''''''''''''''''''''''''''''''''''''''''''''
'''''''''''''''''''''''''''''''''''''''''''''''''''''
'''''''''''''''''''''''''''''''''''''''''''''''''''''
'''''''''''''''''''''''''''''''''''''''''''''''''''''
'''''''''''''''''''''''''''''''''''''''''''''''''''''

'HELP FROM HERE PLEASE

'take the tax and MOT dates and insert them into cells next to each other
'the first search result will go in row 2
y = 2

'TAKE TAX EXPIRY DATE AND PUT IN CELL
'I have tried reading up on extracting data from li elements and parent/child elements, but I'm struggling
For Each itemEle In objIE.Document.getElementsByClassName("top-section-list")
    data = itemEle.getElementsByTagName("li")(0).innerText
Next

'TAKE MOT EXPIRY DATE AND PUT IN CELL
'I have tried reading up on extracting data from li elements and parent/child elements, but I'm struggling
For Each itemEle In objIE.Document.getElementsByClassName("top-section-list")
    data = itemEle.getElementsByTagName("li")(0).innerText
Next

'increment our row counter, so the next result goes below
y = y + 1

'repeat for every cell that has a car reg in it:
'take the next car reg and do the same as above until there are no more rows with a car reg
Range("A3").Value = data


'''''''''''''''''''''''''''''''''''''''''''
'close the browser
objIE.Quit

'''''''''''''''''''''''''''''''''''''''''''
'exit our SearchBot subroutine and start new row for new website data
End Sub

I am a fraud investigator trying to teach myself VBA.

【Discussion】:

    Tags: excel vba web-scraping innertext


    【Answer 1】:

    The items you want are inside <strong> (bold) tags and are the first two such tags on the page, so you can use the faster CSS selector strong and do:

    Dim items As Object, i As Long, taxInfo As String, motInfo As String
    Set items = ie.document.querySelectorAll("strong")
    taxInfo = items.item(0).innerText
    motInfo = items.item(1).innerText
    

    To get only the dates:

    taxInfo = Replace$(items.item(0).innerText,"Tax due: ",vbNullString)
    motInfo = Replace$(items.item(1).innerText,"Expires: ",vbNullString)
    

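    Here is a similar approach that uses CSS selectors throughout; modern browsers are optimized for CSS selector matching, so it is faster. # is an id selector. I used a timed wait to make sure the search box exists before entering the registration. There is also an initial check for the case where the vehicle cannot be found.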

    Option Explicit   
    'VBE > Tools > References:
    ' Microsoft Internet Controls
    Public Sub CheckTax()
        Dim ie As InternetExplorer, searchBox As Object, t As Date, ws As Worksheet
        Const MAX_WAIT_SEC As Long = 20
        Dim inputValues(), i As Long
    
        Set ie = New InternetExplorer
        Set ws = ThisWorkbook.Worksheets("INPUT & DATA RESULTS")
        inputValues = Application.Transpose(ws.Range("F3:F5").Value) '<=change range here for range containing values to lookup
        With ie
            .Visible = True
    
            For i = LBound(inputValues) To UBound(inputValues)
                .Navigate2 "https://vehicleenquiry.service.gov.uk/"
    
                While .Busy Or .readyState < 4: DoEvents: Wend
                t = Timer
                Do
                    On Error Resume Next
                    Set searchBox = .document.querySelector("#Vrm")
                    On Error GoTo 0
                    If Timer - t > MAX_WAIT_SEC Then Exit Do
                    DoEvents 'let the page keep loading while we poll
                Loop While searchBox Is Nothing
    
                If searchBox Is Nothing Then
                    .Quit 'close IE before giving up
                    Exit Sub
                Else
                    searchBox.Focus
                    searchBox.Value = inputValues(i)
                End If
    
                .document.querySelector(".button").Click
    
                While .Busy Or .readyState < 4: DoEvents: Wend
    
                If .document.querySelectorAll("h3").Length > 0 Then
                    ws.Cells(i + 2, "G") = "Vehicle details could not be found"
                    ws.Cells(i + 2, "H") = "Vehicle details could not be found"
                Else
                    t = Timer
                    Do
                        If Timer - t > MAX_WAIT_SEC Then Exit Do
                        DoEvents 'let the page keep loading while we poll
                    Loop While ie.document.querySelectorAll("#Correct_True").Length = 0
    
                    ie.document.querySelector("#Correct_True").Click
                    While .Busy Or .readyState < 4: DoEvents: Wend
                    .document.querySelector(".button").Click
    
                    While .Busy Or .readyState < 4: DoEvents: Wend
    
                    Dim items As Object, taxInfo As String, motInfo As String
                    t = Timer
                    Do
                        On Error Resume Next
                        Set items = ie.document.querySelectorAll("strong")
                        On Error GoTo 0
                        If Timer - t > MAX_WAIT_SEC Then Exit Do
                        DoEvents 'let the page keep loading while we poll
                    Loop While items.Length = 0
    
                    'taxInfo = items.item(0).innerText
                    'motInfo = items.item(1).innerText
    
                    'Debug.Print taxInfo, motInfo
    
                    taxInfo = Replace$(items.item(0).innerText, "Tax due: ", vbNullString)
                    motInfo = Replace$(items.item(1).innerText, "Expires: ", vbNullString)
    
                    ws.Cells(i + 2, "G") = taxInfo
                    ws.Cells(i + 2, "H") = motInfo
                End If
                Set searchBox = Nothing: Set items = Nothing
            Next
            .Quit
        End With
    End Sub
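The code above repeats the same poll-until-found-or-timeout loop three times (for #Vrm, #Correct_True and strong). As a sketch, that pattern could be pulled into one helper. WaitForSelector is a hypothetical name, not part of any library; like the answer, it assumes a reference to Microsoft Internet Controls and that querySelector returns Nothing when no element matches.

```vba
' HYPOTHETICAL helper (not from the answer): poll until a CSS selector
' matches or maxWaitSec elapses. Returns the first match, or Nothing.
' Requires VBE > Tools > References > Microsoft Internet Controls.
Public Function WaitForSelector(ByVal ie As InternetExplorer, _
                                ByVal selector As String, _
                                ByVal maxWaitSec As Double) As Object
    Dim t As Double, el As Object
    t = Timer
    Do
        On Error Resume Next          'document can be unavailable mid-navigation
        Set el = ie.document.querySelector(selector)
        On Error GoTo 0
        If Not el Is Nothing Then Exit Do
        If Timer - t > maxWaitSec Then Exit Do
        DoEvents                      'let Excel and IE process messages
    Loop
    Set WaitForSelector = el
End Function
```

Inside the With ie block, the first wait could then read Set searchBox = WaitForSelector(ie, "#Vrm", MAX_WAIT_SEC), followed by the same If searchBox Is Nothing check.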
    

    【讨论】:

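    • A loop can easily be added, but I'm not sure whether you are looping over reg numbers in the sheet or changing the value of a single cell and running this several times.
    • Thank you very much for your help, I'll try it later. Big hug.
    • Hi, thanks for your help. I'll be looping through the sheet and checking each car registration listed. Will adding Next solve the problem? Thanks.
    • P.S. Is there any good reading material on using CSS selectors?
    • Thanks QHarr, I'll try that, you've been very helpful. I've had my profile for a while but I'm new to using this site properly. How do I upvote you? It says I don't have enough points. Thanks.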
    【Answer 2】:

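    This web page has a very simple structure: there is only one element with class = status-bar, and inside it both pieces of information you are looking for are in <strong> tags.

    So there is no need to loop; you can simply do this (right after the line where you wrote "above works"):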

    'TAX EXPIRY DATE:
    TaxExpiryDate = objIE.Document.getElementsByClassName("status-bar")(0).getElementsByTagName("strong")(0).innerText
    'MOT EXPIRY DATE:
    MotExpiryDate = objIE.Document.getElementsByClassName("status-bar")(0).getElementsByTagName("strong")(1).innerText
    

    You can then put the variables TaxExpiryDate and MotExpiryDate wherever you want (e.g. Range("A1").Value = TaxExpiryDate).

    The variables themselves contain the plain content of the <strong> tags:

    Tax due:
    01 July 2019
    

    If you only want the date, you can Split() the text using vbNewLine as the delimiter and keep only the second part:

    'IN TWO LINES FOR BETTER CODE READABILITY:
    TaxExpiryDate = objIE.Document.getElementsByClassName("status-bar")(0).getElementsByTagName("strong")(0).innerText
    TaxExpiryDate = Split(TaxExpiryDate, vbNewLine)(1)
    
    'IN ONE LINE FOR SHORTER CODE:
    TaxExpiryDate = Split(objIE.Document.getElementsByClassName("status-bar")(0).getElementsByTagName("strong")(0).innerText, vbNewLine)(1)
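The two answers strip the label differently: Answer 1 assumes "Tax due: " is followed by the date on the same line, while this answer assumes a line break (vbNewLine) between them. If you are not sure which form the page returns, one option is to take everything after the first colon and trim the whitespace. DateAfterLabel is a hypothetical helper name, and the sketch assumes the date always follows a colon.

```vba
' HYPOTHETICAL helper: return the text after the first colon of a label such
' as "Tax due: 01 July 2019" or "Tax due:" & vbNewLine & "01 July 2019".
' Line breaks and non-breaking spaces become spaces; the result is trimmed.
' If there is no colon, the whole (trimmed) text is returned.
Public Function DateAfterLabel(ByVal labelText As String) As String
    Dim s As String, pos As Long
    s = Replace$(labelText, vbCr, " ")
    s = Replace$(s, vbLf, " ")
    s = Replace$(s, ChrW$(160), " ")
    pos = InStr(1, s, ":")
    If pos > 0 Then s = Mid$(s, pos + 1)
    DateAfterLabel = Trim$(s)
End Function
```

For example: TaxExpiryDate = DateAfterLabel(objIE.Document.getElementsByClassName("status-bar")(0).getElementsByTagName("strong")(0).innerText). The result is still text, not an Excel date.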
    

    【讨论】:

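    • Hi matteo, this was really helpful and got me the result I wanted. Could you help me move on to the vehicle reg in the next row, and keep going until the cell is blank? Thanks for your time. Gaz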
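    • Hi matteo, could you please help me with the code to loop through the rows until they are empty? I'm having real trouble with this bit. Thanks again.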
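The comments above ask how to repeat the lookup for every registration until the column runs out. A minimal sketch, assuming (as in Answer 1) that the registrations are in column F from row 3 and the results go in columns G and H of the same row: find the last filled cell in column F with End(xlUp) and loop down to it. LastRegRow, FillResults and LookUpReg are hypothetical names; LookUpReg is only a placeholder stub that you would replace with the browser steps from either answer.

```vba
Option Explicit

' HYPOTHETICAL sketch: look up every registration in column F (row 3 down to
' the last filled cell) and write the results to columns G and H.
Public Function LastRegRow(ByVal ws As Worksheet) As Long
    'jump up from the bottom of column F to the last non-empty cell
    LastRegRow = ws.Cells(ws.Rows.Count, "F").End(xlUp).Row
End Function

Public Sub FillResults(ByVal ws As Worksheet)
    Dim r As Long, reg As String, taxInfo As String, motInfo As String
    For r = 3 To LastRegRow(ws)           'no iterations if nothing is below row 2
        reg = Trim$(CStr(ws.Cells(r, "F").Value))
        If Len(reg) > 0 Then               'skip blank cells inside the list
            LookUpReg reg, taxInfo, motInfo
            ws.Cells(r, "G").Value = taxInfo
            ws.Cells(r, "H").Value = motInfo
        End If
    Next r
End Sub

Private Sub LookUpReg(ByVal reg As String, ByRef taxInfo As String, ByRef motInfo As String)
    'PLACEHOLDER: replace with the navigate / enter reg / click / read <strong> steps
    taxInfo = "tax for " & reg
    motInfo = "MOT for " & reg
End Sub
```

Call it as FillResults ThisWorkbook.Worksheets("INPUT & DATA RESULTS"). Unlike a Do While loop that stops at the first empty cell, this version keeps going past blank cells in the middle of the list; one browser instance could also be created once in FillResults and passed to LookUpReg rather than reopened per registration.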