【问题标题】:Scraping data from website to Excel using a macro...lost使用宏将数据从网站抓取到 Excel...丢失
【发布时间】:2019-04-11 17:58:11
【问题描述】:

我对此完全陌生,但这是我的范围。 我正在运行一个宏来从业务系统中提取数据。 提取此信息后,我想要一个宏来获取某些字段,将它们放入网站表单中,单击提交,然后将某些数据结果抓取并粘贴回 Excel。 一切正常,减去抓取和粘贴回 excel。

请帮忙!

我搜索了整个堆栈溢出并观看了视频以试图弄清楚我需要做什么,但我一定是误解了一些东西。

Sub Track()
Range("B2").Select

'This should call to PT and deliver tracking info

Dim IE As Object
Dim tbl As Object, td As Object



 Set IE = CreateObject("InternetExplorer.Application") 'Set IEapp = 
 InternetExplorer
 IE.Visible = True

      IE.Navigate "https://www.partstown.com/track-my-order"
      With IEapp
          Do
          DoEvents
          Loop Until IE.readyState = 4



'Input PO and zip
 Call IE.Document.getElementById("orderNo").SetAttribute("value", 
 "4500969111")
'ActiveCell.Offset(0, 2).Select
 Call IE.Document.getElementById("postalCode").SetAttribute("value", 
 "37040")
 IE.Document.forms(7).Submit

 Application.Wait Now + TimeValue("00:00:09")

'this is where i am stuck. I know this isnt right but tried to piece it 
 together
 Set elemCollection = IE.Document.getelElementsByTagname("table.account- 
 table details _tc_table_highlighted")

 For t = 0 To (elemCollection.Length - 1)
 For r = 0 To (elemCollection(t).Rows.Length - 1)
    For c = 0 To (elemCollection(t).Rows(r).Cells.Length - 1)
 ThisWorkbook.Worksheets(1).Cells(r + 1, c + 1) = 
 elemCollection(t).Rows.Cells(c).innertext
 Next c
 Next r
 Next t

 End With


 End Sub

这是我想要的: 航运专栏 订购数量 发货数量 并以线性方式显示: 运输、订购数量、发货数量、产品

【问题讨论】:

    标签: html excel vba web-scraping


    【解决方案1】:

    Internet Explorer:

    我已经把它比平时更详细一点,这样你就可以看到每个步骤。

    关键点:

    1) 正确的页面加载等待While .Busy Or .readyState < 4: DoEvents: Wend

    2) 尽可能按 id 选择元素。 # 是一个 CSS id selectorcss selectors 由.document 的querySelector 方法应用,并检索页面中与指定模式匹配的第一个元素

    3) 需要一个定时循环来等待结果出现

    4) 订单数量等信息是换行符分割的字符串。在这些换行符上拆分似乎最容易,然后通过索引从结果数组中访问单个项目

    5) 我按照您的规范订购了一个数组中的结果,然后将该数组一次性写入工作表

    6) “.”是.order-history__item-descript--min 中的class selector,即返回classorder-history__item-descript--min 的第一个元素

    7) [x=y] 是[data-label=Shipping] 中的attribute = value selector,即返回具有data-label 属性值Shipping 的第一个元素

    8) .details-table a 的组合使用descendant combinator, " ", 来指定我想要a 标签元素的父类为.details-table

    VBA:

    Option Explicit
    
    'VBE > Tools > References:
    ' Microsoft Internet Controls
    Public Sub RetrieveInfo()
        Dim ie As InternetExplorer, ele As Object, t As Date
        Const MAX_WAIT_SEC As Long = 5
    
        Set ie = New InternetExplorer
    
        With ie
            .Visible = True
            .Navigate2 "https://www.partstown.com/track-my-order"
    
            While .Busy Or .readyState < 4: DoEvents: Wend
    
            With .document
                .querySelector("#orderNo").Value = "4500969111"
                .querySelector("#postalCode").Value = "37040"
                .querySelector("#orderLookUpForm").submit  
            End With
    
            While .Busy Or .readyState < 4: DoEvents: Wend
    
            Dim shipping As String, order As String, items() As String
            With .document
                t = Timer
                Do
                    On Error Resume Next
                    Set ele = .querySelector("[data-label=Shipping]")
                    On Error GoTo 0
                    If Timer - t > MAX_WAIT_SEC Then Exit Do
                Loop While ele Is Nothing
    
                If ele Is Nothing Then Exit Sub
    
                shipping = ele.innerText
                order = .querySelector(".order-history__item-descript--min").innerText
                items = Split(order, vbNewLine)
    
                Dim qtyOrdered As Long, qtyShipped As String, product As String
    
                qtyOrdered = CLng(Replace$(items(0), "Qty Ordered: ", vbNullString))
                qtyShipped = CLng(Replace$(items(1), "Qty Shipped: ", vbNullString))
                product = .querySelector(".details-table a").Title
    
                Dim results()
                results = Array(shipping, qtyOrdered, qtyShipped, product)
                ThisWorkbook.Worksheets("Sheet1").Cells(1, 1).Resize(1, UBound(results) + 1) = results
    
            End With
            .Quit
        End With
    End Sub
    

    如果是 HTML 新手,请查看:

    https://developer.mozilla.org/en-US/docs/Web/HTML

    如果对 css 选择器不熟悉,请查看:

    https://flukeout.github.io/


    XMLHTTP:

    整个事情也可以用XHR 来完成。这比打开浏览器要快得多。

    XHR:

    使用 XMLHttpRequest (XHR) 对象与服务器交互。你可以 从 URL 中检索数据而无需执行整个页面 [render]

    在这种情况下,我向登录页面发出初始 GET 请求以检索 CSRFToken 用于我重新制定POST 当您手动输入数据并按提交时页面向服务器发出的请求。您在服务器响应中获得所需的数据。我在 POST 发送行的正文中传递了一个查询字符串 .send "orderNo=4500969111&amp;postalCode=37040&amp;CSRFToken=" &amp; csrft ;你可以在那里看到你的参数。

    Option Explicit
    Public Sub GetInfo()
        Dim html As HTMLDocument, csrft As String  '<  VBE > Tools > References > Microsoft HTML Object Library
        Set html = New HTMLDocument
    
        With CreateObject("MSXML2.XMLHTTP")
            .Open "GET", "https://www.partstown.com", False
            .send
    
            html.body.innerHTML = .responseText
    
            csrft = html.querySelector("[name=CSRFToken]").Value
    
            .Open "POST", "https://www.partstown.com/track-my-order", False
            .setRequestHeader "Referer", "https://www.partstown.com/track-my-order"
            .setRequestHeader "User-Agent", "Mozilla/5.0"
            .setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
            .setRequestHeader "Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8"
            .setRequestHeader "Accept-Encoding", "gzip, deflate"
            .setRequestHeader "Accept-Language", "en-US,en;q=0.9"
            .send "orderNo=4500969111&postalCode=37040&CSRFToken=" & csrft
    
            html.body.innerHTML = .responseText
        End With
    
        Dim shipping As String, order As String, items() As String
    
        shipping = html.querySelector("[data-label=Shipping]").innerText
        order = html.querySelector(".order-history__item-descript--min").innerText
        items = Split(order, vbNewLine)
    
        Dim qtyOrdered As Long, qtyShipped As String, product As String
    
        qtyOrdered = CLng(Replace$(items(0), "Qty Ordered: ", vbNullString))
        qtyShipped = CLng(Replace$(items(1), "Qty Shipped: ", vbNullString))
        product = html.querySelector(".details-table a").Title
    
        Dim results()
        results = Array(shipping, qtyOrdered, qtyShipped, product)
        ThisWorkbook.Worksheets("Sheet1").Cells(1, 1).Resize(1, UBound(results) + 1) = results
    End Sub
    

    循环示例:

    Option Explicit
    
    Public Sub GetInfo()
        Dim html As HTMLDocument, csrft As String, lastRow As Long, sourceValues() '<  VBE > Tools > References > Microsoft HTML Object Library
        Set html = New HTMLDocument
        Dim ws As Worksheet, i As Long
        Set ws = ThisWorkbook.Worksheets("Sheet4")
        lastRow = ws.Cells(ws.rows.Count, "B").End(xlUp).Row
        sourceValues = ws.Range("B2:D" & lastRow).Value
        Dim results()
        ReDim results(1 To UBound(sourceValues, 1), 1 To 4)
        With CreateObject("MSXML2.XMLHTTP")
            .Open "GET", "https://www.partstown.com", False
            .send
            html.body.innerHTML = .responseText
    
            csrft = html.querySelector("[name=CSRFToken]").Value
            Stop
            For i = LBound(sourceValues, 1) To UBound(sourceValues, 1)
                If sourceValues(i, 1) <> vbNullString And sourceValues(i, 3) <> vbNullString Then
                    DoEvents
                    .Open "POST", "https://www.partstown.com/track-my-order", False
                    .setRequestHeader "Referer", "https://www.partstown.com/track-my-order"
                    .setRequestHeader "User-Agent", "Mozilla/5.0"
                    .setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
                    .setRequestHeader "Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8"
                    .setRequestHeader "Accept-Encoding", "gzip, deflate"
                    .setRequestHeader "Accept-Language", "en-US,en;q=0.9"
                    .send "orderNo=" & sourceValues(i, 1) & "&postalCode=" & sourceValues(i, 3) & "&CSRFToken=" & csrft
    
                    html.body.innerHTML = .responseText
    
                    Dim shipping As String, order As String, items() As String
    
                    shipping = html.querySelector("[data-label=Shipping]").innerText
                    order = html.querySelector(".order-history__item-descript--min").innerText
                    items = Split(order, vbNewLine)
    
                    Dim qtyOrdered As Long, qtyShipped As String, product As String
    
                    qtyOrdered = CLng(Replace$(items(0), "Qty Ordered: ", vbNullString))
                    qtyShipped = CLng(Replace$(items(1), "Qty Shipped: ", vbNullString))
                    product = html.querySelector(".details-table a").Title
    
                    results(i, 1) = shipping
                    results(i, 2) = qtyOrdered
                    results(i, 3) = qtyShipped
                    results(i, 4) = product
                End If
                'Application.Wait Now + TimeSerial(0, 0, 1)
            Next
        End With
        'results written out from row 2 column E
        ws.Cells(2, 5).Resize(UBound(results, 1), UBound(results, 2)) = results
    End Sub
    

    【讨论】:

    • 我做到了,先生!很抱歉我没有早点回复。当我发表那篇文章时,我花了大约 25 个小时试图弄清楚。非常感谢。我感谢您的帮助。我唯一需要弄清楚的,我一直在研究的是如何更改代码(XMLHTTP)以从 excel 表中提取数据而不是单一数据(从单元格 B2 中提取 orderNo,从 D2 中提取 postalCode)、偏移量和循环.
    • 它完全按照我寻求帮助的方式完成了,我不小心过早地发布了我的第一条评论。我确实需要做更多的工作来让代码看起来像 excel 以找到值,直到 values = "" 我能在你很好地添加到你的帖子中的链接上找到这个信息吗?再次非常感谢您的帮助!
    • 我认为您可能指的是在循环中执行此操作,从工作表中获取 orderNo 和邮政编码的值?
    • 你想要哪个版本? xmlhttp 还是 Internet Explorer?
    • 是的,没错!我绝对会接受答案,你很有帮助!
    猜你喜欢
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 2021-06-20
    • 1970-01-01
    • 2020-03-12
    • 1970-01-01
    相关资源
    最近更新 更多