【问题标题】:Macro that follows a link and downloads the table into a new sheet跟随链接并将表格下载到新工作表中的宏
【发布时间】:2014-12-29 14:09:54
【问题描述】:

我是一名地质学家,在路易斯安那州的一家小型石油公司工作。我组成了我们的技术部门,不幸的是我的编码经验非常有限。过去我使用过非常基本的 vba 编码,但我在日常工作中没有编写那么多代码,所以我已经忘记了大部分。

路易斯安那州 dnr 为该州钻探的每一口油井保留了惊人的记录,所有这些记录都位于 www.Sonris.com。这些记录的一部分是每口井的生产记录。我想创建一个遵循给定 url 的宏并下载在 URL 上找到的表(也就是生产记录)。下载文件后,我希望它把表格放在一个新的工作表中,然后根据井名命名这个工作表。

我已经从网络函数中获取数据,但是我不能使函数足够动态。我需要代码来复制在单元格中找到的超链接数据。目前,代码只是跟随我在录制宏时复制和粘贴的超链接。

任何帮助将不胜感激

真诚地, 约西亚

下面是生成的代码;

    Sub Macro2()
'
'     Macro2 Macro
' attempt with multiple well to look at code instead of 1 well
'

'
    Range("E27").Select
    ActiveWorkbook.Worksheets.Add
    With ActiveSheet.QueryTables.Add(Connection:= _
        "URL;http://sonlite.dnr.state.la.us/sundown/cart_prod/cart_con_wellinfo2?p_WSN=159392" _
        , Destination:=Range("$A$1"))
        .Name = "cart_con_wellinfo2?p_WSN=159392"
        .FieldNames = True
        .RowNumbers = False
        .FillAdjacentFormulas = False
        .PreserveFormatting = True
        .RefreshOnFileOpen = False
        .BackgroundQuery = True
        .RefreshStyle = xlInsertDeleteCells
        .SavePassword = False
        .SaveData = True
        .AdjustColumnWidth = True
        .RefreshPeriod = 0
        .WebSelectionType = xlSpecifiedTables
        .WebFormatting = xlWebFormattingNone
        .WebTables = "1,11"
        .WebPreFormattedTextToColumns = True
        .WebConsecutiveDelimitersAsOne = True
        .WebSingleBlockTextImport = False
        .WebDisableDateRecognition = False
        .WebDisableRedirections = False
        .Refresh BackgroundQuery:=False
    End With
    Sheets("Sheet1").Select
End Sub

【问题讨论】:

  • 您要检索多少口井,WSN 编号在哪里,您想要所有表格还是仅选择一个?
  • 理想情况下,我希望对整个油田进行此操作(因此在 300 口井的范围内)。我只想在一组表中使用 1 个表。什么是 WSN 号码?
  • 我猜 WSN 是 井序列号,如 ...?p_WSN=159392
  • 啊,是的,我很抱歉。我以为那是一个代码。那确实代表井序列号。序列号位于工作表的列中。到目前为止,我还没有尝试在代码中使用它。
  • 你上面的代码当前在运行时做了什么?它是否适用于特定的 WSN=159392

标签: vba excel web-scraping


【解决方案1】:

使用所有可用于清理外部数据的方法,许多用户都忘记了您可以打开一个充满表格的网页,只需要一个有效的 URL 和文件 ► 打开。我在这里发布代码,但我还将提供一个工作示例工作簿的链接,该工作簿需要大约 2 分钟来从 14 个按顺序编号的 WSN(web 序列号)页面收集完整的网页数据。您自己的结果可能会有所不同。

Option Explicit

Public Const csURL As String = "http://sonlite.dnr.state.la.us/sundown/cart_prod/cart_con_wellinfo2?p_WSN=×WSN×"

Sub Gather_Well_Data()
    Dim rw As Long, lr As Long, w As Long, wsn As String, wb As Workbook
    On Error GoTo Fìn
    Application.ScreenUpdating = False
    Application.DisplayAlerts = False
    With ThisWorkbook.Sheets("WSNs")
        lr = .Cells(Rows.Count, 1).End(xlUp).Row
        For rw = 2 To lr
            .Cells(rw, 2) = 0
            For w = 1 To .Parent.Sheets.Count
                If .Parent.Sheets(w).Name = CStr(.Cells(rw, 1).Value) Then
                    .Parent.Sheets(w).Delete
                    Exit For
                End If
            Next w
            wsn = Replace(csURL, "×WSN×", .Cells(rw, 1).Value)
            Set wb = Workbooks.Open(Filename:=wsn, ReadOnly:=True, addtomru:=False)
            wb.Sheets(1).Range("A1:A3").Font.Size = 12
            wb.Sheets(1).Copy After:=.Parent.Sheets(.Parent.Sheets.Count)
            .Parent.Sheets(.Parent.Sheets.Count).Name = .Cells(rw, 1).Value
            wb.Close savechanges:=False
            Set wb = Nothing
            .Cells(rw, 2) = 1
            Application.ScreenUpdating = True
            Application.ScreenUpdating = False
            .Parent.Save
        Next rw
        .Activate
    End With
Fìn:
    Application.DisplayAlerts = True
    Application.ScreenUpdating = True
End Sub

WSN 标识符列表位于 WSNs 工作表中,从第 2 列开始。点击 Alt+F8 运行宏以打开Macros 对话框和Run Gather_Well_Data 宏。完成后,您将拥有一个工作簿,其中包含由 WSN 标识的工作表,如下所示。

         

示例工作簿位于我的公共 DropBox 上:

LA_WSN_Data.xlsb

【讨论】:

  • 非常感谢您为此所做的所有工作。我有一个后续问题。有没有办法只从网页上下载一个表格而不是整个井报告?我希望自动下载生产数据(网页上的租赁单元/油井生产表),然后将其插入格式化的工作簿以自动计算我公司感兴趣的一些值。
  • @JosiahHulsey - 是的,使用 Excel 的数据 ► 获取外部数据 ► 从网页、Internet Explorer 对象甚至是 Msxml.DOMDocument dom 对象都非常可能,但 TBH 会更容易收集您的 300 页并删除您不想要的所有内容,或将您真正想要的内容整理到一个大型数据表或数据库中。
【解决方案2】:

只是为了搭载@Jeeped 很棒的解决方案,我在格式中添加了要删除的内容,只剩下 LeaseUnit/Well/Production 信息。这假设套管表始终遵循生产表

Option Explicit

Public Const csURL As String = "http://sonlite.dnr.state.la.us/sundown/cart_prod/cart_con_wellinfo2?p_WSN=×WSN×"

Sub Gather_Well_Data()
    Dim rw As Long, lr As Long, w As Long, wsn As String, wb As Workbook, frow As String, lrow As String
    On Error GoTo Fìn
    Application.ScreenUpdating = False
    Application.DisplayAlerts = False



    With ThisWorkbook.Sheets("WSNs")
        lr = .Cells(Rows.Count, 1).End(xlUp).Row
        For rw = 2 To lr
            .Cells(rw, 2) = 0
            For w = 1 To .Parent.Sheets.Count
                If .Parent.Sheets(w).Name = CStr(.Cells(rw, 1).Value) Then
                    .Parent.Sheets(w).Delete
                    Exit For
                End If
            Next w
            wsn = Replace(csURL, "×WSN×", .Cells(rw, 1).Value)
            Set wb = Workbooks.Open(Filename:=wsn, ReadOnly:=True, addtomru:=False)

            frow = Application.WorksheetFunction.Match("LEASE\UNIT\WELL PRODUCTION", Range("A:A"), 0)
            lrow = Application.WorksheetFunction.Match("Casing", Range("A:A"), 0)
            lrow = lrow - 1
            frow = "A" & frow
            lrow = "K" & lrow
            Range(frow, lrow).Cut Range("Q1")
            Columns("A:P").Select
            Selection.Delete Shift:=xlToLeft
            Cells.EntireColumn.AutoFit

            wb.Sheets(1).Range("A1:A3").Font.Size = 12
            wb.Sheets(1).Copy After:=.Parent.Sheets(.Parent.Sheets.Count)
            .Parent.Sheets(.Parent.Sheets.Count).Name = .Cells(rw, 1).Value
            wb.Close savechanges:=False
            Set wb = Nothing
            .Cells(rw, 2) = 1
            Application.ScreenUpdating = True
            Application.ScreenUpdating = False
            .Parent.Save
        Next rw
        .Activate
    End With
Fìn:
    Application.DisplayAlerts = True
    Application.ScreenUpdating = True
End Sub

【讨论】:

    【解决方案3】:

    Jeped 的方法令人震惊。+1

    您也可以针对 API 发出POST 请求,并将所有表格写出如下。

    注意:我正在将每个井信息写在另一个之下,但放置一个表格很容易。在下一次 API 调用之前添加一行,并确保每个写出的数据都使用活动表格。

    Option Explicit
    Public Sub GetWellInfo()
        Dim ws As Worksheet, page As HTMLDocument, targetTable As HTMLTable, apiNumbers(), currNumber As Long
        Const PARAM1 As String = "p_apinum"
        Const BASESTRING As String = "http://sonlite.dnr.state.la.us/sundown/cart_prod/"
        apiNumbers = Array(1708300502, 1708300503)
    
        Application.ScreenUpdating = False
        Set ws = ThisWorkbook.Worksheets("Sheet1")
    
        With ws
            .Cells.ClearContents
            For currNumber = LBound(apiNumbers) To UBound(apiNumbers)
                Set page = GetPage(BASESTRING & "cart_con_wellapi2", apiNumbers(currNumber), PARAM1)
                Set page = GetPage(BASESTRING & GetNextURL(page.body.innerHTML))
                Dim allTables As Object
                Set allTables = page.getElementsByTagName("table")
    
                For Each targetTable In allTables
                    AddHeaders targetTable, GetLastRow(ws, 1) + 2, ws
                    WriteTables targetTable, GetLastRow(ws, 1), ws
                Next targetTable
    
            Next currNumber
        End With
        Application.ScreenUpdating = True
    End Sub
    
    Public Function GetPage(ByVal url As String, Optional ByVal apiNumber As Long, Optional ByVal paramN As String = vbNullString) As HTMLDocument
        Dim objHTTP As Object, html As New HTMLDocument
    
        Set objHTTP = CreateObject("WinHttp.WinHttpRequest.5.1")
    
        Dim sBody As String
        If Not paramN = vbNullString Then sBody = paramN & "=" & apiNumber
        With objHTTP
            .SetTimeouts 10000, 10000, 10000, 10000
            .Open "POST", url, False
            .setRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
            .setRequestHeader "Content-type", "application/x-www-form-urlencoded"
            On Error Resume Next
            .send (sBody)
            If Err.Number = 0 Then
                If .Status = "200" Then
                    html.body.innerHTML = .responseText
                    Set GetPage = html
                Else
                    Debug.Print "HTTP " & .Status & " " & .statusText
                    Exit Function
                End If
            Else
                Debug.Print "Error " & Err.Number & " " & Err.Source & " " & Err.Description
                Exit Function
            End If
            On Error GoTo 0
        End With
    
    End Function
    
    Public Function GetNextURL(ByVal inputString As String)
        GetNextURL = Replace$(Replace$(Split(Split(inputString, "href=")(1), ">")(0), Chr$(34), vbNullString), "about:", vbNullString)
    End Function
    
    Public Sub AddHeaders(ByVal hTable As Object, ByVal startRow As Long, ByVal ws As Worksheet)
        Dim headers As Object, header As Object, columnCounter As Long
        Set headers = hTable.getElementsByTagName("th")
        For Each header In headers
            columnCounter = columnCounter + 1
            ws.Cells(startRow, columnCounter) = header.innerText
        Next header
    End Sub
    
    Public Sub WriteTables(ByVal hTable As HTMLTable, Optional ByVal startRow As Long = 1, Optional ByRef ws As Worksheet)
        If ws Is Nothing Then Set ws = ActiveSheet
    
        Dim tRow As Object, tCell As Object, tr As Object, td As Object, r As Long, c As Long
        r = startRow
        With ActiveSheet
            Set tRow = hTable.getElementsByTagName("tr")
            For Each tr In tRow
                Set tCell = tr.getElementsByTagName("td")
                For Each td In tCell
                    .Cells(r, c).Value = td.innerText
                    c = c + 1
                Next td
                r = r + 1:  c = 1
            Next tr
        End With
    End Sub
    

    【讨论】:

      猜你喜欢
      • 2018-03-12
      • 2017-03-15
      • 2021-11-07
      • 1970-01-01
      • 2013-03-01
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      相关资源
      最近更新 更多