【Question title】: Downloading files; executing a JavaScript function using .execScript in VBA
【Posted】: 2018-01-17 11:11:56
【Question】:

Situation:

I am downloading files from the NHS Delayed Transfers of Care web page.

In the HTML I can see the following:

onclick="ga('send', 'event', 'Downloads', 'XLS', 'https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/01/LA-Type-B-November-2017-2ayZP.xls');"

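After looking here and at these SO questions (among others):

I was under the impression that ga() is a JavaScript function that I should be able to call directly with .execScript.

Question:

Can I execute the JavaScript function using .execScript to download the file? If not, how can I download the file?

What I have tried:

I tried the following, all of which failed: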

1) Call html.parentWindow.execScript("ga('send', 'event', 'Downloads', 'XLS', 'https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/01/LA-Type-B-November-2017-2ayZP.xls');", "Javascript")

'-2147352319 Automation error


2) Call html.frames(0).execScript("ga('send', 'event', 'Downloads', 'XLS', 'https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/01/LA-Type-B-November-2017-2ayZP.xls');", "Javascript")

Error 438: Object doesn't support this property or method


3) Call currentWindow.execScript("ga('send', 'event', 'Downloads', 'XLS', 'https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/01/LA-Type-B-November-2017-2ayZP.xls');", "Javascript")

Error 91: Object variable or With block variable not set


4) Call CurrentWindow.execScript("ga('send', 'event', 'Downloads', 'XLS', 'https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/01/LA-Type-B-November-2017-2ayZP.xls');", "Javascript")

-2147352319: Could not complete the operation due to error 80020101.

I admit I know very little about this kind of operation. Can anyone see where I am going wrong?

Code:

Option Explicit

Public Sub DownloadDTOC()

    Dim http As New XMLHTTP60
    Dim html As New HTMLDocument
    Dim CurrentWindow As HTMLWindowProxy

    With http
        .Open "GET", "https://www.england.nhs.uk/statistics/statistical-work-areas/delayed-transfers-of-care/delayed-transfers-of-care-data-2017-18/", False
        .send
        html.body.innerHTML = .responseText
    End With

    On Error GoTo Errhand

    'Call html.parentWindow.execScript("ga('send', 'event', 'Downloads', 'XLS', 'https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/01/LA-Type-B-November-2017-2ayZP.xls');", "Javascript") '-2147352319   Automation error

    'Call html.frames(0).execScript("ga('send', 'event', 'Downloads', 'XLS', 'https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/01/LA-Type-B-November-2017-2ayZP.xls');", "Javascript") '438 Object doesn't support this property or method
'automation error

    'Call currentWindow.execScript("ga('send', 'event', 'Downloads', 'XLS', 'https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/01/LA-Type-B-November-2017-2ayZP.xls');", "Javascript") ' 91 Object variable or With block variable not set

    Set CurrentWindow = html.parentWindow
    Call CurrentWindow.execScript("ga('send', 'event', 'Downloads', 'XLS', 'https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/01/LA-Type-B-November-2017-2ayZP.xls');", "Javascript") '-2147352319  Could not complete the operation due to error 80020101.

    Exit Sub

Errhand:
    If Err.Number <> 0 Then Debug.Print Err.Number, Err.Description
End Sub

References added:

This is a simplified version of the HTML. Sorry, I am not used to formatting HTML.

<p>
  <a href="https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/01/LA-Type-B-November-2017-2ayZP.xls" class="xls-link" onclick="ga('send', 'event', 'Downloads', 'XLS', 'https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/01/LA-Type-B-November-2017-2ayZP.xls');">Total Delayed Days Local Authority 2017-18 November (XLS, 121KB)</a>
  <br>
</p>

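【Comments】:

  • I know IE handles .execScript well. Have you tried opening the page in a hidden IE window and then executing your script?
  • Have you tried getting the text within the xls-link class? The onclick is available in that class too. What I am trying to say, though, is that the XMLHTTP60 request will not be able to fetch anything from that page, because it cannot even parse the text within that class. The content of that site is generated dynamically. You should go with IE.
  • I will try with IE. I deliberately avoided it because it is slow.
  • @Shahin By the way, when I tried to get elements by className using "xls-link", nothing was returned. Does this have something to do with .OuterHTML versus .Inner?
  • That ga() just calls Google Analytics and does not affect the download. Does it really need to be called?

Tags: javascript html vba excel web-scraping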


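【Solution 1】:

In the end I used a CSS selector to collect the href of every download link and passed each one to URLMon for downloading. Because the latest file lags by two months, I filter the links and download only the file dated two months before the current month.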
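
The two-month filter described above can be sketched as a pair of small helpers. This is only an illustration under stated assumptions: the names TargetMonthToken, IsTargetFile and DemoMonthFilter are hypothetical, and the sketch assumes that file names embed the month as an English "Month-Year" string, as in LA-Type-B-November-2017-2ayZP.xls. Format returns month names in the system locale, so an English locale is assumed as well.

```vba
Option Explicit

' Hypothetical helper: builds the "Month-Year" token for the month that is
' two months before refDate, e.g. "November-2017" for 17 January 2018.
' Assumes an English system locale, since Format localises month names.
Public Function TargetMonthToken(ByVal refDate As Date) As String
    TargetMonthToken = Format$(DateAdd("m", -2, refDate), "mmmm-yyyy")
End Function

' Hypothetical helper: True when the URL contains the target month token.
Public Function IsTargetFile(ByVal url As String, ByVal refDate As Date) As Boolean
    IsTargetFile = InStr(1, url, TargetMonthToken(refDate), vbTextCompare) > 0
End Function

Public Sub DemoMonthFilter()
    Const SAMPLE_URL As String = "https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/01/LA-Type-B-November-2017-2ayZP.xls"
    ' Use a fixed reference date so the demo does not depend on today's date.
    Debug.Print TargetMonthToken(DateSerial(2018, 1, 17))
    Debug.Print IsTargetFile(SAMPLE_URL, DateSerial(2018, 1, 17))
End Sub
```

Passing the reference date as a parameter, rather than calling Date inside the helper, makes the filter easy to check against a known file name.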


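CSS selector:

The selector I chose was #main-content a[href*=xls]

This matches elements with an a tag whose href attribute contains the string "xls", and which sit inside the element with id main-content.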
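
To see what the selector picks up, here is a minimal sketch run against hand-written markup. The markup, the example.com URLs and the name DemoSelector are hypothetical and are not taken from the NHS page. It assumes a reference to the Microsoft HTML Object Library and follows the same querySelectorAll pattern as the answer's code.

```vba
Option Explicit

' Requires a reference to the Microsoft HTML Object Library.
Public Sub DemoSelector()
    Dim html As New HTMLDocument
    Dim links As Object, i As Long

    ' Hypothetical markup: two links inside #main-content, one outside it.
    html.body.innerHTML = _
        "<div id=""main-content"">" & _
        "<a href=""https://example.com/files/report.xls"">XLS file</a>" & _
        "<a href=""https://example.com/files/notes.pdf"">PDF file</a>" & _
        "</div>" & _
        "<a href=""https://example.com/files/outside.xls"">Outside main-content</a>"

    ' Descendant combinator (space) plus a "contains" attribute match (*=).
    Set links = html.querySelectorAll("#main-content a[href*=xls]")
    For i = 0 To links.Length - 1
        Debug.Print links.item(i).getAttribute("href")
    Next i
End Sub
```

Under these assumptions only the report.xls link should match: the PDF link lacks "xls" in its href, and the last link is outside main-content. Because *= is a substring test, the selector would also match .xlsx files or any URL that merely contains "xls" somewhere in its path.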


VBA:

Option Explicit

' Downloads the resource at szURL and saves it to szFileName (urlmon.dll).
Private Declare PtrSafe Function URLDownloadToFile Lib "urlmon" _
    Alias "URLDownloadToFileA" ( _
    ByVal pCaller As LongPtr, _
    ByVal szURL As String, _
    ByVal szFileName As String, _
    ByVal dwReserved As Long, _
    ByVal lpfnCB As LongPtr _
    ) As Long

' Removes the cached copy of a URL (wininet.dll); declared but not called below.
Private Declare PtrSafe Function DeleteUrlCacheEntry Lib "Wininet.dll" _
    Alias "DeleteUrlCacheEntryA" ( _
    ByVal lpszUrlName As String _
    ) As Long

Public Const BINDF_GETNEWESTVERSION As Long = &H10

Public Sub DownloadFiles()
    Dim http As New XMLHTTP60, html As New HTMLDocument, downloads As Collection
    With http
        .Open "GET", "https://www.england.nhs.uk/statistics/statistical-work-areas/delayed-transfers-of-care/statistical-work-areas-delayed-transfers-of-care-delayed-transfers-of-care-data-2018-19/", False
        .send
        html.body.innerHTML = .responseText
    End With

    Dim aNodeList As Object, i As Long
    Set downloads = New Collection
    Set aNodeList = html.querySelectorAll("#main-content a[href*=xls]")
    For i = 0 To aNodeList.Length - 1
        downloads.Add aNodeList.item(i).getAttribute("href")
    Next i

    For i = 1 To downloads.Count
        If InStr(downloads(i), Format(DateAdd("m", -2, Date), "mmmm-yyyy")) > 0 Then
            Debug.Print downloads(i)
            downloadFile downloads(i)
        End If
    Next i
End Sub

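Public Sub downloadFile(ByVal url As String)
    Dim ret As Long, arr() As String, outputPath As String

    ' Use the last segment of the URL as the file name.
    arr = Split(url, "/")
    outputPath = "C:\Users\HarrisQ\Desktop\" & arr(UBound(arr))

    ' URLDownloadToFile returns 0 (S_OK) on success.
    ret = URLDownloadToFile(0, url, outputPath, BINDF_GETNEWESTVERSION, 0)
    If ret <> 0 Then Debug.Print "Download failed for " & url & " (HRESULT &H" & Hex$(ret) & ")"
End Sub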

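References:

Requires references to the Microsoft HTML Object Library and Microsoft XML (v6.0, which provides XMLHTTP60).


API calls:

Written for 64-bit (the declarations use PtrSafe and LongPtr).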
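
If the same module also has to compile in hosts older than VBA7 (before Office 2010), a common pattern is to wrap the declaration in conditional compilation. The following is a sketch of that pattern and not part of the original answer. The PtrSafe branch is accepted by both 32-bit and 64-bit VBA7 hosts, and the fallback branch is an assumption about what an older host would need.

```vba
Option Explicit

#If VBA7 Then
    ' VBA7 hosts (Office 2010 and later, 32-bit or 64-bit).
    Private Declare PtrSafe Function URLDownloadToFile Lib "urlmon" _
        Alias "URLDownloadToFileA" ( _
        ByVal pCaller As LongPtr, _
        ByVal szURL As String, _
        ByVal szFileName As String, _
        ByVal dwReserved As Long, _
        ByVal lpfnCB As LongPtr) As Long
#Else
    ' Older hosts, which do not recognise PtrSafe or LongPtr.
    Private Declare Function URLDownloadToFile Lib "urlmon" _
        Alias "URLDownloadToFileA" ( _
        ByVal pCaller As Long, _
        ByVal szURL As String, _
        ByVal szFileName As String, _
        ByVal dwReserved As Long, _
        ByVal lpfnCB As Long) As Long
#End If
```

Only the pointer-sized parameters (pCaller and lpfnCB) change type between the two branches; dwReserved is a 32-bit value in both.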
