【问题标题】:How to suppress cookie request如何抑制 cookie 请求
【发布时间】:2015-02-26 16:51:36
【问题描述】:

我在 Excel 2013 中使用 vba 从 Yahoo 期权合约中刮取数据,虽然我确实获得了数据,但我也收到了多个接受 cookie 的请求(请参阅下面的对话框)。

我尝试接受这一点,看看它是否会阻止进一步的弹出窗口,但没有这样的运气。如何抑制对话框?

顺便说一句,我很确定 yahoo_option_contract 有一个 api 可以提供一些无 cookie 的 xml,但我无法让它工作。任何人都可以验证它确实有效并提供解释如何使用它的链接吗?

干杯

更多信息

这是sample link to yahoo's site。碰巧我在previous SO post的底部展示了我的大部分代码和抓取策略

更新

Set http = New MSXML2.XMLHTTP60
With http
    .Open "GET", aUrl, False
    .send
    Do Until .readyState = 4
        DoEvents
    Loop
End With

Select Case http.Status
    Case Is = 200
        Set GetHttp = http
    Case Else
        err.Raise Number:=ERR_WEB_CONNECTION, _
            Description:="Bad Response " & http.Status & mStrings.Bracket(http.statusText)
End Select

【问题讨论】:

  • 您能描述一下您抓取数据的方法吗?并提供一些链接?这将有助于回答。
  • @omegastripes。当然,请参阅我在帖子底部添加的“更多信息”编辑。干杯

标签: excel web-scraping yahoo-api yahoo-finance vba


【解决方案1】:

尝试下面的 VBA 代码通过 XHR 检索页面的 HTML 内容,用 RegEx 解析并输出到工作表:

Option Explicit

Sub Scrape_Yahoo_Option_Contract()

    Dim sUrl As String
    Dim aHeaders
    Dim sResp As String
    Dim sContent
    Dim oTables As Object
    Dim oRows As Object
    Dim aData()
    Dim i As Long

    ' Get data
    sUrl = "https://finance.yahoo.com/quote/AAPL"
    aHeaders = Array( _
        Array("user-agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36") _
    )
    XmlHttpRequest "GET", sUrl, aHeaders, "", "", sResp
    ' Parse tables
    ParseToDict "(<table class=""[^""]*?W\(100%\)[^>]*>)([\s\S]*?)</table>", sResp, oTables
    ' Parse rows
    For Each sContent In oTables.Items
        ParseToDict "<tr><td>(.*?)</td><td>(.*?)</td></tr>", HtmlSimplify(sContent), oRows
    Next
    ' Populate 2d array
    ReDim aData(1 To oRows.Count, 1 To 2)
    i = 1
    For Each sContent In oRows
        aData(i, 1) = GetInnerText(sContent)
        aData(i, 2) = GetInnerText(oRows(sContent))
        i = i + 1
    Next
    ' Output array to worksheet 1
    With ThisWorkbook.Sheets(1)
        .Cells.Delete
        Output2DArray .Cells(1, 1), aData
        .Cells.EntireColumn.AutoFit
    End With

End Sub

Sub Output2DArray(oDstRng As Range, aCells As Variant)

    With oDstRng
        .Parent.Select
        With .Resize( _
                UBound(aCells, 1) - LBound(aCells, 1) + 1, _
                UBound(aCells, 2) - LBound(aCells, 2) + 1)
            .NumberFormat = "@"
            .Value = aCells
        End With
    End With

End Sub

Sub XmlHttpRequest(sMethod As String, sUrl As String, arrSetHeaders, sFormData, sRespHeaders As String, sContent As String)

    Dim arrHeader

    'With CreateObject("Msxml2.ServerXMLHTTP.3.0")
    '    .SetOption 2, 13056 ' SXH_SERVER_CERT_IGNORE_ALL_SERVER_ERRORS
    With CreateObject("Msxml2.XMLHTTP")
        .Open sMethod, sUrl, False
        If IsArray(arrSetHeaders) Then
            For Each arrHeader In arrSetHeaders
                .SetRequestHeader arrHeader(0), arrHeader(1)
            Next
        End If
        .Send sFormData
        sRespHeaders = .GetAllResponseHeaders
        sContent = .ResponseText
    End With

End Sub

Function HtmlSimplify(ByVal sCont)

    With CreateObject("VBScript.RegExp")
        .Global = True
        .MultiLine = True
        .IgnoreCase = True
        .Pattern = "(<[\w\/^<]*)[\s\S]*?>"
        sCont = .Replace(sCont, "$1>")
        .Pattern = "(?:<span>|</span>)"
        sCont = .Replace(sCont, "")
        .Pattern = "(?:<small>|</small>)"
        sCont = .Replace(sCont, "")
        .Pattern = "&nbsp;"
        sCont = .Replace(sCont, " ")
        .Pattern = "[\f\n\r\t\v]"
        sCont = .Replace(sCont, "")
        .Pattern = " +"
        sCont = .Replace(sCont, " ")
        .Pattern = "> <"
        sCont = .Replace(sCont, "><")
    End With
    HtmlSimplify = sCont

End Function

Sub ParseToDict(sPattern As String, sResponse As String, oDict As Object)

    Dim oMatch

    If oDict Is Nothing Then Set oDict = CreateObject("Scripting.Dictionary")
    With CreateObject("VBScript.RegExp")
        .Global = True
        .MultiLine = True
        .IgnoreCase = True
        .Pattern = sPattern
        For Each oMatch In .Execute(sResponse)
            If Trim(oMatch.SubMatches(0)) <> "" Then oDict(oMatch.SubMatches(0)) = oMatch.SubMatches(1)
        Next
    End With

End Sub

Function GetInnerText(ByVal sHtml As String) As String

    Static oHtmlfile As Object

    If oHtmlfile Is Nothing Then ' init
        Set oHtmlfile = CreateObject("htmlfile")
        oHtmlfile.Open
        oHtmlfile.Write "<body></body>"
    End If
    ' Convert
    On Error Resume Next
    oHtmlfile.body.innerHTML = sHtml
    GetInnerText = oHtmlfile.body.innerText

End Function

【讨论】:

  • 你知道有没有等价的方法。SetOption 2, 13056 ' SXH_SERVER_CERT_IGNORE_ALL_SERVER_ERRORS for XMLHTTP60.我有一些通用代码用于执行 Web 请求(请参阅编辑后的帖子)。
  • 哦,好吧,看来您可以将 XMLHTTP60 更改为 ServerXMLHTTP60,它允许 SetOption 并且仍然有效 - 但它似乎并没有停止错误
  • 在我的 PC 上没有出现允许对话框的 cookie,而且我还没有想出重现该行为的步骤,Msxml2.ServerXMLHTTPMsxml2.XMLHTTP 都没有,@987654324 也没有@环境,也不是VBS
猜你喜欢
  • 1970-01-01
  • 2010-12-18
  • 1970-01-01
  • 1970-01-01
  • 2012-03-13
  • 2014-08-23
  • 2019-06-20
  • 1970-01-01
  • 1970-01-01
相关资源
最近更新 更多