【问题标题】:VBA - Number of Google News Search ResultsVBA - Google 新闻搜索结果的数量
【发布时间】:2019-04-05 23:08:52
【问题描述】:

我有一个单元格,其中包含我想在 Google 新闻中搜索的内容。我希望代码返回该搜索的结果数。目前我有这个代码,我在网站的其他地方找到了,不使用谷歌新闻,但即便如此,我有时也会得到一个

运行时错误-2147024891 (80070005)

经过70左右的搜索,我无法再次运行。

Sub HawkishSearch()

Dim url As String, lastRow As Long
Dim XMLHTTP As Object, html As Object
Dim start_time As Date
Dim end_time As Date

lastRow = Range("B" & Rows.Count).End(xlUp).Row

Dim cookie As String
Dim result_cookie As String

start_time = Time
Debug.Print "start_time:" & start_time

For i = 2 To lastRow

    url = "https://www.google.co.in/search?q=" & Cells(i, 2) & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)

    Set XMLHTTP = CreateObject("MSXML2.XMLHTTP")
    XMLHTTP.Open "GET", url, False
    XMLHTTP.setRequestHeader "Content-Type", "text/xml"
    XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
    XMLHTTP.send

    Set html = CreateObject("htmlfile")
    html.body.innerHTML = XMLHTTP.ResponseText

If html.getElementById("resultStats") Is Nothing Then
    str_text = "0 Results"
Else
    str_text = html.getElementById("resultStats").innerText
End If
    Cells(i, 3) = str_text
    DoEvents
Next

end_time = Time
Debug.Print "end_time:" & end_time

Debug.Print "done" & "Time taken : " & DateDiff("n", start_time, end_time)
MsgBox "done" & "Time taken : " & DateDiff("n", start_time, end_time)
End Sub

【问题讨论】:

  • 请问什么是示例搜索词?以及如何访问印度谷歌新闻?
  • 我更改了网址。我发现的例子是印度,但我在美国。示例搜索可能只是公司名称
  • @QHarr 我已经让它工作了一段时间,但现在我没有得到任何结果。令人沮丧
  • 请提供您要抓取的完整网址
  • @QHarr url = "google.com/search?q=" & Cells(i, 2)

标签: html excel vba xmlhttprequest


【解决方案1】:

最佳选择 (IMO) 是使用 Google News API 并注册 API 密钥。然后,您可以使用包含您的搜索词的 queryString 并解析 JSON 响应以获取结果计数。我在下面执行此操作,并使用文章标题和链接填充集合。我使用一个名为 JSONConverter.bas 的 JSON 解析器,您可以将其下载并添加到您的项目中。然后您可以转到 VBE > 工具 > 参考 > 添加对 Microsoft Scripting Runtime 的引用。


来自 API 的 JSON 响应示例:

{} 表示您可以通过键访问的字典,[] 表示您可以通过索引或For Each 循环访问的集合。

我使用密钥 totalResults 从 API 返回的初始字典中检索结果总数。

然后我循环字典(文章)的集合并提取故事标题和 URL。

然后您可以在本地窗口中检查结果或打印出来

本地窗口中的结果示例:


Option Explicit

Public Sub GetStories()
    Dim articles As Collection, article As Object
    Dim searchTerm As String, finalResults As Collection, json As Object, arr(0 To 1)
    Set finalResults = New Collection
    searchTerm = "Obama"

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://newsapi.org/v2/everything?q=" & searchTerm & "&apiKey=yourAPIkey", False
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        Set json = JsonConverter.ParseJson(.responseText)
    End With

    Debug.Print "total results = " & json("totalResults")

    Set articles = json("articles")
    For Each article In articles
       arr(0) = article("title")
       arr(1) = article("url")
       finalResults.Add arr
    Next

    Stop '<== Delete me later

End Sub

循环:

如果在循环中部署,您可以使用类 clsHTTP 来保存 XMLHTTP 对象。这比创建和销毁更有效。我为这个类提供了一个方法 GetString 来从 API 检索 JSON 响应,以及一个 GetInfo 方法来解析 JSON 并检索结果计数以及 API 结果 URL 和标题。

本地窗口中的结果结构示例:

类 clsHTTP:

Option Explicit   
Private http As Object

Private Sub Class_Initialize()
    Set http = CreateObject("MSXML2.XMLHTTP")
End Sub

Public Function GetString(ByVal url As String) As String
    With http
        .Open "GET", url, False
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        GetString = .responseText
    End With
End Function

Public Function GetInfo(ByVal json As Object) As Variant
    Dim results(), counter As Long, finalResults(0 To 1), articles As Object, article As Object

    finalResults(0) = json("totalResults")
    Set articles = json("articles")

    ReDim results(1 To articles.Count, 1 To 2)

    For Each article In articles
        counter = counter + 1
        results(counter, 1) = article("title")
        results(counter, 2) = article("url")
    Next

    finalResults(1) = results
    GetInfo = finalResults
End Function

标准模块:

Option Explicit

Public Sub GetStories()
    Dim http As clsHTTP, json As Object
    Dim finalResults(), searchTerms(), searchTerm As Long, url As String
    Set http = New clsHTTP

    With ThisWorkbook.Worksheets("Sheet1")
        searchTerms = Application.Transpose(.Range("A1:A2")) '<== Change to appropriate range containing search terms
    End With

    ReDim finalResults(1 To UBound(searchTerms))

    For searchTerm = LBound(searchTerms, 1) To UBound(searchTerms, 1)

        url = "https://newsapi.org/v2/everything?q=" & searchTerms(searchTerm) & "&apiKey=yourAPIkey"

        Set json = JsonConverter.ParseJson(http.GetString(url))

        finalResults(searchTerm) = http.GetInfo(json)

        Set json = Nothing

    Next

    Stop '<==Delete me later
End Sub

'

否则:

我会使用以下内容,通过他们的班级名称来获取故事链接。我得到计数​​并将链接写入集合

Option Explicit

Public Sub GetStories()
    Dim sResponse As String, html As HTMLDocument, articles As Collection
    Const BASE_URL As String = "https://news.google.com/"
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://news.google.com/topics/CAAqIggKIhxDQkFTRHdvSkwyMHZNRGxqTjNjd0VnSmxiaWdBUAE?hl=en-US&gl=US&ceid=US:en", False
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        sResponse = StrConv(.responseBody, vbUnicode)
    End With
    Set html = New HTMLDocument: Set articles = New Collection
    Dim numberOfStories As Long, nodeList As Object, i As Long
    With html
        .body.innerHTML = sResponse
        Set nodeList = .querySelectorAll(".VDXfz")
        numberOfStories = nodeList.Length
        Debug.Print "number of stories = " & numberOfStories
        For i = 0 To nodeList.Length - 1
            articles.Add Replace$(Replace$(nodeList.item(i).href, "./", BASE_URL), "about:", vbNullString)
        Next
    End With
    Debug.Print articles.Count
End Sub

标准 Google 搜索:

以下是标准 google 搜索的示例,但根据您的搜索词,您不会总是获得相同的 HTML 结构。您需要提供一些失败的案例来帮助我确定是否有可以应用的一致选择器方法。

Option Explicit
Public Sub GetResultsCount()
    Dim sResponse As String, html As HTMLDocument
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.google.com/search?q=mitsubishi", False
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        sResponse = StrConv(.responseBody, vbUnicode)
    End With
    Set html = New HTMLDocument
    With html
        .body.innerHTML = sResponse
        Debug.Print .querySelector("#resultStats").innerText
    End With

End Sub

【讨论】:

  • 我不确定你的意思是你不能使用完整的 URL。例如,我在 B 列中有一个包含 100 家左右公司的列表。我可以使用以下命令创建 google 搜索 URL:url = "'www.google.com/search?q=" & Cells(i, 2)。我遇到的所有问题就是从该 URL 中提取结果数
  • 您关注 Google 新闻的结果吗?
  • 现在只是普通的谷歌。不确定如何在 Google 新闻中执行此操作
猜你喜欢
  • 1970-01-01
  • 1970-01-01
  • 1970-01-01
  • 1970-01-01
  • 1970-01-01
  • 2013-10-21
  • 1970-01-01
  • 1970-01-01
  • 2020-03-26
相关资源
最近更新 更多