【Question title】: How to fetch headlines from Google News using rvest in R?
【Posted】: 2016-09-24 05:34:18
【Question】:

I want to fetch headlines from Google News using rvest in R. So far I have done this:

library(rvest)
url=read_html("https://www.google.com/search?hl=en&tbm=nws&authuser=0&q=american+president")
selector_name<-"r"
fnames<-html_nodes(x = url, css = selector_name) %>%
  html_text()

But the result is empty:

> fnames
character(0)

This is what Inspect Element shows for a headline:

<h3 class="r"><a href="/browse.php/PbtvpluS/QDvUJpC7/KoWCA9QE/VTTOFmVJ/bIp8sMa8/qKjgkcAu/Hgcr9lyg/4bibGCOO/nZ82ojLo/_2B602Vo/0sOSEbba/SaiySebj/AqD60GRO/skpNXIW9/fA8EbzOq/z6XjMXo2/9iDad2zD/qREp_2Fp/hoHl64rG/9wfBHOPB/a0nLFrAz/OsCmtfKV/cQoDAFWY/cRXfd5FX/5OAJF8UR/9gUdG_2F/_2F4hOLN/xOfe6_2F/shH2n9O7/hCZGQosp/eeAh6wAC/JhCOgG0i/sLkpRGRN/PH_2B61L/njabdbV1/vpS4wcbX/NKpO_2Bq/jpun2LeG/TQecIESv/vxFbk19Q/_3D_3D/b29/">Obama Addresses Racial Tensions at Celebration of African ...</a></h3>

How can I fetch the headlines from Google News?

【Comments】:

  • Fetching the RSS feed is probably the easiest approach. There is a feedeR package that looks like it only needs a URL to extract the data.
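
The RSS route suggested in this comment can be sketched with plain XML parsing, since a feed is just an XML document. The example below uses xml2 rather than feedeR, and it parses an inline, made-up feed so that it runs offline. With a real feed you would pass the feed URL to read_xml() instead. The feed title, headlines, and links are placeholders.

```r
library(xml2)

# Made-up RSS 2.0 document standing in for a real news feed.
feed <- read_xml('<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example news feed</title>
    <item><title>First headline</title><link>https://example.com/1</link></item>
    <item><title>Second headline</title><link>https://example.com/2</link></item>
  </channel>
</rss>')

# Each <item> is one story; take the <title> child of every item.
items <- xml_find_all(feed, "//item")
rss_headlines <- xml_text(xml_find_first(items, "./title"))
print(rss_headlines)
```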

Tags: r rvest


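【Answer 1】:

I think you are just missing the dot in front of the class name. In a CSS selector, "r" matches <r> elements, while ".r" matches elements whose class is "r":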

> headlines = read_html("https://www.google.com/search?hl=en&tbm=nws&authuser=0&q=american+president") %>%
  html_nodes(".r") %>% 
  html_text()
> headlines
 [1] "Iranian President: No American President Can Renegotiate the Now ..."
 [2] "US: President Barack Obama vetoes 9/11 bill"                         
 [3] "President Obama Wants Donald Trump to Visit New African ..."         
 [4] "President Obama: Discrimination Should Concern 'All Americans ..."   
 [5] "Conrad Black: The Middle East watches, and waits, for the next ..."  
 [6] "Putin's close friend: Donald Trump will be next US president"        
 [7] "US election 2016 polls and odds: Latest Donald Trump and Hillary ..."
 [8] "US election: Ted Cruz endorses Donald Trump for president"           
 [9] "Obama – I'm proud of my 'African record' as US president"            
[10] "Almost 6000 Americans Have Already Voted for President"   
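
The difference between the two selectors can be checked offline. This is a minimal sketch, assuming a hand-written HTML fragment modelled on the inspected markup in the question. The href values are placeholders, not real Google output.

```r
library(rvest)

# Hypothetical fragment modelled on the <h3 class="r"> markup from the question.
page <- read_html('
  <html><body>
    <h3 class="r"><a href="/a">Obama Addresses Racial Tensions</a></h3>
    <h3 class="r"><a href="/b">US: President Barack Obama vetoes 9/11 bill</a></h3>
  </body></html>')

# "r" is a type selector: it looks for <r> elements, and there are none.
no_dot <- page %>% html_nodes("r") %>% html_text()

# ".r" is a class selector: it matches every element with class="r".
with_dot <- page %>% html_nodes(".r") %>% html_text()

print(no_dot)    # character(0)
print(with_dot)
```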

【Comments】:

    【Answer 2】:

    You can do it like this:

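    library(rvest)
    # Search URL taken from the question
    link <- "https://www.google.com/search?hl=en&tbm=nws&authuser=0&q=american+president"
    reviews <- link %>%
        read_html() %>%
        html_nodes(".g") %>%   # each search result sits in an element with class "g"
        html_text()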
    

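    Use Inspect Element to find where the text you want (the headline) sits in the page. In this case it is inside elements with class "g". Then read the text inside each of those nodes.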
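
Calling html_text() on a whole ".g" node returns every piece of text inside that result container, not only the headline. The sketch below narrows each container down to its headline link. It assumes each ".g" container wraps an <h3 class="r"> headline like the one in the question. The fragment, the "snippet" class, and the link targets are made up for illustration.

```r
library(rvest)

# Hypothetical search-result markup: two result containers with class "g".
page <- read_html('
  <html><body>
    <div class="g">
      <h3 class="r"><a href="/a">Headline one</a></h3>
      <div class="snippet">Snippet text one</div>
    </div>
    <div class="g">
      <h3 class="r"><a href="/b">Headline two</a></h3>
      <div class="snippet">Snippet text two</div>
    </div>
  </body></html>')

results <- page %>% html_nodes(".g")

# Narrow each result container down to its headline link.
links <- results %>% html_nodes("h3.r a")

headlines <- data.frame(
  title = html_text(links),
  href  = html_attr(links, "href"),
  stringsAsFactors = FALSE
)
print(headlines)
```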

    【Comments】:

    • Hi, how do you scrape all the search result pages with rvest to get more headlines?
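
One possible approach to the question in this comment, as a sketch only. It assumes the search URL accepts a "start" query parameter that offsets the results in steps of 10, which is not confirmed anywhere in this thread. build_page_urls and scrape_headlines are hypothetical helpers, and the ".r" selector is the one from the first answer.

```r
library(rvest)

# Hypothetical helper: one URL per results page, assuming a "start" offset parameter.
build_page_urls <- function(base_url, n_pages, per_page = 10) {
  offsets <- (seq_len(n_pages) - 1) * per_page
  paste0(base_url, "&start=", offsets)
}

# Hypothetical helper: fetch each page and collect the ".r" headlines.
scrape_headlines <- function(urls, pause = 2) {
  unlist(lapply(urls, function(u) {
    Sys.sleep(pause)  # pause between requests to avoid hammering the server
    read_html(u) %>% html_nodes(".r") %>% html_text()
  }))
}

base_url <- "https://www.google.com/search?hl=en&tbm=nws&authuser=0&q=american+president"
urls <- build_page_urls(base_url, n_pages = 3)
print(urls)

# Not run here because it needs network access:
# all_headlines <- scrape_headlines(urls)
```

If the markup or the paging parameter differs from these assumptions, only the selector string and build_page_urls need to change.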