Character(0) error when using rvest to webscrape Google search results

【Question title】: Character(0) error when using rvest to webscrape Google search results
【Posted】: 2020-10-17 21:45:32
【Question description】:

I am trying to scrape the titles of Google search results. However, no matter what I try with rvest, the result is always character(0).

Here is the code for a search for rstudio:

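library(rvest)
library(dplyr)

# Fetch the results page for the query "rstudio"
web1 <- read_html("https://www.google.at/search?q=rstudio")

# Select nodes by the class name reported by SelectorGadget
header <- web1 %>%
    html_nodes(".DKV0Md") %>%
    html_text()
header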

I checked the node name in SelectorGadget, so that should not be the problem. How can I fix this?

【Question comments】:

Tags: r web-scraping rvest google-search

【Solution 1】:

Perhaps we can use an XPath expression instead of the class name:

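library(rvest)
library(dplyr)

# Same page as in the question
web1 <- read_html("https://www.google.at/search?q=rstudio")
# Select by structure: <div> children of a link with no nested <div>
web1 %>%
   html_nodes(xpath = '//div/div/div/a/div[not(div)]') %>%
   html_text()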

Output:

#[1] "rstudio.com"        
#[2] "rstudio.cloud"           
#[3] "en.wikipedia.org › wiki › RStudio"    
# ....
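
To make it easier to see what this XPath selects, here is a minimal offline sketch. The markup below is hypothetical and only imitates the nesting the expression expects; it is not Google's real HTML, and the variable names (page, urls, titles) are made up for the illustration. The predicate [not(div)] keeps only <div> elements that have no <div> child of their own, so wrapper elements are skipped.

```r
library(rvest)

# Hypothetical markup (NOT Google's real HTML): each result link sits
# three <div> levels deep and holds an <h3> title plus a leaf <div>.
page <- read_html('
<html><body>
  <div><div><div>
    <a href="https://rstudio.com">
      <h3><div>RStudio</div></h3>
      <div><div>wrapper, skipped by not(div)</div></div>
      <div>rstudio.com</div>
    </a>
  </div></div></div>
  <div><div><div>
    <a href="https://rstudio.cloud">
      <h3><div>RStudio Cloud</div></h3>
      <div>rstudio.cloud</div>
    </a>
  </div></div></div>
</body></html>')

# <div> children of the link that contain no further <div>
urls <- page %>%
  html_nodes(xpath = "//div/div/div/a/div[not(div)]") %>%
  html_text()

# Adding the h3 step targets the title text instead
titles <- page %>%
  html_nodes(xpath = "//div/div/div/a/h3/div[not(div)]") %>%
  html_text()

print(urls)    # "rstudio.com" "rstudio.cloud"
print(titles)  # "RStudio" "RStudio Cloud"
```

Because the expression depends on the exact nesting, it may need adjusting whenever the real page layout changes.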

【Comments】:

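Thank you very much for your answer, akrun! But do you know why the original code with the class name (instead of the XPath) does not work?
@AlinaZamaletdinova The class names in Google search results are not stable.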
In case anyone runs into the same problem, here is the code that worked for me and returned the titles:

library(rvest)
library(dplyr)
web1 <- read_html("https://www.google.at/search?q=munich+prices")
web1 %>%
    html_nodes(xpath = '//div/div/div/a/h3/div[not(div)]') %>%
    html_text()
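
One way to check the point about unstable class names is to test whether the class is present in the HTML that read_html() actually received. The sketch below is offline and rests on assumptions: the two documents are hypothetical stand-ins (not real Google markup) for what a browser shows and what a scripted request might get back, and scrape_class() is a made-up helper, not part of rvest. A likely explanation, though not confirmed in this thread, is that SelectorGadget inspects the browser-rendered page while read_html() sees different markup.

```r
library(rvest)

# Hypothetical stand-ins, not real Google markup.
# `rendered` imitates what a browser (and SelectorGadget) might show,
# `served` imitates plainer HTML that a scripted request might receive.
rendered <- read_html('<div><a href="#"><h3 class="DKV0Md">RStudio</h3></a></div>')
served   <- read_html('<div><a href="#"><h3><div>RStudio</div></h3></a></div>')

# Hypothetical helper: returns the matched text and warns when the
# selector matches nothing, instead of silently giving character(0).
scrape_class <- function(doc, css) {
  nodes <- html_nodes(doc, css)
  if (length(nodes) == 0) {
    warning("selector '", css, "' matched no nodes in the fetched HTML")
  }
  html_text(nodes)
}

in_browser <- scrape_class(rendered, ".DKV0Md")
in_script  <- suppressWarnings(scrape_class(served, ".DKV0Md"))

# Searching the serialized document shows whether the class exists at all
class_present <- grepl("DKV0Md", as.character(served), fixed = TRUE)

print(in_browser)     # "RStudio"
print(in_script)      # character(0)
print(class_present)  # FALSE
```

Running the same grepl() check on the real web1 object would show whether .DKV0Md was ever in the downloaded page.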