Rvest not seeing xpath in website

【Question title】: Rvest not seeing xpath in website
【Posted】: 2017-06-15 15:18:40
【Question】:

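I'm trying to scrape this website using the rvest package in R. I've done this successfully on several other websites, but this one doesn't seem to work and I'm not sure why.

I copied the xpath from Chrome's inspector tool, but when I specify it in my rvest script, it reports that the node doesn't exist. Does this have something to do with the table being generated rather than static?

Thanks for the help!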

library(rvest)
library(tidyverse)
library(stringr)
library(readr)

a <- read_html("http://www.diversitydatakids.org/data/profile/217/benton-county#ind=10,12,15,17,13,20,19,21,24,2,22,4,34,35,116,117,123,99,100,127,128,129,199,201")
a <- html_node(a, xpath = "//*[@id='indicator10']")
a <- html_table(a)
a

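【Comments】:

Can you show us what you've tried? You will most likely need to use httr::POST together with the xml2 package. Here is an example: https://stackoverflow.com/questions/44313122/scraping-dynamic-table-in-r-with-post
Edited the post to include the code.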

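Tags: r web-scraping rvest

【Solution 1】:

To answer your question: yes, you can't get the node because it is generated dynamically. rvest only parses the HTML the server returns and does not run the page's JavaScript. In these cases it is better to use the RSelenium library: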

# Load libraries
library(rvest)     # to read the HTML
library(magrittr)  # for the '%>%' pipe
library(RSelenium) # to get the rendered HTML of the website

# Start a local Selenium server (this is the only way to start RSelenium that is working for me at the moment).
# Note: shell() is only available in R on Windows.
selCommand <- wdman::selenium(jvmargs = c("-Dwebdriver.chrome.verboseLogging=true"), retcommand = TRUE)
shell(selCommand, wait = FALSE, minimized = TRUE)
remDr <- remoteDriver(port = 4567L, browserName = "chrome")
remDr$open()

# URL of the website to be scraped
url <- "http://www.diversitydatakids.org/data/profile/217/benton-county#ind=10,12,15,17,13,20,19,21,24,2,22,4,34,35,116,117,123,99,100,127,128,129,199,201"

# Go to the website
remDr$navigate(url)

# Get the page source and parse it into an HTML object with rvest
html_obj <- remDr$getPageSource(header = TRUE)[[1]] %>% read_html()

# Get the element you are looking for
a <- html_node(html_obj, xpath = "//*[@id='indicator10']")
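To see why a plain read_html() call comes back empty on a page like this, the situation can be reproduced offline. The sketch below is a toy under stated assumptions: the HTML string, the "content" container and the script are made up for illustration and are not the real page's markup. It shows that rvest parses the markup as delivered and does not execute JavaScript, so an element that only a script would create is reported as missing.

```r
library(rvest)

# Hypothetical page: the server sends an empty container, and a script
# would add the indicator table once it runs in a browser.
static_html <- '
<html><body>
  <div id="content"></div>
  <script>
    var t = document.createElement("table");
    t.id = "indicator10";
    document.getElementById("content").appendChild(t);
  </script>
</body></html>'

page <- read_html(static_html)

# The container is part of the delivered markup, so it is found.
container <- html_node(page, xpath = "//*[@id='content']")

# The table exists only after the script runs, which never happens here,
# so html_node() returns a missing-node placeholder.
missing_node <- html_node(page, xpath = "//*[@id='indicator10']")

# Check which of the two lookups produced a missing node.
container_found <- !inherits(container, "xml_missing")
indicator_found <- !inherits(missing_node, "xml_missing")
print(c(container = container_found, indicator10 = indicator_found))
```

Running the same check against the HTML returned by getPageSource() is a quick way to confirm that the browser-rendered source does contain the node before calling html_table() on it.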

I guess you are trying to get the first table. In that case, it may be better to fetch the table directly with html_table():

# Get the table inside the element with id "indicator10"
indicator10_table <- html_node(html_obj, "#indicator10 table") %>% html_table()

This time I used a CSS selector instead of an XPath.
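For comparison, here is a minimal sketch of how the CSS selector and an XPath expression can express the same query. The HTML string is a hypothetical stand-in for the rendered page source (the real table's columns are not known here), so it runs without a browser or network access.

```r
library(rvest)

# Hypothetical markup standing in for the rendered page source.
rendered_html <- '
<html><body>
  <div id="indicator10">
    <table>
      <tr><th>Group</th><th>Value</th></tr>
      <tr><td>A</td><td>1</td></tr>
      <tr><td>B</td><td>2</td></tr>
    </table>
  </div>
</body></html>'

page <- read_html(rendered_html)

# CSS selector: a <table> that is a descendant of the element with id "indicator10".
via_css <- html_node(page, "#indicator10 table") %>% html_table()

# XPath spelling of the same query.
via_xpath <- html_node(page, xpath = "//*[@id='indicator10']//table") %>% html_table()

# Both lookups should select the same table.
same_table <- identical(via_css, via_xpath)
print(same_table)
```

Which form to use is mostly a matter of taste; the CSS version is shorter, while XPath can express conditions that CSS selectors cannot, such as matching on text content.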

Hope this helps! Happy scraping!

【Comments】: