【Question title】: Rvest not seeing xpath in website
【Posted】: 2017-06-15 15:18:40
【Question】:

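I am trying to scrape this website using the rvest package in R. I have done this successfully on several other websites, but this one does not seem to work and I can't figure out why.

I copied the xpath from Chrome's inspector tool, but when I specify it in my rvest script, it reports that the node does not exist. Does this have something to do with the table being generated dynamically rather than being static?

Thanks for your help!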

library(rvest)
library(tidyverse)
library(stringr)
library(readr)

a<-read_html("http://www.diversitydatakids.org/data/profile/217/benton-county#ind=10,12,15,17,13,20,19,21,24,2,22,4,34,35,116,117,123,99,100,127,128,129,199,201")
a<-html_node(a, xpath="//*[@id='indicator10']")
a<-html_table(a)
a

【Comments】:

Tags: r web-scraping rvest


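【Solution 1】:

To answer your question: yes, you can't get the node because it is generated dynamically, so it is not in the HTML that read_html() downloads. In these cases it is better to use the RSelenium library: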

#Loading libraries
library(rvest) # to read the html
library(magrittr) # for the '%>%' pipe symbols
library(RSelenium) # to get the loaded html of the website

# starting local RSelenium (this is the only way to start RSelenium that is working for me atm)
# note: shell() is only available in R on Windows
selCommand <- wdman::selenium(jvmargs = c("-Dwebdriver.chrome.verboseLogging=true"), retcommand = TRUE)
shell(selCommand, wait = FALSE, minimized = TRUE)
remDr <- remoteDriver(port = 4567L, browserName = "chrome")
remDr$open()

# Specifying the url of the website to be scraped
url <- "http://www.diversitydatakids.org/data/profile/217/benton-county#ind=10,12,15,17,13,20,19,21,24,2,22,4,34,35,116,117,123,99,100,127,128,129,199,201"

# go to website
remDr$navigate(url)

# get page source and save it as an html object with rvest
html_obj <- remDr$getPageSource(header = TRUE)[[1]] %>% read_html()

# get the element you are looking for
a <-html_node(html_obj, xpath="//*[@id='indicator10']")
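
To see why the original rvest-only attempt fails, here is a minimal sketch that needs no browser or network. The HTML string is a made-up stand-in (an assumption, not the real page source) for what the server sends before any JavaScript has run: the container is empty, so the XPath has nothing to match and html_node() returns a "missing" node.

```r
library(rvest)

# Hypothetical stand-in for the raw HTML sent by the server: the container is
# empty, and in a real browser a script would fill it in later.
static_html <- '<html><body>
  <div id="content"></div>
  <script>/* JavaScript would build the indicator table here */</script>
</body></html>'

page <- read_html(static_html)
node <- html_node(page, xpath = "//*[@id='indicator10']")

# html_node() returns an object of class "xml_missing" when nothing matches
node_found <- !inherits(node, "xml_missing")
print(node_found)
```

Running the same check on the object returned by read_html(url) is a quick way to tell whether a page needs a browser-driven approach such as RSelenium.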

I guess you are trying to get the first table. In that case, it may be better to get the table directly with html_table():

# get the table with the indicator10 id
indicator10_table <-html_node(html_obj, "#indicator10 table") %>% html_table()

This time I used a CSS selector instead of an XPath.
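
As a small illustration of the two selector styles, the sketch below runs both against an inline HTML snippet. The snippet and its values are invented for the example (the real page's markup may differ); it only assumes a table nested inside an element with id "indicator10", as in the answer above.

```r
library(rvest)

# Hypothetical rendered HTML: a table nested inside the #indicator10 element
rendered_html <- '<html><body>
  <div id="indicator10">
    <table>
      <tr><th>group</th><th>value</th></tr>
      <tr><td>A</td><td>1</td></tr>
      <tr><td>B</td><td>2</td></tr>
    </table>
  </div>
</body></html>'

doc <- read_html(rendered_html)

# CSS selector: any <table> that is a descendant of the element with id indicator10
by_css <- html_table(html_node(doc, "#indicator10 table"))

# XPath selecting the same node
by_xpath <- html_table(html_node(doc, xpath = "//*[@id='indicator10']//table"))

# Both selectors pick the same table, so the parsed results match
same_result <- identical(as.data.frame(by_css), as.data.frame(by_xpath))
print(same_result)
```

Either style works with html_node(); the CSS form is usually shorter, while XPath can express conditions that CSS cannot.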

Hope this helps! Happy scraping!

【Comments】:
