此解决方案使用 RSelenium 以及包 XML。它还假定您有一个可以正常使用firefox 的RSelenium 安装。只需确保已将 firefox 启动脚本路径添加到您的 PATH。
如果您使用的是OS X,则需要将/Applications/Firefox.app/Contents/MacOS/ 添加到您的PATH。或者,如果您使用的是 Ubuntu 机器,则可能是 /usr/lib/firefox/。一旦你确定这是有效的,你就可以继续使用以下内容:
# Install RSelenium and XML for R
#install.packages("RSelenium")
#install.packages("XML")
# Import packages
library(RSelenium)
library(XML)
# Check and start servers for Selenium
checkForServer()
startServer()
# Use firefox as a browser and a port that's not used
remote_driver <- remoteDriver(browserName="firefox", port=4444)
remote_driver$open(silent=T)
# Use RSelenium to browse the site
epl_link <- "https://fantasy.premierleague.com/a/entry/767830/history"
remote_driver$navigate(epl_link)
elem <- remote_driver$findElement(using="class", value="ism-table")
# Get the HTML source
elemtxt <- elem$getElementAttribute("outerHTML")
# Use the XML package to work with the HTML source
elem_html <- htmlTreeParse(elemtxt, useInternalNodes = T, asText = TRUE)
# Convert the table into a dataframe
games_table <- readHTMLTable(elem_html, header = T, stringsAsFactors = FALSE)[[1]]
# Change the column names into something legible
names(games_table) <- unlist(lapply(strsplit(names(games_table), split = "\\n\\s+"), function(x) x[2]))
names(games_table) <- gsub("£", "Value", gsub("#", "CPW", gsub("Â","",names(games_table))))
# Convert the fields into numeric values
games_table <- transform(games_table, GR = as.numeric(gsub(",","",GR)),
OP = as.numeric(gsub(",","",OP)),
OR = as.numeric(gsub(",","",OR)),
Value = as.numeric(gsub("£","",Value)))
这应该产生:
GW GP PB GR TM TC OP OR Value CPW
GW1 34 1 2406373 0 0 34 2406373 100.0
GW2 47 6 2659789 0 0 81 2448674 100.0
GW3 53 9 541258 2 0 134 1914025 99.9
GW4 51 7 905524 0 0 185 1461665 100.0
GW5 66 14 379438 3 4 247 958889 100.1
GW6 66 2 303704 2 4 309 510376 99.9
GW7 65 9 138792 2 4 370 232474 99.8
GW8 63 3 108363 0 0 433 87967 100.4
GW9 48 8 1114609 2 0 481 75385 100.9
GW10 90 2 71210 0 0 571 27716 101.1
GW11 71 2 421706 3 4 638 16083 100.9
GW12 35 9 2798661 2 4 669 31820 101.2
GW13 41 8 2738535 1 0 710 53487 101.1
GW14 82 15 308725 0 0 792 29436 100.2
GW15 55 9 1048808 2 4 843 29399 100.6
GW16 49 8 1801549 0 0 892 35142 100.7
GW17 48 4 2116706 2 0 940 40857 100.7
GW18 42 2 3315031 0 0 982 78136 100.8
GW19 41 9 2600618 0 0 1023 99048 100.6
GW20 53 0 1644385 0 0 1076 113148 100.8
请注意,CPW 列(与上周相比的变化)是一个空字符串向量。
我希望这会有所帮助。