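【Posted】: 2021-07-24 10:35:46
【Question】:
I'm working on this code, but for some reason I keep getting a 404 error even though the website is up and running. I'm not sure where I went wrong, and I'd appreciate any suggestions from the community. I believe the mistake is somewhere in the site link, but I'm not sure what to put there; I have tried the bare minimum "http://www.ufcstats.com/" as well as '/fighter-details/'.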
library(rvest)
library(dplyr)
library(purrr)
link = "http://www.ufcstats.com/statistics/fighters?char=a&page=all"
page = read_html(link)
name = page %>% html_nodes(".b-link_style_black") %>% html_text()
name_links = page %>% html_nodes(".b-link_style_black") %>%
  html_attr("href") %>% paste("http://www.ufcstats.com/fighter-details/", ., sep="") %>% trimws()
get_Info = function(name_link) {
  fighter_page = read_html(name_link)
  tibble(
    name = fighter_page %>% html_nodes(".b-content__title-highlight") %>% html_text(),
    record = fighter_page %>% html_nodes(".b-content__title-record") %>% html_text(),
    height = fighter_page %>% html_nodes(".b-list__info-box_style_small-width .b-list__box-list-item_type_block:nth-child(1)") %>% html_text(),
    weight = fighter_page %>% html_nodes(".b-list__info-box_style_small-width .b-list__box-list-item_type_block:nth-child(2)") %>% html_text(),
    reach = fighter_page %>% html_nodes(".b-list__info-box_style_small-width .b-list__box-list-item_type_block:nth-child(3)") %>% html_text(),
    stance = fighter_page %>% html_nodes(".b-list__info-box_style_small-width .b-list__box-list-item_type_block:nth-child(4)") %>% html_text(),
    dob = fighter_page %>% html_nodes(".b-list__info-box_style_small-width .b-list__box-list-item_type_block:nth-child(5)") %>% html_text(),
    sig_strikes_per_min = fighter_page %>% html_nodes(".b-list__info-box-left .b-list__info-box-left .b-list__box-list-item_type_block:nth-child(1)") %>% html_text(),
    sig_striking_accuracy = fighter_page %>% html_nodes(".b-list__info-box-left .b-list__info-box-left .b-list__box-list-item_type_block:nth-child(2)") %>% html_text(),
    sig_strikes_abs_per_min = fighter_page %>% html_nodes(".b-list__info-box-left .b-list__info-box-left .b-list__box-list-item_type_block:nth-child(3)") %>% html_text(),
    sig_strike_def = fighter_page %>% html_nodes(".b-list__info-box-left .b-list__info-box-left .b-list__box-list-item_type_block:nth-child(4)") %>% html_text(),
    avg_takedown = fighter_page %>% html_nodes(".b-list__info-box_style-margin-right .b-list__box-list-item_type_block:nth-child(2)") %>% html_text(),
    takedown_accuracy = fighter_page %>% html_nodes(".b-list__info-box_style-margin-right .b-list__box-list-item_type_block:nth-child(3)") %>% html_text(),
    takedown_defense = fighter_page %>% html_nodes(".b-list__info-box_style-margin-right .b-list__box-list-item_type_block:nth-child(4)") %>% html_text(),
    sub_avg = fighter_page %>% html_nodes(".b-list__box-list_margin-top .b-list__box-list-item_type_block:nth-child(5)") %>% html_text(),
    last_fight = fighter_page %>% html_nodes(".b-statistics__table-row+ .js-fight-details-click .b-fight-details__table-col~ .b-fight-details__table-col+ .l-page_align_left .b-fight-details__table-text+ .b-fight-details__table-text") %>% html_text()
  ) -> t
  return(t)
}
df <- map_dfr(name_links, get_Info)
Here is the error output I get:
Browse[1]> Q
> library(rvest)
Warning message:
In for (i in seq_along(a)) if (all(nam[i] != std.attr)) { :
closing unused connection 6 (http://www.ufcstats.com/fighter-details/http://www.ufcstats.com/fighter-details/93fe7332d16c6ad9)
...
> df <- map_dfr(name_links, get_Info)
Error in open.connection(x, "rb") : HTTP error 404.
Called from: open.connection(x, "rb")
【Discussion】:
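The warning in the question shows a requested URL of the form "http://www.ufcstats.com/fighter-details/http://www.ufcstats.com/fighter-details/93fe7332d16c6ad9", which suggests that html_attr("href") already returns absolute links, so the paste() call adds the prefix a second time and produces the 404. The following is only a minimal sketch of that diagnosis, not a tested fix against the live site: the hrefs vector is a hypothetical sample built from the URL in the warning, and to_absolute is an illustrative helper name that is not part of rvest.

```r
# Base URL that the question pastes in front of every href.
base <- "http://www.ufcstats.com/fighter-details/"

# Hypothetical sample of what html_attr("href") returns. The absolute form is
# inferred from the doubled URL in the warning, not checked against the site.
hrefs <- c(
  "http://www.ufcstats.com/fighter-details/93fe7332d16c6ad9",
  "http://www.ufcstats.com/fighter-details/93fe7332d16c6ad9"
)

# What the original code builds: the prefix ends up in the URL twice.
doubled <- paste(base, hrefs, sep = "")

# Illustrative helper: add the prefix only when the href is not already absolute.
to_absolute <- function(href, base) {
  href <- trimws(href)
  ifelse(grepl("^https?://", href), href, paste0(base, href))
}

# unique() drops repeats, in case several links on the page share one href.
name_links <- unique(to_absolute(hrefs, base))

print(doubled[1])
print(name_links)
```

If the assumption holds, the same idea applies in the original script by dropping the paste() step, or by piping the html_attr("href") result through a helper like this, before calling map_dfr(name_links, get_Info).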
Tags: r web-scraping