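【Question title】: R stops scraping when there is missing data
【Posted】: 2020-09-23 04:01:25
【Question】:

I am using this code to loop over several URLs and scrape data. It works until it reaches a date with missing data. This is the error message that appears:

Error in data.frame(away, home, away1H, home1H, awayPinnacle, homePinnacle) : arguments imply differing number of rows: 7, 8

I am very new to coding and do not know how to make it keep scraping when data is missing.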

    library(rvest)
    library(dplyr)

    get_data <- function(date) {

      # Specifying URL
      url <- paste0('https://classic.sportsbookreview.com/betting-odds/nba-basketball/money-line/1st-half/?date=', date)

      # Reading the HTML code from website
      oddspage <- read_html(url)

      # Using CSS selectors to scrape away teams
      awayHtml <- html_nodes(oddspage,'.eventLine-value:nth-child(1) a')

      #Using CSS selectors to scrape 1Q scores
      away1QHtml <- html_nodes(oddspage,'.current-score+ .first')
      away1Q <- html_text(away1QHtml)
      away1Q <- as.numeric(away1Q)
      home1QHtml <- html_nodes(oddspage,'.score-periods+ .score-periods .current-score+ .period')
      home1Q <- html_text(home1QHtml)
      home1Q <- as.numeric(home1Q)

      #Using CSS selectors to scrape 2Q scores
      away2QHtml <- html_nodes(oddspage,'.first:nth-child(3)')
      away2Q <- html_text(away2QHtml)
      away2Q <- as.numeric(away2Q)
      home2QHtml <- html_nodes(oddspage,'.score-periods+ .score-periods .period:nth-child(3)')
      home2Q <- html_text(home2QHtml)
      home2Q <- as.numeric(home2Q)

      #Creating First Half Scores
      away1H <- away1Q + away2Q
      home1H <- home1Q + home2Q

      #Using CSS selectors to scrape scores
      awayScoreHtml <- html_nodes(oddspage,'.first.total')
      awayScore <- html_text(awayScoreHtml)
      awayScore <- as.numeric(awayScore)
      homeScoreHtml <- html_nodes(oddspage, '.score-periods+ .score-periods .total')
      homeScore <- html_text(homeScoreHtml)
      homeScore <- as.numeric(homeScore)

      # Converting away data to text
      away <- html_text(awayHtml)

      # Using CSS selectors to scrape home teams
      homeHtml <- html_nodes(oddspage,'.eventLine-value+ .eventLine-value a')

      # Converting home data to text
      home <- html_text(homeHtml)

      # Using CSS selectors to scrape Away Odds
      awayPinnacleHtml <- html_nodes(oddspage,'.eventLine-consensus+ .eventLine-book .eventLine-book-value:nth-child(1) b')

      # Converting Away Odds to Text
      awayPinnacle <- html_text(awayPinnacleHtml)

      # Converting Away Odds to numeric
      awayPinnacle <- as.numeric(awayPinnacle)

      # Using CSS selectors to scrape Pinnacle Home Odds
      homePinnacleHtml <- html_nodes(oddspage,'.eventLine-consensus+ .eventLine-book .eventLine-book-value+ .eventLine-book-value b')

      # Converting Home Odds to Text
      homePinnacle <- html_text(homePinnacleHtml)

      # Converting Home Odds to Numeric
      homePinnacle <- as.numeric(homePinnacle)

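      # Create and return the data frame; this is the call that errors
      # when the scraped vectors have incompatible lengths (e.g. 7 vs. 8)
      data.frame(away,home,away1H,home1H,awayPinnacle,homePinnacle)

    }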

    date_vec <- sprintf('201902%02d', 02:06)

    all_data <- do.call(rbind, lapply(date_vec, get_data))

    View(all_data)

【Comments】:

  • Which date has the missing data?

Tags: r web-scraping


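【Solution 1】:

I recommend purrr::map() (here its row-binding variant, map_dfr()) instead of lapply(). You can then wrap get_data() in possibly(), which is a good way to catch errors and keep going.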

    library(purrr)

    map_dfr(date_vec, possibly(get_data, otherwise = data.frame()))

Output:

            away         home away1H home1H awayPinnacle homePinnacle
1  L.A. Clippers      Detroit     47     65          116         -131
2      Milwaukee   Washington     73     50         -181          159
3        Chicago    Charlotte     60     51          192         -220
4       Brooklyn      Orlando     48     44          121         -137
5        Indiana        Miami     53     54          117         -133
6         Dallas    Cleveland     58     55         -159          140
7    L.A. Lakers Golden State     58     63          513         -651
8    New Orleans  San Antonio     50     63          298         -352
9         Denver    Minnesota     61     64          107         -121
10       Houston         Utah     63     50          186         -213
11       Atlanta      Phoenix     58     57          110         -125
12  Philadelphia   Sacramento     52     62         -139          123
13       Memphis     New York     42     41         -129          114
14 Oklahoma City       Boston     58     66          137         -156
15 L.A. Clippers      Toronto     51     65          228         -263
16       Atlanta   Washington     61     57          172         -196
17        Denver      Detroit     55     68         -112         -101
18     Milwaukee     Brooklyn     51     42         -211          184
19       Indiana  New Orleans     53     50         -143          127
20       Houston      Phoenix     63     57         -256          222
21   San Antonio   Sacramento     59     63         -124          110
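
To see what possibly() does without hitting the website, here is a minimal offline sketch. fake_get_data() is a hypothetical stand-in for get_data(), not part of the original code: it deliberately builds columns of length 7 and 8 for one made-up date so that data.frame() fails the same way as in the question. It assumes purrr is installed, plus dplyr, which map_dfr() uses for row binding.

```r
library(purrr)

# Hypothetical stand-in for get_data(): no network access. For one date
# the "home" column gets 8 values while "away" has 7, so data.frame()
# throws an "arguments imply differing number of rows" error.
fake_get_data <- function(date) {
  n_home <- if (date == "20190204") 8 else 7
  data.frame(
    date = date,
    away = seq_len(7),
    home = seq_len(n_home),
    stringsAsFactors = FALSE
  )
}

dates <- sprintf("201902%02d", 2:6)

# possibly() returns a wrapped function that yields `otherwise`
# instead of stopping when the original function throws an error.
safe_get_data <- possibly(fake_get_data, otherwise = data.frame())

# The failing date contributes an empty data frame, so it adds no rows.
result <- map_dfr(dates, safe_get_data)

nrow(result)         # 28: four dates x 7 rows, 20190204 was skipped
unique(result$date)  # the dates that were scraped successfully
```

Note that this approach drops the whole failing date silently. Comparing unique(result$date) against the input vector, as in the last line, is one way to find out which dates were skipped.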

【Comments】:

  • Thank you so much, this is exactly what I was looking for.