【问题标题】:Convert nested for loop to a function将嵌套的 for 循环转换为函数
【发布时间】:2021-09-16 04:11:34
【问题描述】:

我写了一个嵌套的 for 循环,想把它转换成一个函数。这是我目前拥有的代码:

## Sleeper function
testit <- function(x)
{
  p1 <- proc.time()
  Sys.sleep(x)
  proc.time() - p1 # The cpu usage should be negligible
}

## Set up null objects to fill in loop
episodes = NULL
l = NULL

## Scrape pages and append together
for (season in c(27:38)){
  for (episode in c(1:13)){
    tryCatch({
      if (season == 34 & episode == 3){
        l = 'l'
      }
      new_episode <- read_html(paste0('http://www.chakoteya.net/DoctorWho/', season, '-', episode, '.htm', l)) %>% 
      html_nodes("p") %>%
      html_text() %>% 
      tibble(value = .)
      episodes <- episodes %>% bind_rows(new_episode)
      cat(paste('\rseason = ', season, '; episode = ', episode))
      testit(2)
    }, error=function(e){cat("ERROR :",conditionMessage(e), "\n")})
  }
}

如您所见,它在第 27-38 季和第 1-13 季中循环播放。有时这一季只有 12 集,因此tryCatch()。而且,由于我不明白的原因,有时 URL 需要一个 .html 后缀(如果季节 >= 34 和情节 >= 3),有时它需要一个 .htm 后缀,因此是 if (season == 34 &amp; episode == 3) 声明。

我想将其转换为函数,可能使用apply()map(),但我的函数技能仍然很初级,我正在苦苦挣扎。

作为最终输出,最好像这样调用一个名为 doctor_who() 的函数:

episodes <- doctor_who(season = c(27:38), episode = c(1:13))

【问题讨论】:

    标签: r function for-loop web-scraping rvest


    【解决方案1】:

    这样的?

    library(dplyr)
    library(xml2)
    
    testit <- function(x) {
      p1 <- proc.time()
      Sys.sleep(x)
      proc.time() - p1 # The cpu usage should be negligible
    }
    
    doctor_who <- function(season = 27:38, episode = 1:13){
      ## Set up null objects to fill in loop
      episodes = NULL
      l = NULL
      
      res <- vector("list", length = length(season)* length(episodes))
      inx <- 0L
      ## Scrape pages and append together
      for (s in season){
        for (e in episode){
          tryCatch({
            if (s == 34 && e == 3){
              l = 'l'
            }
            URL <- paste0('http://www.chakoteya.net/DoctorWho/', s, '-', e, '.htm', l)
            new_episode <- read_html(URL, encoding = "shift_jis") %>% 
              html_nodes("p") %>%
              html_text() %>% 
              tibble(value = .)
            episodes <- episodes %>% bind_rows(new_episode)
            message(paste('season = ', s, '; episode = ', e))
            testit(2)
            inx <- inx + 1L
            res[[inx]] <- episodes
          }, error=function(e){cat("ERROR :",conditionMessage(e), "\n")})
        }
      }
      res
    }
    
    episodes <- doctor_who(season = 27:38, episode = 1:13)
    
    # how many were read
    length(episodes)
    #[1] 137
    
    # how many were expected
    length(27:38)*length(1:13)
    #[1] 156
    

    【讨论】:

      【解决方案2】:

      我会把它分成两个函数-

      第一个函数从一个文件中读取数据,第二个函数创建每个季节和剧集的所有 url,并使用map_df 逐个传递它们。

      在第一个函数中,我们首先尝试从.htm 读取数据,如果失败则从.html 读取数据。

      read_text <- function(url) {
        val <- tryCatch({
          read_html(url) %>%
            html_nodes("p") %>%
            html_text() %>% 
            tibble(value = .)
        }, error = function(e) return(NA))
        
        if(NROW(val) == 1) {
          url <- paste0(url, 'l')
          val <- tryCatch({
            read_html(url) %>%
              html_nodes("p") %>%
              html_text() %>% 
              tibble(value = .)
          }, error = function(e) return(NA))
        }
      
        if(NROW(val) > 1) val else NULL
      }
      
      
      doctor_who <- function(season, episode) {
        all_urls <- sprintf('http://www.chakoteya.net/DoctorWho/%s.htm', 
                            c(t(outer(season, episode, paste, sep = '-'))))
        purrr::map_df(all_urls, read_text, .id = 'id')
      }
      
      res <- doctor_who(season = 1:5, episode = 1:5)
      

      我添加了一个id 列来区分每一集。

      #   id    value                                                                      
      #   <chr> <chr>                                                                      
      # 1 1     " \r\nOriginal Airdate: 23 Nov, 1963"                                      
      # 2 1     " [Coal Hill School corridor] "                                            
      # 3 1     " (The bell is ringing for end of classes.)\r\nGIRL: Night, Miss Wright. \…
      # 4 1     " [Laboratory] "                                                           
      # 5 1     " (A man is tidying up after the class) \r\nIAN: Oh? Not gone yet? \r\nBAR…
      # 6 1     " [Classroom] "                                                            
      # 7 1     " (Susan is listening to guitar rock music on her\r\ntransistor radio. I'm…
      # 8 1     " [Totter's Lane] "                                                        
      # 9 1     " (Ian and Barbara are parked up.) \r\nBARBARA: Over there. \r\nIAN: We're…
      #10 1     " [Memory - classroom] "                                                   
      # … with 4,446 more rows
      

      【讨论】:

        猜你喜欢
        • 2020-11-26
        • 1970-01-01
        • 2012-11-06
        • 1970-01-01
        • 1970-01-01
        • 2017-08-08
        • 2020-05-20
        • 2013-01-04
        • 2019-11-10
        相关资源
        最近更新 更多