【Question title】:Writing to lists in nested for-loop R
【Posted】:2015-10-02 15:47:15
【Question description】:

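I am trying to write an R script that scrapes data from tables on multiple pages of a website. To do this, I first want to create a list of the specific pages to scrape. The page addresses follow the format "www.urlpart1/[year]/urlpart2/[page]", where [year] ranges from 2003 to 2015 (13 elements) and [page] runs from 1 to 281 in increments of 40 (8 elements), so the final list I want will contain 104 elements. Here is my code: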

#specify components of URLs
url1 <- "www.urlpart1/"
url2 <- "/urlpart2/"

#specify range of years to scrape
years <- as.list(seq(from = 2003, to = 2015, by = 1)) #13 elements

#specify specific pages within each year to scrape
pages <- as.list(seq(from = 1, to = 281, by = 40)) #8 elements

#specify length of final list of URLs for scraping
loops <- as.list(seq(from = 1, to = (length(years)*length(pages)), by = 1)) #104 elements

#create empty list for storing output of for-loop
list1 <- list()

#initialize loop
for (i in loops){
  for (j in years){
    for (k in pages){
      list1[[i]] <- paste0(url1,j,url2,k)
    }
  }
}

list1 #outputs 104 elements of last iteration of loop

The final list should contain 104 elements that look like this:

"www.urlpart1/2003/urlpart2/1",
"www.urlpart1/2003/urlpart2/41",
"www.urlpart1/2003/urlpart2/81",
"www.urlpart1/2003/urlpart2/121",
"www.urlpart1/2003/urlpart2/161",
"www.urlpart1/2003/urlpart2/201",
"www.urlpart1/2003/urlpart2/241",
"www.urlpart1/2003/urlpart2/281",
"www.urlpart1/2004/urlpart2/1",
"www.urlpart1/2004/urlpart2/41",
"www.urlpart1/2004/urlpart2/81",
"www.urlpart1/2004/urlpart2/121",
"www.urlpart1/2004/urlpart2/161",
"www.urlpart1/2004/urlpart2/201",
"www.urlpart1/2004/urlpart2/241",
"www.urlpart1/2004/urlpart2/281",
...
"www.urlpart1/2015/urlpart2/1",
"www.urlpart1/2015/urlpart2/41",
"www.urlpart1/2015/urlpart2/81",
"www.urlpart1/2015/urlpart2/121",
"www.urlpart1/2015/urlpart2/161",
"www.urlpart1/2015/urlpart2/201",
"www.urlpart1/2015/urlpart2/241",
"www.urlpart1/2015/urlpart2/281"

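Unfortunately, I get a list of the correct length, but every value is the one from the last iteration of the loop. Earlier threads on similar problems do not seem to cover writing to a list inside nested loops. I am completely open to solutions that do not rely on for-loops. I could do this easily with Excel's GUI, but I need to improve my coding skills to make the work more reproducible. Thanks!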

【Question comments】:

    Tags: r for-loop


    【Solution 1】:

    We can use expand.grid to create every combination of the variables as a data.frame, then paste the columns together row by row with do.call(paste0, ...), which yields a character vector:

    res <- do.call(paste0,expand.grid(url1, years, url2, pages))
    length(res)
    #[1] 104
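
    Note that expand.grid varies its first arguments fastest, so res above runs through all the years for one page before moving on to the next page. If the year-by-year order shown in the question is wanted, a minimal sketch (assuming plain numeric vectors instead of lists, with grid, urls, page and year as names introduced here for illustration) is to list the pages before the years:

```r
url1 <- "www.urlpart1/"
url2 <- "/urlpart2/"
years <- seq(from = 2003, to = 2015, by = 1)  # 13 elements (plain vector, an assumption)
pages <- seq(from = 1, to = 281, by = 40)     # 8 elements

# pages comes first, so it varies fastest within each year
grid <- expand.grid(page = pages, year = years, stringsAsFactors = FALSE)

# paste0 is vectorised, so one call builds all URLs in row order
urls <- paste0(url1, grid$year, url2, grid$page)

length(urls)
#[1] 104
head(urls, 3)
#[1] "www.urlpart1/2003/urlpart2/1"  "www.urlpart1/2003/urlpart2/41"
#[3] "www.urlpart1/2003/urlpart2/81"
```

    Naming the grid columns makes the paste0 call independent of column position, which is easier to read than do.call when the order of the pieces matters.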
    

    If a for loop is needed, this may help:

    v1 <- c()
    for(i in seq_along(url1)){
      for(j in seq_along(years)){
        for(k in seq_along(url2)){
          for(m in seq_along(pages)){
            v1 <- c(v1, paste0(url1[i], years[[j]], url2[k], pages[[m]]))
          }
        }
      }
    }
    identical(sort(res), sort(v1))
    #[1] TRUE
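
    As for why the loop in the question fills every slot with the last URL: the outer loop over loops re-runs both inner loops for each i, so list1[[i]] is overwritten 104 times and always ends on the final year/page combination. A minimal sketch of one possible fix, assuming plain numeric vectors and a running counter (idx is a name introduced here), drops the outer loop and preallocates the list:

```r
url1 <- "www.urlpart1/"
url2 <- "/urlpart2/"
years <- seq(from = 2003, to = 2015, by = 1)  # 13 elements
pages <- seq(from = 1, to = 281, by = 40)     # 8 elements

# preallocate one slot per year/page combination
list1 <- vector("list", length(years) * length(pages))

idx <- 0  # hypothetical counter: position of the next URL in list1
for (j in years) {
  for (k in pages) {
    idx <- idx + 1
    list1[[idx]] <- paste0(url1, j, url2, k)
  }
}

length(list1)
#[1] 104
list1[[1]]
#[1] "www.urlpart1/2003/urlpart2/1"
```

    Because the pages loop is innermost, the result comes out in the year-by-year order listed in the question, and preallocating avoids regrowing the object on every iteration as c(v1, ...) does.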
    

    【Comments】:
