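【Question Title】: How to add many data frame columns efficiently in R
【Posted】: 2021-03-11 17:26:23
【Question】:

I need to add several thousand columns to a data frame. Currently, I have a list of 93 lists, where each nested list contains 4 data frames, each with 19 variables. I want to add every column of all of these data frames to an outer data frame. My code is as follows: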

  vars <- c('tmin_F','tavg_F','tmax_F','pp','etr_grass','etr_alfalfa','vpd','rhmin','rhmax','dtr_F','us','shum','pp_def_grass','pp_def_alfalfa','rw_tot','fdd28_F0','fdd32_F0','fdd35_F0',
        'fdd356_F0','fdd36_F0','fdd38_F0','fdd39_F0','fdd392_F0','fdd40_F0','fdd41_F0','fdd44_F0','fdd45_F0','fdd464_F0','fdd48_F0','fdd50_F0','fdd52_F0','fdd536_F0','fdd55_F0',
        'fdd57_F0','fdd59_F0','fdd60_F0','fdd65_F0','fdd70_F0','fdd72_F0','hdd40_F0','hdd45_F0','hdd50_F0','hdd55_F0','hdd57_F0','hdd60_F0','hdd65_F0','hdd45_F0',
        'cdd45_F0','cdd50_F0','cdd55_F0','cdd57_F0','cdd60_F0','cdd65_F0','cdd70_F0','cdd72_F0',
        'gdd32_F0','gdd35_F0','gdd356_F0','gdd38_F0','gdd39_F0','gdd392_F0','gdd40_F0','gdd41_F0','gdd44_F0','gdd45_F0',
        'gdd464_F0','gdd48_F0','gdd50_F0','gdd52_F0','gdd536_F0','gdd55_F0','gdd57_F0','gdd59_F0','gdd60_F0','gdd65_F0','gdd70_F0','gdd72_F0',
        'gddmod_32_59_F0','gddmod_32_788_F0','gddmod_356_788_F0','gddmod_392_86_F0','gddmod_41_86_F0','gddmod_464_86_F0','gddmod_48_86_F0','gddmod_50_86_F0','gddmod_536_95_F0',
        'sdd77_F0','sdd86_F0','sdd95_F0','sdd97_F0','sdd99_F0','sdd104_F0','sdd113_F0')

windows <- c(15,15,15,29,29,29,15,15,15,15,29,29,29,29,15,rep(15,78))
perc_list <- c('obs','smoothed_obs','windowed_obs','smoothed_windowed_obs')
percs <- c('00','02','05','10','20','25','30','33','40','50','60','66','70','75','80','90','95','98','100')
vcols <- seq(1,19,1)

for (v in 1:93){
 for (pl in 1:4){
  for (p in 1:19){
    normals_1981_2010 <- normals_1981_2010 %>% mutate(!!paste0(vars[v],'_daily',perc_list[pl],'_perc',percs[p]) := percents[[v]][[pl]][,vcols[p]])}}
      print(v)}

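The code starts out fast but slows down quickly as the outer data frame grows. I did not expect this to be a problem. How can I add all of these extra columns efficiently? Is there a better way to do this than using mutate? I tried add_column, but that did not work; perhaps it does not like loops.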

【Comments】:

Tags: r for-loop parallel-processing dplyr


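【Solution 1】:

Your example is not reproducible as is (the object normals_1981_2010 does not exist but is used in the loop), so I am not sure I have understood your problem. If I have, this should work:

  1. First, I reproduce the structure of your dataset, except that instead of 93 lists I use 5, instead of 4 nested tables I use 3, and instead of 19 columns per table I use 3.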
df_list <- vector("list", 5) # Create an empty list vector, then fill it in.
for(i in 1:5) {
  df_list[[i]] <- vector("list", 3) 
  for(j in 1:3) {
     df_list[[i]][[j]] <- data.frame(a = 1:12,
                                     b = letters[1:12],
                                     c = month.abb[1:12])
     colnames(df_list[[i]][[j]]) <- paste0(colnames(df_list[[i]][[j]]), "_nest_", i, "subnest_", j)
  }
}
df_list # preview the structure.
  2. Then, to answer your question:
library(dplyr) # bind_cols() comes from dplyr
# Now, how to bind everything together:
df_out <- vector("list", 5)
for(i in 1:5) {
    df_out[[i]] <- bind_cols(df_list[[i]])
}

# Final step
df_out <- bind_cols(df_out)
ncol(df_out) # Here I have 5*3*3 = 45 columns, but you will have 93*4*19 = 7068 columns
# [1] 45
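
The same "rename first, bind once" idea can be sketched against the structure described in the question. This is only an illustration under stated assumptions: `vars`, `perc_list`, `percs`, `percents`, and `normals_1981_2010` below are small toy stand-ins (2 variables, 2 percentile types, 3 percentiles, 5 rows) rather than the asker's real data, and the `doy` column is hypothetical. It uses base R `cbind()`; `dplyr::bind_cols()` could be used in the same places.

```r
# Toy stand-ins for the asker's objects (hypothetical sizes, not the real data).
vars      <- c("tmin_F", "tavg_F")
perc_list <- c("obs", "smoothed_obs")
percs     <- c("00", "50", "100")
n_rows    <- 5

# Hypothetical outer data frame with a single placeholder column.
normals_1981_2010 <- data.frame(doy = seq_len(n_rows))

# percents[[v]][[pl]] is a data frame with one column per percentile.
percents <- lapply(seq_along(vars), function(v) {
  lapply(seq_along(perc_list), function(pl) {
    as.data.frame(matrix(v * 100 + pl * 10 + seq_along(percs),
                         nrow = n_rows, ncol = length(percs), byrow = TRUE))
  })
})

# Step 1: rename the columns of each nested data frame in place,
# using the naming scheme from the question's paste0() call.
for (v in seq_along(vars)) {
  for (pl in seq_along(perc_list)) {
    names(percents[[v]][[pl]]) <-
      paste0(vars[v], "_daily", perc_list[pl], "_perc", percs)
  }
}

# Step 2: flatten the two-level list into one flat list of data frames.
pieces <- unname(unlist(percents, recursive = FALSE))

# Step 3: bind all new columns together once, then attach them to the
# outer data frame in a single call instead of once per column.
new_cols <- do.call(cbind, pieces)
normals_1981_2010 <- cbind(normals_1981_2010, new_cols)

ncol(normals_1981_2010) # 1 original column + 2 * 2 * 3 new columns
```

The outer data frame is modified once at the end, so the cost of growing it is not paid again for every added column.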

【Comments】:
