【Question title】: R: Add a new variable to dataframes whose value is equal to the name of the dataframes
【Posted】: 2015-10-23 15:15:42
【Question】:

I want to add a variable to every data frame in my global environment, with the value of the new column equal to the data frame's name.

Product=c("A","A","A","A","A","A","A","A","A","A","A","A","B","B","B","C","C","C")
Day=c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Monday","Tuesday","Wednesday","Saturday","Sunday" ,"Monday")

data1=data.frame(Product, Day)

Product2=c("Z","Z","Z","Z","Z","Z","Z","Z","Z","Z","Z","Z","Y","Y","Y","X","X","X")
Day2=c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Monday","Tuesday","Wednesday","Saturday","Sunday" ,"Monday")

data2=data.frame(Product2, Day2)

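I want to add a column to both data frames whose value equals the data frame's name, i.e. newvar="data1" for data1 and newvar="data2" for data2. My actual list of data frames is much longer than this.

Any help is much appreciated.

Thanks!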

【Comments】:

    Tags: r variables conditional


    【Solution 1】:

    If the 'data.frame' object names are 'data' followed by a number, and we already know those names, we can build them as strings with paste0.

      nm1 <- paste0('data', 1:2)
    

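    If the global environment holds many such objects (say 100) and we do not know exactly how many there are, another option is to use ls with the pattern argument.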

      nm1 <- ls(pattern='^data\\d+')
    

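    Use mget to fetch the objects into a list, then use Map with cbind to create the new column ('newvar'). Map pairs each data frame in the list with its own object name, so every data set gets the column value that matches its name.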

      lst <- Map(cbind, mget(nm1), newvar= nm1)
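The three steps above can be tried end to end on a small example. This is a minimal sketch; the two data frames here are shortened stand-ins for the ones in the question, not the original data.

```r
# Two small stand-in data frames (shorter than the question's)
data1 <- data.frame(Product = c("A", "B"), Day = c("Monday", "Tuesday"))
data2 <- data.frame(Product2 = c("Z", "Y"), Day2 = c("Monday", "Sunday"))

# Build the object names, fetch the objects, and add each name as a column
nm1 <- paste0("data", 1:2)
lst <- Map(cbind, mget(nm1), newvar = nm1)

names(lst)        # the list elements keep the object names
lst$data1$newvar  # the new column of the first data frame
```

Because mget returns a named list, the result of Map is named as well, so each data frame can still be looked up by its original name.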
    

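    It is better to keep the data frames in the list, since most operations can be carried out there. If the original objects in the global environment really need to be updated, list2env is an option (though not recommended).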

      list2env(lst, envir=.GlobalEnv)
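If overwriting objects in the global environment feels too risky, list2env can just as well copy the list into a separate environment. The sketch below assumes a fresh environment named `e` (a hypothetical name) and a small stand-in list.

```r
# A stand-in for the list produced by Map(cbind, ...)
lst <- list(
  data1 = data.frame(Product = "A", newvar = "data1"),
  data2 = data.frame(Product2 = "Z", newvar = "data2")
)

# Copy each list element into its own variable inside a new environment
e <- new.env()
list2env(lst, envir = e)

ls(e)           # names of the objects created in e
e$data1$newvar  # access one of them
```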
    

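    We could also read all the files (.csv/.txt) directly into a list instead of creating individual objects. For example, all files in the working directory can be read with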
       files <- list.files()
       lst <- lapply(files, read.csv, stringsAsFactors=FALSE)
    

    The arguments may need some changes depending on the delimiter.
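When reading straight from files, the list returned by lapply has no names, so the file names have to be attached before they can become the new column. A minimal sketch, assuming CSV files in a temporary directory (the directory and file names are made up for the example):

```r
# Write two small CSV files into a temporary directory (stand-in data)
dir <- file.path(tempdir(), "newvar_demo")
dir.create(dir, showWarnings = FALSE)
write.csv(data.frame(Product = c("A", "B")),
          file.path(dir, "data1.csv"), row.names = FALSE)
write.csv(data.frame(Product = c("Z", "Y")),
          file.path(dir, "data2.csv"), row.names = FALSE)

# Read every CSV file and name each list element after its file
files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
lst <- lapply(files, read.csv, stringsAsFactors = FALSE)
names(lst) <- tools::file_path_sans_ext(basename(files))

# Add the element name as a column, as in the Map() call shown earlier
lst <- Map(cbind, lst, newvar = names(lst))
```

Restricting list.files with a pattern avoids handing non-CSV files to read.csv.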

    【Comments】:

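      【Solution 2】:

      Here is a function that takes any number of named data.frames and returns a named list of data.frames with the requested column added. With the list2env function (as in @akrun's answer) you can put them into any environment you like. (You could also modify the function to produce that side effect automatically.)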

      f <- function(...) {
          objnames <- as.character(substitute(c(...)))[-1]
          obj <- list(...)
          out <- mapply(function(x, col) {
              x[, col] <- col
              x
          }, obj, objnames, SIMPLIFY = FALSE)
          setNames(out, objnames)
      }
      

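      Use it as follows (note that the new column is named after each object, data1 or data2, rather than newvar):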

      list2env(f(data1,data2), .GlobalEnv)
      # <environment: R_GlobalEnv>
      str(data1)
      # 'data.frame':   18 obs. of  3 variables:
      #  $ Product: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 1 1 1 1 ...
      #  $ Day    : Factor w/ 7 levels "Friday","Monday",..: 2 6 7 5 1 3 2 6 7 5 ...
      #  $ data1  : chr  "data1" "data1" "data1" "data1" ...
      str(data2)
      # 'data.frame':   18 obs. of  3 variables:
      #  $ Product2: Factor w/ 3 levels "X","Y","Z": 3 3 3 3 3 3 3 3 3 3 ...
      #  $ Day2    : Factor w/ 7 levels "Friday","Monday",..: 2 6 7 5 1 3 2 6 7 5 ...
      #  $ data2   : chr  "data2" "data2" "data2" "data2" ...
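The output above shows that f() names the new column after each object. If the column should be called newvar, as in the question, one possible variant is sketched below. The function name `add_name_col` and its `col` argument are hypothetical and not part of the original answer.

```r
add_name_col <- function(..., col = "newvar") {
    # Recover the argument expressions as strings, e.g. "data1", "data2"
    objnames <- as.character(substitute(c(...)))[-1]
    out <- Map(function(x, nm) {
        x[[col]] <- nm   # one value, recycled over all rows
        x
    }, list(...), objnames)
    setNames(out, objnames)
}

# Small stand-in data frames
data1 <- data.frame(Product = c("A", "B"))
data2 <- data.frame(Product2 = c("Z", "Y"))

res <- add_name_col(data1, data2)
names(res)        # list elements are named after the arguments
res$data2$newvar  # the new column of the second data frame
```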
      

      If you want to pass a large number of named objects without listing them explicitly in the call to f(), you can do the following:

      list2env(do.call(f, sapply(ls(pattern = "data"), as.name)), .GlobalEnv)
      

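      This gives the same result. Note that pattern = "data" matches every object whose name contains "data", so any such object that is not a data frame is passed to f() as well.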
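If the environment may contain other objects with "data" in their names, one way to guard against that is to fetch the candidates first and keep only the data frames. A minimal sketch; `data_note` is a made-up object included to show the filtering.

```r
data1 <- data.frame(Product = c("A", "B"))
data2 <- data.frame(Product2 = c("Z", "Y"))
data_note <- "not a data frame"   # also matches pattern = "data"

# Fetch every object whose name contains "data", then keep data frames only
candidates <- mget(ls(pattern = "data"))
dfs <- Filter(is.data.frame, candidates)

names(dfs)   # data_note has been dropped
```

The filtered list can then go through Map(cbind, dfs, newvar = names(dfs)) in the same way as in Solution 1.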

      【Comments】:
