Question title: How to separate a data frame into a list of data frames based on column names in R?
Posted: 2012-02-14 17:03:40
Question:

Suppose I have the following data frame:

df <- data.frame(BR.a=rnorm(10), BR.b=rnorm(10), BR.c=rnorm(10),
USA.a=rnorm(10), USA.b = rnorm(10), FRA.a=rnorm(10), FRA.b=rnorm(10))

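I want to create a list of data frames, split by the first part of the column name: the columns starting with "BR" would be one element of the list, the columns starting with "USA" another, and so on.

I can get the column names and split them with strsplit, but I'm not sure how to iterate over the result or what the best way to split the data frame is.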

strsplit(names(df), "\\.")

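gives me a list with one element per column name. Each element is a character vector holding the pieces of that name split at "." (for example, "BR" and "a" for BR.a).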
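Here is a minimal base-R sketch of what that `strsplit` output looks like and how to take the prefix from every name. `parts` and `prefix` are illustrative names, not from the original post.

```r
# Example data from the question
df <- data.frame(BR.a = rnorm(10), BR.b = rnorm(10), BR.c = rnorm(10),
                 USA.a = rnorm(10), USA.b = rnorm(10),
                 FRA.a = rnorm(10), FRA.b = rnorm(10))

# Unnamed list: one character vector of pieces per column name
parts <- strsplit(names(df), "\\.")
parts[[1]]  # c("BR", "a")

# Take the first piece of every name, i.e. the prefix
prefix <- vapply(parts, `[`, character(1), 1)
prefix      # c("BR", "BR", "BR", "USA", "USA", "FRA", "FRA")
```

Because only the first piece is kept, a name with several dots (e.g. `FRA.c.x`) still yields just `FRA`.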

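How can I iterate over this list to get the index numbers of the columns that start with the same substring, and group those columns into elements of another list?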
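The question asks for the column indices of each prefix. A sketch of one way to get them, not taken from either answer, is base R's `split()` applied to the column positions. `prefix`, `idx` and `dflist` are hypothetical names.

```r
df <- data.frame(BR.a = rnorm(10), BR.b = rnorm(10), BR.c = rnorm(10),
                 USA.a = rnorm(10), USA.b = rnorm(10),
                 FRA.a = rnorm(10), FRA.b = rnorm(10))

# Text before the first "." of each column name
prefix <- sub("\\..*", "", names(df))

# Column positions grouped by prefix; the list is named by prefix
idx <- split(seq_along(df), prefix)
# idx$BR is 1:3, idx$FRA is 6:7, idx$USA is 4:5

# Turn each group of positions into its own data frame
dflist <- lapply(idx, function(i) df[, i, drop = FALSE])
names(dflist)  # "BR" "FRA" "USA"
```

`split()` orders the groups by sorted prefix, not by first appearance. Use `unique(prefix)` as the answers do if the original column order matters.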

Comments:

    Tags: r


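    Answer 1:

    This only works if the column names always have the form you showed (so they can be split on ".") and you want to group on the identifier before the first ".".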

    df <- data.frame(BR.a=rnorm(10), BR.b=rnorm(10), BR.c=rnorm(10),
                     USA.a=rnorm(10), USA.b = rnorm(10), FRA.a=rnorm(10), FRA.b=rnorm(10))
    
    ## Grab the component of the names we want (the part before the first ".")
    nm <- do.call(rbind, strsplit(colnames(df), "\\."))[,1]
    ## Create a list with one data frame per prefix using lapply;
    ## drop = FALSE keeps single-column groups as data frames
    datlist <- lapply(unique(nm), function(x){df[, nm == x, drop = FALSE]})
    names(datlist) <- unique(nm)
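One detail of `df[, cols]` matters here. When exactly one column is selected, R simplifies the result to a plain vector unless `drop = FALSE` is passed. In the question's data every prefix has at least two columns, so the problem does not appear there. The sketch below uses a hypothetical variant with a single `USA` column to show the difference.

```r
# Hypothetical variant of the data where "USA" has only one column
df <- data.frame(BR.a = rnorm(10), BR.b = rnorm(10), USA.a = rnorm(10))

# Prefix of each name, extracted the same way as in the answer
nm <- do.call(rbind, strsplit(colnames(df), "\\."))[, 1]

# Default indexing simplifies a single selected column to a plain vector
one_default <- df[, nm == "USA"]
# drop = FALSE keeps it as a 10 x 1 data frame
one_kept <- df[, nm == "USA", drop = FALSE]
class(one_default)  # "numeric"
class(one_kept)     # "data.frame"

# Named list of data frames, one per prefix, in order of first appearance
datlist <- lapply(unique(nm), function(x) df[, nm == x, drop = FALSE])
names(datlist) <- unique(nm)
```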
    


      Answer 2:

      Dason beat me to it, but here is a different flavor of the same conceptual approach:

      library(plyr)
      
      # Use regex to get the prefixes
      # Captures any letters, digits or underscores ("\\w*") from the beginning of
      # the string ("^") up to the first period ("\\.") as a group, then matches all
      # the remaining characters (".*"). The whole match is replaced with the first
      # group ("\\1" = "(\\w*)").
      # In other words, it matches the whole string but replaces it with only the prefix.
      
      prefixes <- unique(gsub(pattern = "^(\\w*)\\..*",
                              replacement = "\\1",
                              x = names(df)))
      
      # Subset to the variables that match the prefix
      # Iterates over the prefixes and keeps the variables whose names start with
      # that prefix followed by a period
      llply(prefixes, .fun = function(x){
          subset(df, select = names(df)[grep(names(df),
                                             pattern = paste0("^", x, "\\."))])
      })
      

      I think these regexes should still give you the correct result even if there is a "." later in the variable name:

      unique(gsub(pattern = "^(\\w*)\\..*",
                  replacement = "\\1",
                  x = c(names(df), "FRA.c.blahblah")))
      # "BR" "USA" "FRA"
      

      or if one prefix appears later in another variable's name:

      # Add a USA variable with "FRA" in its name
      df2 <- data.frame(df, USA.FRANKLINS = rnorm(10))
      
      prefixes2 <- unique(gsub(pattern = "^(\\w*)\\..*",
                               replacement = "\\1",
                               x = names(df2)))
      
      llply(prefixes2, .fun = function(x){
          subset(df2, select = names(df2)[grep(names(df2),
                                               pattern = paste0("^", x, "\\."))])
      })
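A related edge case comes up when selecting columns. Matching on the prefix alone (`^US`) instead of the prefix plus its separator (`^US\\.`) goes wrong when one prefix is the start of another. The sketch below uses hypothetical column names `US.*` and `USA.*`, which are not in the question, and base R only.

```r
# Hypothetical data: the prefix "US" is also the start of "USA"
df3 <- data.frame(US.x = 1:3, US.y = 4:6, USA.a = 7:9, USA.b = 10:12)

prefixes <- unique(gsub("^(\\w*)\\..*", "\\1", names(df3)))  # "US" "USA"

loose  <- grep("^US", names(df3))    # 1 2 3 4: also catches the USA columns
strict <- grep("^US\\.", names(df3)) # 1 2: anchored on the separator

# One data frame per prefix, matching "prefix." at the start of the name
grouped <- lapply(prefixes, function(x) df3[grep(paste0("^", x, "\\."), names(df3))])
names(grouped) <- prefixes
```

The prefixes produced by `\\w*` contain only letters, digits and underscores, so they can be pasted into a pattern without escaping.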
      
