【Question title】: Convert a column of lists into a regular column in a data.frame?
【Posted】: 2020-08-26 15:01:54
【Question description】:

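Here is a data.frame whose second column is a list column (note that it also contains a NULL).

How can we convert each list into a regular element, so that the column behaves like any other character column? (The NULL can become NA.)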

df <- structure(list(Year = c(2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2013L, 
2014L, 2014L, 2014L, 2014L, 2014L), Country = list(Country = "Canada", 
    Country = "Germany", Country = "France", Country = "Germany", 
    Country = "Mexico", Country = "Germany", Country = "Germany", 
    Country = "Canada", NULL, Country = "Germany", Country = "Mexico", 
    Country = "Canada", Country = "Mexico", Country = "Germany", 
    Country = "Canada", Country = "United States of America", 
    Country = "Canada", Country = "Mexico", Country = "Canada", 
    Country = "Germany")), class = "data.frame", row.names = c(NA, 
-20L))

Note:

df %>% sapply(class)
     Year   Country 
"integer"    "list" 

Desired result:

The same data, but with Country as a character column:
df %>% sapply(class)
       Year     Country 
  "integer" "character" 

【Question comments】:

    Tags: r lapply purrr sapply


    【Solution 1】:

    I would suggest applying a small helper function to the list column of your df:

    myfun <- function(x)
    {
      if(is.null(x)) 
        {y <- NA} 
      else
      {
        y <- x[[1]]
      }
      return(y)
    }
    #Apply  
    df$Newvar <- as.vector(do.call(rbind,lapply(df$Country,myfun)))
    

    Output:

       Year                  Country                   Newvar
    1  2014                   Canada                   Canada
    2  2014                  Germany                  Germany
    3  2014                   France                   France
    4  2014                  Germany                  Germany
    5  2014                   Mexico                   Mexico
    6  2014                  Germany                  Germany
    7  2014                  Germany                  Germany
    8  2014                   Canada                   Canada
    9  2014                     NULL                     <NA>
    10 2014                  Germany                  Germany
    11 2014                   Mexico                   Mexico
    12 2014                   Canada                   Canada
    13 2014                   Mexico                   Mexico
    14 2014                  Germany                  Germany
    15 2013                   Canada                   Canada
    16 2014 United States of America United States of America
    17 2014                   Canada                   Canada
    18 2014                   Mexico                   Mexico
    19 2014                   Canada                   Canada
    20 2014                  Germany                  Germany
    

    And a check of the structure:

    str(df)
    
    'data.frame':   20 obs. of  3 variables:
     $ Year   : int  2014 2014 2014 2014 2014 2014 2014 2014 2014 2014 ...
     $ Country:List of 20
      ..$ Country: chr "Canada"
      ..$ Country: chr "Germany"
      ..$ Country: chr "France"
      ..$ Country: chr "Germany"
      ..$ Country: chr "Mexico"
      ..$ Country: chr "Germany"
      ..$ Country: chr "Germany"
      ..$ Country: chr "Canada"
      ..$        : NULL
      ..$ Country: chr "Germany"
      ..$ Country: chr "Mexico"
      ..$ Country: chr "Canada"
      ..$ Country: chr "Mexico"
      ..$ Country: chr "Germany"
      ..$ Country: chr "Canada"
      ..$ Country: chr "United States of America"
      ..$ Country: chr "Canada"
      ..$ Country: chr "Mexico"
      ..$ Country: chr "Canada"
      ..$ Country: chr "Germany"
     $ Newvar : chr  "Canada" "Germany" "France" "Germany" ...
    

    Newvar is now a character vector rather than a list.
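A type-stable variant of the same idea can be sketched with `vapply()`, which insists that every result is a single character value. This is only a sketch on toy data: the helper name `first_or_na` and the `country` list are hypothetical stand-ins, not part of the original answer.

```r
# Hypothetical helper (illustrative name): first element, or NA if empty
first_or_na <- function(x) {
  if (length(x) == 0) NA_character_ else as.character(x[[1]])
}

# Toy list column standing in for df$Country, including one NULL entry
country <- list(Country = "Canada", NULL, Country = "Mexico")

# vapply() errors if any result is not a length-1 character value
flat <- vapply(country, first_or_na, character(1), USE.NAMES = FALSE)

class(flat)  # "character"
```

On the real data the same call could presumably be written as `df$Country <- vapply(df$Country, first_or_na, character(1), USE.NAMES = FALSE)` to overwrite the column in place.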

    【Comments】:

      【Solution 2】:

      One option:

      df$Country <- sapply(df$Country, function(x) if (length(x)) x else NA)
      

      Another:

      df$Country[lengths(df$Country) == 0] <- list(NA)
      # unlist() flattens the list; as.vector() would leave it a list
      df$Country <- unlist(df$Country, use.names = FALSE)
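The zero-length entries are filled with NA first because flattening a list silently drops NULL elements, which would leave a vector shorter than the data frame. A minimal sketch on a toy list (hypothetical data, not the question's df):

```r
# Toy list column with one NULL entry
x <- list("Canada", NULL, "Mexico")

# Flattening directly drops the NULL, so one row would go missing
n_direct <- length(unlist(x))   # 2

# Fill zero-length entries with NA first, then flatten
x[lengths(x) == 0] <- list(NA)
flat <- unlist(x, use.names = FALSE)

length(flat)  # 3
class(flat)   # "character"
```

The logical NA is coerced to a character NA when the list is flattened, so the result is a plain character vector of the original length.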
      

      【Comments】:

        【Solution 3】:

        Another approach, more in keeping with dplyr's mutate:

        library(dplyr)

        df2 <- df %>%
          mutate(NewCountry = if_else(
            sapply(Country, is.null),
            "MISSING",  # use NA_character_ here to get NA instead
            as.character(Country)
          ))
        
        > sapply(df2, class)
               Year     Country  NewCountry 
          "integer"      "list" "character" 
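The `is.null` check matters because `as.character()` applied to a list turns a NULL entry into the literal string "NULL" rather than a missing value. A base R sketch on toy data (hypothetical, not the question's df) that shows this and substitutes NA instead of "MISSING":

```r
# Toy list column with one NULL entry
x <- list("Canada", NULL, "Mexico")

# as.character() on a list yields the text "NULL" for the empty entry
chr <- as.character(x)
chr  # "Canada" "NULL" "Mexico"

# Overwrite those positions with a real missing value
chr[lengths(x) == 0] <- NA_character_
chr  # "Canada" NA "Mexico"
```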
        

        【Comments】:
