Convert a column of lists into a regular column in a data.frame? (Answers)

[Question title]: Convert a column of lists into a regular column in a data.frame?
[Posted]: 2020-08-26 15:01:54
[Question]:

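Here is a data.frame whose second column is a column of lists (note that one of the entries is NULL).

How can we convert each list into a regular element, so that the column behaves like any other character column? (The NULL can become NA.)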

df <- structure(list(Year = c(2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2013L, 
2014L, 2014L, 2014L, 2014L, 2014L), Country = list(Country = "Canada", 
    Country = "Germany", Country = "France", Country = "Germany", 
    Country = "Mexico", Country = "Germany", Country = "Germany", 
    Country = "Canada", NULL, Country = "Germany", Country = "Mexico", 
    Country = "Canada", Country = "Mexico", Country = "Germany", 
    Country = "Canada", Country = "United States of America", 
    Country = "Canada", Country = "Mexico", Country = "Canada", 
    Country = "Germany")), class = "data.frame", row.names = c(NA, 
-20L))

Note:

df %>% sapply(class)
     Year   Country 
"integer"    "list"

Desired result:

The same data, but with:

df %>% sapply(class)
     Year   Country 
"integer"    "character"
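The NULL entry is what makes this tricky: unlist() silently drops zero-length elements, so flattening the column directly would return 19 values for 20 rows. The sketch below shows this on a small hypothetical list x that mirrors df$Country, and shows that lengths() returns 0 for the NULL entry, which is how the NULL rows can be located.

```r
# Hypothetical small list column with one NULL entry, mirroring df$Country
x <- list("Canada", "Germany", NULL, "Mexico")

# unlist() silently drops the NULL, so the result is one element short
naive <- unlist(x)
length(naive)            # 3, not 4

# lengths() returns 0 for NULL entries, which identifies the rows to fill with NA
lengths(x)               # 1 1 0 1
which(lengths(x) == 0)   # 3
```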


Tags: r lapply purrr sapply

[Solution 1]:

I would suggest applying a small helper function to each element of the Country column in your df:

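# Return the first element of x, or NA if x is NULL
myfun <- function(x) {
  if (is.null(x)) {
    y <- NA
  } else {
    y <- x[[1]]
  }
  return(y)
}

# Apply myfun to every list element, stack the results into a one-column
# character matrix with rbind(), then drop the matrix attributes with as.vector()
df$Newvar <- as.vector(do.call(rbind, lapply(df$Country, myfun)))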

Output:

   Year                  Country                   Newvar
1  2014                   Canada                   Canada
2  2014                  Germany                  Germany
3  2014                   France                   France
4  2014                  Germany                  Germany
5  2014                   Mexico                   Mexico
6  2014                  Germany                  Germany
7  2014                  Germany                  Germany
8  2014                   Canada                   Canada
9  2014                     NULL                     <NA>
10 2014                  Germany                  Germany
11 2014                   Mexico                   Mexico
12 2014                   Canada                   Canada
13 2014                   Mexico                   Mexico
14 2014                  Germany                  Germany
15 2013                   Canada                   Canada
16 2014 United States of America United States of America
17 2014                   Canada                   Canada
18 2014                   Mexico                   Mexico
19 2014                   Canada                   Canada
20 2014                  Germany                  Germany

And a few checks:

str(df)

'data.frame':   20 obs. of  3 variables:
 $ Year   : int  2014 2014 2014 2014 2014 2014 2014 2014 2014 2014 ...
 $ Country:List of 20
  ..$ Country: chr "Canada"
  ..$ Country: chr "Germany"
  ..$ Country: chr "France"
  ..$ Country: chr "Germany"
  ..$ Country: chr "Mexico"
  ..$ Country: chr "Germany"
  ..$ Country: chr "Germany"
  ..$ Country: chr "Canada"
  ..$        : NULL
  ..$ Country: chr "Germany"
  ..$ Country: chr "Mexico"
  ..$ Country: chr "Canada"
  ..$ Country: chr "Mexico"
  ..$ Country: chr "Germany"
  ..$ Country: chr "Canada"
  ..$ Country: chr "United States of America"
  ..$ Country: chr "Canada"
  ..$ Country: chr "Mexico"
  ..$ Country: chr "Canada"
  ..$ Country: chr "Germany"
 $ Newvar : chr  "Canada" "Germany" "France" "Germany" ...

Newvar is now a character vector rather than a list.
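A variant of the same idea, swapping lapply() plus do.call(rbind, ...) for base R's vapply(), which declares the output type up front and raises an error if any entry does not yield exactly one string. This is a sketch that assumes every non-NULL entry is a length-1 character vector; first_or_na and country are hypothetical names used only for illustration.

```r
# Hypothetical helper: the first element as a string, or NA_character_ for NULL
first_or_na <- function(x) {
  if (is.null(x)) NA_character_ else as.character(x[[1]])
}

# Hypothetical stand-in for df$Country: named length-1 strings plus one NULL
country <- list(Country = "Canada", Country = "Germany", NULL, Country = "Mexico")

# vapply() checks that every call returns exactly one character value and
# always produces a character vector as long as the input;
# unname() drops the repeated "Country" names
newvar <- unname(vapply(country, first_or_na, FUN.VALUE = character(1)))
newvar  # c("Canada", "Germany", NA, "Mexico")
```

On the question's data this would be df$Newvar <- vapply(df$Country, first_or_na, character(1)); names on an atomic vector are dropped when it is assigned as a data.frame column.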


[Solution 2]:

One option:

df$Country <- sapply(df$Country, function(x) if (length(x)) x else NA)

Another option:

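# Replace the zero-length (NULL) entries with NA first, so unlist() keeps them
df$Country[lengths(df$Country) == 0] <- list(NA)
# as.vector() returns a list unchanged; unlist() flattens it to a character vector
df$Country <- unlist(df$Country)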
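A small sketch to confirm that both options produce the same character column with NA in place of NULL. The toy data frame is a hypothetical stand-in for df. The flattening step uses unlist(), because as.vector() returns a list unchanged and would leave the column's class as "list".

```r
# Hypothetical stand-in for df: an integer column plus a list column with a NULL
toy <- data.frame(Year = c(2014L, 2014L, 2013L))
toy$Country <- list("Canada", NULL, "Mexico")

# Option 1: sapply() maps zero-length entries to NA and simplifies to character
opt1 <- sapply(toy$Country, function(x) if (length(x)) x else NA)

# Option 2: fill the NULL slots with NA, then flatten with unlist()
tmp <- toy$Country
tmp[lengths(tmp) == 0] <- list(NA)
opt2 <- unlist(tmp)

identical(opt1, opt2)  # TRUE
toy$Country <- opt2
sapply(toy, class)     # Year "integer", Country "character"
```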


[Solution 3]:

Another approach, written in the style of dplyr's mutate():

library(dplyr)

df2 <- df %>%
  mutate(NewCountry = if_else(
    sapply(Country, is.null),  # TRUE where the list entry is NULL
    "MISSING",
    as.character(Country)      # converts each length-1 list entry to its string
  ))

> sapply(df2, class)
       Year     Country  NewCountry 
  "integer"      "list" "character"
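For readers not using dplyr, the same flag-then-convert logic can be sketched in base R with ifelse(). The toy data frame is a hypothetical stand-in for df, and NA_character_ is used in place of "MISSING" because the question asked for NA.

```r
# Hypothetical stand-in for df with one NULL entry in the list column
toy <- data.frame(Year = c(2014L, 2014L, 2013L))
toy$Country <- list("Canada", NULL, "Mexico")

# Same logic as the if_else() call: flag the NULL entries ...
is_missing <- vapply(toy$Country, is.null, FUN.VALUE = logical(1))
# ... and convert the rest with as.character(), which turns each length-1
# list entry into its string (a NULL entry would become the string "NULL")
toy$NewCountry <- ifelse(is_missing, NA_character_, as.character(toy$Country))

toy$NewCountry      # c("Canada", NA, "Mexico")
sapply(toy, class)  # Year "integer", Country "list", NewCountry "character"
```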
