
【Question title】: How can I create a column in my dataframe that maps to a list in R?
【Posted】: 2019-06-06 14:04:41
【Question description】:

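I have some licensure data and am trying to create a column in my dataframe that tells me whether the listed license is acceptable for the program a person is enrolled in.

To do this, I created a list, because some licenses can be used for more than one program. Ideally, I could use this list as a reference to check whether the program is listed under the license name. I also tried case_when but kept getting errors. I would also like the list to serve as a kind of map, because license names may change from year to year.

Example code

Here is an excerpt of my dataframe: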

df1 <- data.frame(
  Program = c("Elementary Education", "Elementary Education", "Secondary Math",
              "Secondary Math", "Secondary ELA", "Secondary ELA"),
  Licensure = c("Content Area - Elementary Education (Grades 1-6)",
                "Content Area - Secondary Math (Grades 7-12)",
                "Content Area - Secondary Math (Grades 7-12)",
                "Mathematics (Grades 7-12) 1706",
                "Content Area - Secondary ELA (Grades 7-12)",
                "Content Area - Early Childhood (preK-Grade 3)")
)

Here is the list I created, which includes every license and the acceptable programs under each one:

license_index <- list(
  "Content Area - Early Childhood (preK-Grade 3)" = "Elementary Education",
  "Content Area - Elementary Education (Grades 1-6)" = "Elementary Education",
  "Content Area - Middle Grades ELA (Grades 4-9)" = c("Elementary Education", "Secondary ELA"),
  "Content Area - Middle Grades Math (Grades 4-9)" = c("Elementary Education", "Secondary Math"),
  "Content Area - Middle School Mathematics (Grades 4-8)" = "Elementary Education",
  "Content Area - Secondary ELA (Grades 7-12)" = "Secondary ELA",
  "Content Area - Secondary Math (Grades 7-12)" = "Secondary Math",
  "Content Area - Secondary English (Grades 7-12)" = "Secondary ELA",
  "English Language Arts and Reading (Grades 4-8) 864" = "Elementary Education",
  "Core Subjects (Grades EC-6) 1770" = "Elementary Education",
  "English Language Arts and Reading (Grades 7-12) 1709" = "Secondary ELA",
  "Mathematics (Grades 4-8) 866" = "Elementary Education",
  "Mathematics (Grades 7-12) 1706" = "Secondary Math"
)

As the final column, what I would most like is an indicator of whether the license and the program match:

ideal.df <- data.frame(
  Program = c("Elementary Education", "Elementary Education", "Secondary Math",
              "Secondary Math", "Secondary ELA", "Secondary ELA"),
  Licensure = c("Content Area - Elementary Education (Grades 1-6)",
                "Content Area - Secondary Math (Grades 7-12)",
                "Content Area - Secondary Math (Grades 7-12)",
                "Mathematics (Grades 7-12) 1706",
                "Content Area - Secondary ELA (Grades 7-12)",
                "Content Area - Early Childhood (preK-Grade 3)"),
  match = c("Match", "No", "Match", "Match", "Match", "No")
)

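I think I need the mutate function and maybe a purrr map function, but I am not very familiar with the tidyverse, so any help is much appreciated. Thanks in advance!

【Comments】:

Tags: r dictionary dplyr

【Solution 1】:

Try this: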

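# stack() turns the named list into a two-column data frame (values, ind)
x <- stack(license_index)
# match() returns only the first row per license, so this comparison is
# reliable only for licenses that map to a single program
x$values[match(df1$Licensure, x$ind)] == df1$Program
#[1]  TRUE FALSE  TRUE  TRUE  TRUE FALSE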

If needed, you can map the TRUE and FALSE values above to Match and No.
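The mapping to labels can be done with ifelse. The sketch below also compares whole (license, program) pairs instead of relying on match(), so that a license accepting several programs is handled. It is only an illustration on a reduced version of the question's data; the names allowed_pairs and row_pairs are made up here, and the " | " separator assumes that string never appears in a license or program name.

```r
# Reduced version of the question's data (assumption: strings, not factors)
license_index <- list(
  "Content Area - Elementary Education (Grades 1-6)" = "Elementary Education",
  "Content Area - Middle Grades ELA (Grades 4-9)" = c("Elementary Education", "Secondary ELA"),
  "Mathematics (Grades 7-12) 1706" = "Secondary Math"
)
df1 <- data.frame(
  Program = c("Elementary Education", "Secondary ELA", "Secondary ELA"),
  Licensure = c("Content Area - Elementary Education (Grades 1-6)",
                "Content Area - Middle Grades ELA (Grades 4-9)",
                "Mathematics (Grades 7-12) 1706"),
  stringsAsFactors = FALSE
)

# One row per allowed (license, program) combination
x <- stack(license_index)

# Build a single key per pair and test membership instead of positional matching
allowed_pairs <- paste(x$ind, x$values, sep = " | ")
row_pairs <- paste(df1$Licensure, df1$Program, sep = " | ")

# Convert the logical result to the labels requested in the question
df1$match <- ifelse(row_pairs %in% allowed_pairs, "Match", "No")
print(df1)
```

The second row is the case a plain match() lookup would miss, because match() stops at the first program listed for that license.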

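【Comments】:

This works well, thank you. To understand it better: does the stack function convert the list into a data frame?
Yes, it does. stack repeats each list element's name once for every value in that element.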
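To make the behaviour of stack concrete, here is a small sketch on two entries taken from the question's list. It only inspects the result; nothing here goes beyond base R.

```r
# Two entries from the question's list: one with two programs, one with one
license_index <- list(
  "Content Area - Middle Grades ELA (Grades 4-9)" = c("Elementary Education", "Secondary ELA"),
  "Mathematics (Grades 7-12) 1706" = "Secondary Math"
)

x <- stack(license_index)

# 'values' holds the programs, 'ind' holds the list name repeated per value
str(x)
print(x)
```

The license with two programs produces two rows, which is why a lookup that returns only the first matching row can miss the second program.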

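【Solution 2】:

Here is a tidyverse approach: convert the named list to a two-column data.frame with enframe, right_join it with the original dataset, and create match by comparing the 'match' column with 'Program'.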

library(tidyverse)
enframe(license_index, name = "Licensure", value = "match") %>%
    unnest(match) %>%                      # one row per license/program pair
    right_join(df1, by = "Licensure") %>%  # keep every row of df1
    mutate(match = match == Program) %>%
    select(names(df1), everything())
# A tibble: 6 x 3
#  Program              Licensure                                        match
#  <fct>                <chr>                                            <lgl>
#1 Elementary Education Content Area - Elementary Education (Grades 1-6) TRUE 
#2 Elementary Education Content Area - Secondary Math (Grades 7-12)      FALSE
#3 Secondary Math       Content Area - Secondary Math (Grades 7-12)      TRUE 
#4 Secondary Math       Mathematics (Grades 7-12) 1706                   TRUE 
#5 Secondary ELA        Content Area - Secondary ELA (Grades 7-12)       TRUE 
#6 Secondary ELA        Content Area - Early Childhood (preK-Grade 3)    FALSE

Alternatively, we can use the rap package, which is handy for this kind of case:

library(rap)
df1 %>% 
   rap(match = ~ license_index[[as.character(Licensure)]] == Program) %>%
   unnest(match)  # flatten the list column created by rap
#              Program                                        Licensure match
#1 Elementary Education Content Area - Elementary Education (Grades 1-6)  TRUE
#2 Elementary Education      Content Area - Secondary Math (Grades 7-12) FALSE
#3       Secondary Math      Content Area - Secondary Math (Grades 7-12)  TRUE
#4       Secondary Math                   Mathematics (Grades 7-12) 1706  TRUE
#5        Secondary ELA       Content Area - Secondary ELA (Grades 7-12)  TRUE
#6        Secondary ELA    Content Area - Early Childhood (preK-Grade 3) FALSE

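【Comments】:

Both options work well. Why does the data need to be unnested in this query?
@K.C. If you check the str of the output before unnest, match is a list column whose elements each hold a single value. Unnesting flattens it into a vector column.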
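Since the question mentions mutate and a purrr map function, one way to avoid the list column altogether is a typed map such as map2_lgl, which returns a plain logical vector so no unnest step is needed. This is a sketch on a reduced version of the question's data, not the answerer's code; it assumes dplyr and purrr are installed and that the columns are character rather than factor.

```r
library(dplyr)
library(purrr)

# Reduced version of the question's data (assumption: strings, not factors)
license_index <- list(
  "Content Area - Elementary Education (Grades 1-6)" = "Elementary Education",
  "Content Area - Middle Grades ELA (Grades 4-9)" = c("Elementary Education", "Secondary ELA")
)
df1 <- data.frame(
  Program = c("Elementary Education", "Secondary ELA", "Secondary ELA"),
  Licensure = c("Content Area - Elementary Education (Grades 1-6)",
                "Content Area - Middle Grades ELA (Grades 4-9)",
                "Content Area - Elementary Education (Grades 1-6)"),
  stringsAsFactors = FALSE
)

# .x is the row's license, .y its program; %in% yields exactly one TRUE/FALSE
# per row even when the license accepts several programs
result <- df1 %>%
  mutate(match = map2_lgl(Licensure, Program, ~ .y %in% license_index[[.x]]))

print(result)
```

Because map2_lgl insists on one logical value per row, a comparison that accidentally returned several values would fail loudly instead of silently producing a list column.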

【Solution 3】:

I can't really help you on the tidyverse side, but this base-R solution should work:

# %in% (rather than ==) also handles licenses that accept several programs
df1$match <- sapply(seq_len(nrow(df1)), function(i) {
  allowed <- license_index[[as.character(df1$Licensure[i])]]
  ifelse(df1$Program[i] %in% allowed, 'Match', 'No')})

【Comments】:

This is helpful, thank you! I thought sapply might work but wasn't sure how, so I appreciate you writing it out here.
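Because the question notes that license names may change from year to year, it may be worth deciding what should happen when a license is missing from the list. The base-R sketch below returns NA for such rows; the helper name check_row and the unlisted license "Some Unlisted License" are hypothetical, and the data is a reduced version of the question's.

```r
# Reduced version of the question's data (assumption: strings, not factors)
license_index <- list(
  "Content Area - Elementary Education (Grades 1-6)" = "Elementary Education",
  "Content Area - Middle Grades Math (Grades 4-9)" = c("Elementary Education", "Secondary Math")
)
df1 <- data.frame(
  Program = c("Elementary Education", "Secondary Math", "Secondary Math", "Secondary ELA"),
  Licensure = c("Content Area - Elementary Education (Grades 1-6)",
                "Content Area - Middle Grades Math (Grades 4-9)",
                "Content Area - Elementary Education (Grades 1-6)",
                "Some Unlisted License"),  # hypothetical name, not in the list
  stringsAsFactors = FALSE
)

# Hypothetical helper: label one (license, program) pair
check_row <- function(lic, prog) {
  allowed <- license_index[[lic]]              # NULL when the license is unknown
  if (is.null(allowed)) return(NA_character_)  # flag unknown licenses
  if (prog %in% allowed) "Match" else "No"
}

# mapply walks both columns in parallel; unname() drops the names it attaches
df1$match <- unname(mapply(check_row, df1$Licensure, df1$Program))
print(df1)
```

Returning NA rather than "No" keeps an outdated license name distinguishable from a genuine mismatch.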