【Question title】:How can I create a column in my dataframe that maps to a list in R?
【Posted】:2019-06-06 14:04:41
【Question】:

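I have some licensure data and am trying to create a column in my data frame that tells me whether the listed license is acceptable, based on the program a person is enrolled in.

To do this, I created a list, because some licenses can be used for more than one program. Ideally, I could use this list as a reference to check whether the program is listed under the license name. I also tried case_when but kept getting errors. I would also like a list that works as a kind of map, because license names may change from year to year.

Example code

Here is an excerpt of my data frame: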

df1 <- data.frame(Program = c("Elementary Education", "Elementary Education", "Secondary Math", "Secondary Math", "Secondary ELA", "Secondary ELA"), Licensure = c("Content Area - Elementary Education (Grades 1-6)", "Content Area - Secondary Math (Grades 7-12)", "Content Area - Secondary Math (Grades 7-12)", "Mathematics (Grades 7-12) 1706", "Content Area - Secondary ELA (Grades 7-12)", "Content Area - Early Childhood (preK-Grade 3)"))

Here is the list I created, which contains every license and the acceptable programs under each one:

license_index <- list(
  "Content Area - Early Childhood (preK-Grade 3)" = "Elementary Education",
  "Content Area - Elementary Education (Grades 1-6)" = "Elementary Education",
  "Content Area - Middle Grades ELA (Grades 4-9)" = c("Elementary Education", "Secondary ELA"),
  "Content Area - Middle Grades Math (Grades 4-9)" = c("Elementary Education", "Secondary Math"),
  "Content Area - Middle School Mathematics (Grades 4-8)" = "Elementary Education",
  "Content Area - Secondary ELA (Grades 7-12)" = "Secondary ELA",
  "Content Area - Secondary Math (Grades 7-12)" = "Secondary Math",
  "Content Area - Secondary English (Grades 7-12)" = "Secondary ELA",
  "English Language Arts and Reading (Grades 4-8) 864" = "Elementary Education",
  "Core Subjects (Grades EC-6) 1770" = "Elementary Education",
  "English Language Arts and Reading (Grades 7-12) 1709" = "Secondary ELA",
  "Mathematics (Grades 4-8) 866" = "Elementary Education",
  "Mathematics (Grades 7-12) 1706" = "Secondary Math"
)

As the final column, what I would most like is an indicator of whether the license and the program match:

ideal.df <- data.frame(Program = c("Elementary Education", "Elementary Education", "Secondary Math", "Secondary Math", "Secondary ELA", "Secondary ELA"), Licensure = c("Content Area - Elementary Education (Grades 1-6)", "Content Area - Secondary Math (Grades 7-12)", "Content Area - Secondary Math (Grades 7-12)", "Mathematics (Grades 7-12) 1706", "Content Area - Secondary ELA (Grades 7-12)", "Content Area - Early Childhood (preK-Grade 3)"), match = c("Match", "No", "Match", "Match", "Match", "No"))

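I think I need the mutate function, and perhaps the purrr map function, but I'm not very familiar with the tidyverse, so any help is much appreciated! Thanks in advance!

【Comments】:

    Tags: r dictionary dplyr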


    【Solution 1】:

    Try this:

    x<-stack(license_index)
    x$values[match(df1$Licensure,x$ind)]==df1$Program
    #[1]  TRUE FALSE  TRUE  TRUE  TRUE FALSE
    

    If needed, you can map the TRUE/FALSE values above to Match/No.
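A minimal sketch of that mapping with base R's ifelse. The variable names is_match and match_label are hypothetical, and the logical vector is copied from the printed output above rather than recomputed:

```r
# Logical result of the comparison, copied from the printed output above.
# (is_match is a hypothetical name used only for this illustration.)
is_match <- c(TRUE, FALSE, TRUE, TRUE, TRUE, FALSE)

# ifelse() is vectorised: TRUE becomes "Match" and FALSE becomes "No".
match_label <- ifelse(is_match, "Match", "No")
match_label
```

Note that a license missing from the list makes match() return NA, so the comparison and the label for that row would also be NA rather than "No".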

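    【Comments】:

    • This works great, thank you. To understand the function better: does the stack function convert the list into a data frame?
    • Yes, it does. stack repeats each list name once for every element stored under it.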
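To make the behaviour of stack concrete, here is a small sketch on a two-entry list in the same shape as license_index. The list small_index and its license names are made up for illustration:

```r
# A hypothetical two-entry list shaped like license_index.
small_index <- list(
  "License A" = "Elementary Education",
  "License B" = c("Elementary Education", "Secondary ELA")
)

# stack() returns a data.frame with one row per list element:
# 'values' holds the programs, 'ind' holds the license name,
# repeated once for each program stored under it.
stacked <- stack(small_index)
stacked
```

Because match() returns only the first matching position, a license that lists several programs is compared against its first program only in the snippet from this answer. The sample data has no such rows, so the result shown there is unaffected.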
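    【Solution 2】:

    Here is one tidyverse approach: we convert the named list into a two-column data.frame with enframe, right_join it with the original dataset, and create match by comparing the 'match' column with 'Program'.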

    library(tidyverse)
    enframe(license_index, name = "Licensure", value = "match") %>%
        unnest %>% 
        right_join(df1) %>% 
        mutate(match = match == Program) %>%
        select(names(df1), everything())
    # A tibble: 6 x 3
    #  Program              Licensure                                        match
    #  <fct>                <chr>                                            <lgl>
    #1 Elementary Education Content Area - Elementary Education (Grades 1-6) TRUE 
    #2 Elementary Education Content Area - Secondary Math (Grades 7-12)      FALSE
    #3 Secondary Math       Content Area - Secondary Math (Grades 7-12)      TRUE 
    #4 Secondary Math       Mathematics (Grades 7-12) 1706                   TRUE 
    #5 Secondary ELA        Content Area - Secondary ELA (Grades 7-12)       TRUE 
    #6 Secondary ELA        Content Area - Early Childhood (preK-Grade 3)    FALSE
    

    Alternatively, we can use the rap package, which is handy for this kind of case:

    library(rap)
    df1 %>% 
       rap(match = ~ license_index[[as.character(Licensure)]] == Program) %>%
       unnest
    #              Program                                        Licensure match
    #1 Elementary Education Content Area - Elementary Education (Grades 1-6)  TRUE
    #2 Elementary Education      Content Area - Secondary Math (Grades 7-12) FALSE
    #3       Secondary Math      Content Area - Secondary Math (Grades 7-12)  TRUE
    #4       Secondary Math                   Mathematics (Grades 7-12) 1706  TRUE
    #5        Secondary ELA       Content Area - Secondary ELA (Grades 7-12)  TRUE
    #6        Secondary ELA    Content Area - Early Childhood (preK-Grade 3) FALSE
    

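    【Comments】:

    • Both options work well. Why does the data need to be unnested for this query?
    • @K.C. If you check the str of the output before unnest, match is a list column in which each entry has a single element. Unnesting flattens it into a vector column.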
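The difference between a list of length-one results and a flat vector can be seen with a base-R analogy. This is only an illustration of the idea, not what rap or unnest do internally, and the names per_row and flat are hypothetical:

```r
# Each per-row lookup yields a length-one logical, so collecting
# them row by row gives a list rather than a vector.
per_row <- list(TRUE, FALSE, TRUE)
str(per_row)

# Flattening the list of length-one elements produces a plain
# logical vector, which is the shape an ordinary column has.
flat <- unlist(per_row)
str(flat)
```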
    【Solution 3】:

    I can't really help you on the tidyverse side, but this base-R solution should work:

    df1$match <-sapply(1:nrow(df1), function(i){
      ifelse(license_index[[which(names(license_index) == df1$Licensure[i])]] == df1$Program[i],'Match','No')})
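The question notes that some licenses are valid for more than one program, in which case == compares against a vector and returns more than one value per row. A hedged variant of the same idea uses %in% so each row yields a single TRUE or FALSE. The objects demo_index and df_demo below are made-up stand-ins for license_index and df1:

```r
# Hypothetical stand-ins for license_index and df1.
demo_index <- list(
  "Content Area - Middle Grades ELA (Grades 4-9)" = c("Elementary Education", "Secondary ELA"),
  "Content Area - Secondary Math (Grades 7-12)" = "Secondary Math"
)
df_demo <- data.frame(
  Program = c("Secondary ELA", "Elementary Education", "Secondary Math"),
  Licensure = c(
    "Content Area - Middle Grades ELA (Grades 4-9)",
    "Content Area - Secondary Math (Grades 7-12)",
    "Not In The Index"
  ),
  stringsAsFactors = FALSE
)

# For each row, test whether the program is among the programs
# listed for that license. A license missing from the list gives
# NULL, and `x %in% NULL` is FALSE, so such rows become "No".
is_acceptable <- mapply(
  function(program, license) program %in% demo_index[[license]],
  df_demo$Program, df_demo$Licensure,
  USE.NAMES = FALSE
)
df_demo$match <- ifelse(is_acceptable, "Match", "No")
df_demo
```

If the columns are factors, as they are by default in R versions before 4.0, wrapping them in as.character() before the lookup keeps the indexing by name rather than by factor code.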
    

    【Comments】:

    • This is helpful, thank you! I thought sapply might work but wasn't sure how, so I appreciate you writing it out here.