【Question title】: Use case_when across columns to make a new column
【Posted】: 2019-10-19 06:59:02
【Question description】:

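I have a large dataset with many columns that contain a status. I want to create a new column holding each participant's current status. I'm trying to use case_when in dplyr, but I'm not sure how to apply it across columns. The dataset has too many columns for me to type each one out. Here is a sample of the data: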

library(dplyr)
problem <- tibble(name = c("sally", "jane", "austin", "mike"),
                  status1 = c("registered", "completed", "registered", "no action"),
                  status2 = c("completed", "completed", "registered", "no action"),
                  status3 = c("completed", "completed", "withdrawn", "no action"),
                  status4 = c("withdrawn", "completed", "no action", "registered"))

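I want a new column that gives each participant's final status; HOWEVER, if their status was ever completed, I want it to say completed, regardless of what their final status is. For this data, the answer would look like this: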


answer <- tibble(name = c("sally", "jane", "austin", "mike"),
                 status1 = c("registered", "completed", "registered", "no action"),
                 status2 = c("completed", "completed", "registered", "no action"),
                 status3 = c("completed", "completed", "withdrawn", "no action"),
                 status4 = c("withdrawn", "completed", "no action", "registered"),
                 finalstatus = c("completed", "completed", "no action", "registered"))

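Also, I'd really appreciate any explanation of your code! It would be especially helpful if your solution could also use contains("status"), because in my real dataset the status columns are very messy (e.g. summary_status_5292019, sum_status_07012018, etc.).

Thanks!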

【Question comments】:

    Tags: r string dplyr conditional-statements case-when


    【Solution 1】:

    An option with pmap:

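    library(tidyverse)

    # pmap_chr() applies the function once per row of the selected columns.
    # These rules reproduce the expected output for the sample data.
    problem %>%
      mutate(finalstatus = pmap_chr(select(., starts_with('status')), ~
        case_when(any(c(...) == "completed") ~ "completed",
                  any(c(...) == "withdrawn") ~ "no action",
                  TRUE ~ "registered")))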
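
The question also asks for contains("status") and for the rule "last status, unless the participant ever completed". A minimal sketch of that literal rule follows. It assumes the status columns already appear in chronological order in the data (which may not hold for messy names such as summary_status_5292019), and the names status_cols and result are illustrative only.

```r
library(dplyr)
library(purrr)

problem <- tibble(
  name    = c("sally", "jane", "austin", "mike"),
  status1 = c("registered", "completed", "registered", "no action"),
  status2 = c("completed", "completed", "registered", "no action"),
  status3 = c("completed", "completed", "withdrawn", "no action"),
  status4 = c("withdrawn", "completed", "no action", "registered")
)

# Pick the status columns by name fragment, keeping their current column order.
# Assumption: that order is chronological.
status_cols <- select(problem, contains("status"))

result <- problem %>%
  mutate(finalstatus = pmap_chr(status_cols, function(...) {
    row <- c(...)                  # one row's statuses as a character vector
    if (any(row == "completed")) {
      "completed"                  # "ever completed" takes priority
    } else {
      row[[length(row)]]           # otherwise use the last status column
    }
  }))
```

On the sample data this gives the same finalstatus column as the expected answer in the question. Missing values are not handled here; a real dataset with NA statuses would need an explicit rule for them.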
    

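    【Comments】:

    • Thank you, @akrun! Could you explain what c(...) means, or point me to where I can learn more about it? Thanks again for your time!
    • @J.Sabree pmap works through the data row by row, and c is used to concatenate the elements of that row (...) into a vector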
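
To make the comment above concrete, here is a small sketch of what c(...) receives inside pmap. The two-row data frame statuses is made up for illustration and is not part of the original answer.

```r
library(purrr)

# A tiny two-row, two-column example (hypothetical data).
statuses <- data.frame(
  status1 = c("registered", "completed"),
  status2 = c("withdrawn", "completed"),
  stringsAsFactors = FALSE
)

# pmap() calls the function once per row. Each column arrives as a named
# argument, and c(...) collects those arguments into one named vector.
rows <- pmap(statuses, function(...) c(...))

# First row as a named character vector: status1 = "registered", status2 = "withdrawn"
rows[[1]]

# any() then reduces a row-wise comparison to a single TRUE/FALSE per row.
ever_completed <- pmap_lgl(statuses, function(...) any(c(...) == "completed"))
```

Inside the ~ formula shorthand used in the answer, ... plays the same role as in the explicit function(...) written here.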
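    【Solution 2】:

    Here is a function that performs this kind of "row matching" operation. As with case_when, you can put the checks vector in a specific order, so that once a match is found for one element (e.g. 'completed') in the data, matches on later elements are not considered.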

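    row_match <- function(data, checks, labels){
      # position of each cell's value in `checks` (NA if not listed)
      matches <- match(unlist(data), checks)
      # reshape the flat vector back to rows x columns
      dim(matches) <- dim(data)
      # per row, take the earliest-listed check that occurs and return its label
      labels[apply(matches, 1, min, na.rm = TRUE)]
    }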
    
    problem %>% 
      mutate(final.stat = row_match(
                            data = select(problem, starts_with('status')),
                            checks = c('completed', 'withdrawn', 'registered'),
                            labels = c('completed', 'no action', 'registered')))
    
    # # A tibble: 4 x 6
    #   name   status1    status2    status3   status4    final.stat
    #   <chr>  <chr>      <chr>      <chr>     <chr>      <chr>     
    # 1 sally  registered completed  completed withdrawn  completed 
    # 2 jane   completed  completed  completed completed  completed 
    # 3 austin registered registered withdrawn no action  no action 
    # 4 mike   no action  no action  no action registered registered
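
The priority ordering can be traced by hand on a single row. The sketch below walks austin's row through the same match and min steps that row_match applies; the variable names austin, positions, and best are illustrative only.

```r
checks <- c("completed", "withdrawn", "registered")
labels <- c("completed", "no action", "registered")

# austin's row from the example data
austin <- c("registered", "registered", "withdrawn", "no action")

# Position of each status in `checks`; "no action" is not listed, so it gives NA.
positions <- match(austin, checks)

# The smallest position is the highest-priority check that occurs in the row.
best <- min(positions, na.rm = TRUE)

# Look up the label paired with that check.
labels[best]
```

Here positions is 3 3 2 NA, so best is 2 and the label is "no action". A row in which none of the checks occur would leave min with nothing to compare (it returns Inf with a warning), and the label lookup would then yield NA.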
    

    【Comments】:
