【Question title】:Create indicator column based on presence of 0/1 in all other columns
【Posted】:2021-08-18 15:53:14
【Question description】:

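I often find myself having to apply the following condition: I have a table with several binary columns, coded yes/no or 0/1. In a calculation I have to create a new intermediate column using this rule: if all the columns are "no", the new column is "no"; if at least one column is "yes", the summary column must say "yes". I usually do this with case_when, and it works well (see the example).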

library(tidyverse)

#create a table for reproducible example
set.seed(001)
carac1 <- round(runif(100),0)
carac2 <- round(runif(100),0)
carac3 <- round(runif(100),0)
data <- data.frame(carac1,carac2,carac3)

#apply case_when with complex condition
data <- data %>%
  mutate(carac_all = case_when(
    carac1 == 0 & carac2 == 0 & carac3 == 0 ~ "Always no",
    carac1 == 1 | carac2 == 1 | carac3 == 1 ~ "yes at least one time",
    TRUE ~ NA_character_))

This gives exactly what I want:

   carac1 carac2 carac3             carac_all
1        0      1      0 yes at least one time
2        0      0      0             Always no
3        1      0      1 yes at least one time
4        1      1      0 yes at least one time

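(This example uses numeric 0/1, but sometimes the columns are character yes/no or other categorical variables, such as colours, so in those cases the >0 trick is not easy to apply.)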

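The problem is that this code forces me to type the name of every column. In my latest file I have 120 consecutive columns to analyse. Is there a way to use case_when like this over a range of columns? I tried carac1:carac3 == 0, but it doesn't work, and in any case I don't see how to express "at least one column says yes".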

Thanks for your help.

TL;DR: I want to simplify the code I currently use so that I don't have to type the name of every variable, only a range of variables.

【Question discussion】:

    Tags: r dplyr


    【Solution 1】:

    Update 3: using if_any() and if_all() from dplyr 1.0.4:

    data %>% 
        mutate(
            carac_all = case_when(
                if_all(contains("carac"), ~. < 1) ~ "Always no",
                if_any(contains("carac"), ~. >= 1) ~ "yes at least one time",
                TRUE ~ NA_character_))
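A minimal sketch of the same if_any()/if_all() pattern on character yes/no columns, the case the question mentions where a numeric `< 1` / `>= 1` test does not apply. It assumes dplyr >= 1.0.4; the data frame `survey` and its columns q1 to q3 are hypothetical, and the numeric tests are swapped for equality with "yes" and "no".

```r
library(dplyr)

# Hypothetical example data: character yes/no answers instead of 0/1
survey <- data.frame(
  q1 = c("no", "yes", "no", "no"),
  q2 = c("no", "no",  "no", "yes"),
  q3 = c("no", "yes", "no", "no")
)

survey <- survey %>%
  mutate(q_all = case_when(
    # TRUE when at least one column in the range equals "yes" in this row
    if_any(q1:q3, ~ .x == "yes") ~ "yes at least one time",
    # TRUE when every column in the range equals "no" in this row
    if_all(q1:q3, ~ .x == "no") ~ "Always no",
    TRUE ~ NA_character_
  ))
```

Because the test is an equality comparison, the same code works for any coding (colours, factor levels) by changing the value being compared.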
    

    Update 2: thanks to Martin Gal for the valuable input:

    data %>% 
        mutate(carac_all = case_when(
            rowSums(across(carac1:carac3)) < 1 ~ "Always no",
            rowSums(across(carac1:carac3)) >=1 ~ "yes at least one time",
            TRUE ~ NA_character_))
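For the 120-column case from the question, the row sum can be computed once over a column range. This is a sketch assuming the indicator columns are consecutive and named carac1 to carac120; the table `wide` and the helper column `n_yes` are hypothetical. Comparing with `~ .x == 1` inside across() before summing means the same code should also work for character codes by replacing `1` with `"yes"`.

```r
library(dplyr)

# Hypothetical wide table: 3 rows x 120 consecutive 0/1 columns, all 0
wide <- as.data.frame(matrix(0, nrow = 3, ncol = 120))
names(wide) <- paste0("carac", 1:120)
wide$carac57[2] <- 1   # a single "yes" in row 2
wide$carac120[3] <- 1  # a single "yes" in row 3

wide <- wide %>%
  mutate(
    # count, per row, how many columns in the range equal 1
    n_yes = rowSums(across(carac1:carac120, ~ .x == 1)),
    carac_all = case_when(
      n_yes == 0 ~ "Always no",
      n_yes > 0  ~ "yes at least one time",
      TRUE ~ NA_character_  # rows containing NA give n_yes = NA and land here
    )
  ) %>%
  select(-n_yes)  # drop the helper count
```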
    

    Update: after the clarification:

    data %>% 
        mutate(sum_carac = carac1+carac2+carac3) %>% 
        mutate(carac_all = case_when(
            sum_carac < 1 ~ "Always no",
            sum_carac >=1 ~ "yes at least one time",
            TRUE ~ NA_character_)) %>% 
        select(-sum_carac)
    
      carac1 carac2 carac3             carac_all
    1        0      1      0 yes at least one time
    2        0      0      0             Always no
    3        1      0      1 yes at least one time
    4        1      1      0 yes at least one time
    5        0      1      0 yes at least one time
    6        1      0      1 yes at least one time
    7        1      0      1 yes at least one time
    8        1      0      0 yes at least one time
    9        1      1      0 yes at least one time
    10       0      1      1 yes at least one time
    11       0      1      1 yes at least one time
    12       0      1      0 yes at least one time
    13       1      0      1 yes at least one time
    14       0      0      1 yes at least one time
    15       1      0      1 yes at least one time
    16       0      0      0             Always no
    17       1      1      1 yes at least one time
    18       1      0      1 yes at least one time
    19       0      0      1 yes at least one time
    20       1      1      0 yes at least one time
    21       1      1      0 yes at least one time
    22       0      0      0             Always no
    23       1      0      0 yes at least one time
    24       0      0      1 yes at least one time
    25       0      1      1 yes at least one time
    26       0      0      1 yes at least one time
    27       0      1      0 yes at least one time
    28       0      0      0             Always no
    29       1      0      0 yes at least one time
    30       0      1      1 yes at least one time
    31       0      1      0 yes at least one time
    32       1      0      0 yes at least one time
    33       0      0      0             Always no
    34       0      1      1 yes at least one time
    35       1      1      0 yes at least one time
    36       1      1      1 yes at least one time
    37       1      1      1 yes at least one time
    38       0      1      1 yes at least one time
    39       1      1      0 yes at least one time
    40       0      1      0 yes at least one time
    41       1      1      0 yes at least one time
    42       1      1      1 yes at least one time
    43       1      0      1 yes at least one time
    44       1      0      0 yes at least one time
    45       1      1      0 yes at least one time
    46       1      0      0 yes at least one time
    47       0      0      0             Always no
    48       0      1      0 yes at least one time
    49       1      0      0 yes at least one time
    50       1      1      1 yes at least one time
    51       0      1      1 yes at least one time
    52       1      1      1 yes at least one time
    53       0      0      0             Always no
    54       0      0      1 yes at least one time
    55       0      1      0 yes at least one time
    56       0      0      0             Always no
    57       0      1      0 yes at least one time
    58       1      0      0 yes at least one time
    59       1      0      0 yes at least one time
    60       0      0      1 yes at least one time
    61       1      0      1 yes at least one time
    62       0      1      0 yes at least one time
    63       0      0      0             Always no
    64       0      1      1 yes at least one time
    65       1      1      1 yes at least one time
    66       0      0      0             Always no
    67       0      0      0             Always no
    68       1      0      1 yes at least one time
    69       0      1      1 yes at least one time
    70       1      0      0 yes at least one time
    71       0      1      0 yes at least one time
    72       1      1      0 yes at least one time
    73       0      1      0 yes at least one time
    74       0      0      0             Always no
    75       0      0      0             Always no
    76       1      1      0 yes at least one time
    77       1      1      0 yes at least one time
    78       0      1      0 yes at least one time
    79       1      1      0 yes at least one time
    80       1      1      1 yes at least one time
    81       0      0      0             Always no
    82       1      0      1 yes at least one time
    83       0      1      1 yes at least one time
    84       0      1      0 yes at least one time
    85       1      1      0 yes at least one time
    86       0      0      0             Always no
    87       1      1      0 yes at least one time
    88       0      1      0 yes at least one time
    89       0      1      0 yes at least one time
    90       0      1      0 yes at least one time
    91       0      1      0 yes at least one time
    92       0      0      0             Always no
    93       1      0      1 yes at least one time
    94       1      1      0 yes at least one time
    95       1      0      1 yes at least one time
    96       1      1      1 yes at least one time
    97       0      0      0             Always no
    98       0      1      0 yes at least one time
    99       1      0      0 yes at least one time
    100      1      1      1 yes at least one time
    

    First answer: we can use across from the dplyr package:

    library(dplyr)
    data %>% 
        mutate(across(starts_with("carac"), ~case_when(
            . == 0 ~ "Always no",
            . == 1 ~ "yes at least one time",
            TRUE ~ NA_character_), .names ="x_{.col}")) %>% 
            select(carac1:x_carac1)
    
     carac1 carac2 carac3             carac_all
    1        0      1      0 yes at least one time
    2        0      0      0             Always no
    3        1      0      1 yes at least one time
    4        1      1      0 yes at least one time
    5        0      1      0 yes at least one time
    6        1      0      1 yes at least one time
    7        1      0      1 yes at least one time
    8        1      0      0 yes at least one time
    9        1      1      0 yes at least one time
    10       0      1      1 yes at least one time
    11       0      1      1 yes at least one time
    12       0      1      0 yes at least one time
    13       1      0      1 yes at least one time
    14       0      0      1 yes at least one time
    15       1      0      1 yes at least one time
    16       0      0      0             Always no
    17       1      1      1 yes at least one time
    18       1      0      1 yes at least one time
    19       0      0      1 yes at least one time
    20       1      1      0 yes at least one time
    21       1      1      0 yes at least one time
    22       0      0      0             Always no
    23       1      0      0 yes at least one time
    24       0      0      1 yes at least one time
    25       0      1      1 yes at least one time
    26       0      0      1 yes at least one time
    27       0      1      0 yes at least one time
    28       0      0      0             Always no
    29       1      0      0 yes at least one time
    30       0      1      1 yes at least one time
    31       0      1      0 yes at least one time
    32       1      0      0 yes at least one time
    33       0      0      0             Always no
    34       0      1      1 yes at least one time
    35       1      1      0 yes at least one time
    36       1      1      1 yes at least one time
    37       1      1      1 yes at least one time
    38       0      1      1 yes at least one time
    39       1      1      0 yes at least one time
    40       0      1      0 yes at least one time
    41       1      1      0 yes at least one time
    42       1      1      1 yes at least one time
    43       1      0      1 yes at least one time
    44       1      0      0 yes at least one time
    45       1      1      0 yes at least one time
    46       1      0      0 yes at least one time
    47       0      0      0             Always no
    48       0      1      0 yes at least one time
    49       1      0      0 yes at least one time
    50       1      1      1 yes at least one time
    51       0      1      1 yes at least one time
    52       1      1      1 yes at least one time
    53       0      0      0             Always no
    54       0      0      1 yes at least one time
    55       0      1      0 yes at least one time
    56       0      0      0             Always no
    57       0      1      0 yes at least one time
    58       1      0      0 yes at least one time
    59       1      0      0 yes at least one time
    60       0      0      1 yes at least one time
    61       1      0      1 yes at least one time
    .........
    

    【Discussion】:

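    • case_when master :D
    • map is too complicated for my brain, so I tried across :-).
    • Nothing is complicated for you :)
    • I don't understand. Your code doesn't create a column named carac_all. You create three columns named "results_from_carac1" through carac3. The visible "carac_all" is the one created in my example, which you hide by selecting only the columns you just created. Used as is, your code doesn't work.
    • You can delete the second line and the last line of your code, and replace sum_carac inside case_when with rowSums(across(carac1:carac3))
    【Solution 2】: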

    A simple approach using rowSums:

    data$carac_all = ifelse(rowSums(data) > 0, "yes at least one time", "Always no")
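rowSums(data) sums every column of the table, so it only works when all columns are numeric indicators. A base R sketch that selects the indicator columns by name and compares them with "yes" before summing; the data frame `df` and its `id` column are hypothetical.

```r
# Hypothetical data with a non-indicator column (id) and character codes
df <- data.frame(
  id     = 1:3,
  carac1 = c("no", "yes", "no"),
  carac2 = c("no", "no",  "no"),
  carac3 = c("no", "yes", "yes")
)

# Pick the indicator columns by name before carac_all exists,
# so the new column is not matched by the pattern
carac_cols <- grepl("^carac", names(df))

# df[carac_cols] == "yes" is a logical matrix; rowSums() counts TRUEs per row
df$carac_all <- ifelse(rowSums(df[carac_cols] == "yes") > 0,
                       "yes at least one time", "Always no")
```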
    
       carac1 carac2 carac3             carac_all
    1       0      1      0 yes at least one time
    2       0      0      0             Always no
    3       1      0      1 yes at least one time
    4       1      1      0 yes at least one time
    5       0      1      0 yes at least one time
    6       1      0      1 yes at least one time
    7       1      0      1 yes at least one time
    8       1      0      0 yes at least one time
    9       1      1      0 yes at least one time
    10      0      1      1 yes at least one time
    11      0      1      1 yes at least one time
    12      0      1      0 yes at least one time
    13      1      0      1 yes at least one time
    14      0      0      1 yes at least one time
    15      1      0      1 yes at least one time
    16      0      0      0             Always no
    17      1      1      1 yes at least one time
    18      1      0      1 yes at least one time
    19      0      0      1 yes at least one time
    20      1      1      0 yes at least one time
    21      1      1      0 yes at least one time
    22      0      0      0             Always no
    23      1      0      0 yes at least one time
    24      0      0      1 yes at least one time
    25      0      1      1 yes at least one time
    

    【Discussion】:

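    • It works (without the "dr" at the end of pivot_wider). In this case I find it hard to really understand the transformation in pivot_longer, but in this example it works ;) Is it necessary to use pivot?
    【Solution 3】: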

    You can use a lookup vector, where 1:3 are the column indices of the carac columns.

    lookup <- c("TRUE" = "yes at least one time", "FALSE" = "Always no")
    data$carac_all <- lookup[paste0(apply(data[, 1:3], 1, sum) > 0)]
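A vectorised base R alternative to the row-wise apply(): build one logical vector per column and combine them element-wise with Reduce() and `|`. This is a sketch on a small hypothetical three-row version of `data`; the `lookup` vector is the one from this answer.

```r
# Hypothetical small version of the question's data
data <- data.frame(carac1 = c(0, 1, 0), carac2 = c(0, 0, 0), carac3 = c(0, 1, 1))

# One logical vector per column: is this cell a "yes" (here coded 1)?
is_yes <- lapply(data[, 1:3], function(x) x == 1)

# Combine the columns element-wise with |, giving TRUE where any column is 1
any_yes <- Reduce(`|`, is_yes)

lookup <- c("TRUE" = "yes at least one time", "FALSE" = "Always no")
data$carac_all <- unname(lookup[as.character(any_yes)])
```

Swapping `x == 1` for `x == "yes"` adapts it to character columns.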
    

    【Discussion】:

      【Solution 4】:

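      All the proposed solutions work, but they rely on a sum: if the sum is >= 1, then 1 appears at least once. I finally found the solution I had in mind, which also works with categorical variables. The key is to use if_any and if_all: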

      library(tidyverse)
      set.seed(001)
      #create a table for reproducible example
      carac1 <- round(runif(100),0)
      carac2 <- round(runif(100),0)
      carac3 <- round(runif(100),0)
      data <- data.frame(carac1,carac2,carac3)
      
      data <- data %>%
        mutate(carac_all = case_when(
          if_any(carac1:carac3, ~.x == "1") ~ "yes at least one time",
          if_all(carac1:carac3, ~.x == "0") ~ "always no",
          TRUE ~ NA_character_))
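If "yes" is spelled in several ways in messy categorical data, the comparison inside if_any() can use `%in%` with a vector of accepted codes. A sketch assuming a hypothetical `yes_codes` vector and data frame `answers`; note that `%in%` returns FALSE rather than NA for missing cells, so NA cells simply count as "not yes" here.

```r
library(dplyr)

# Hypothetical messy data: "yes" is coded in more than one way
answers <- data.frame(
  carac1 = c("no", "Yes", "no"),
  carac2 = c("0",  "no",  "no"),
  carac3 = c("no", "no",  "1")
)

yes_codes <- c("yes", "Yes", "1")  # assumption: every code that means "yes"

answers <- answers %>%
  mutate(carac_all = case_when(
    # %in% is TRUE when the cell matches any accepted yes code
    if_any(carac1:carac3, ~ .x %in% yes_codes) ~ "yes at least one time",
    TRUE ~ "Always no"
  ))
```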
      

      Many thanks, everyone.

      【Discussion】:
