How to compare multiple variables using dplyr

【Question title】: How to compare multiple variables using dplyr
【Posted】: 2021-07-01 10:19:04
【Question】:

I need a way to analyze the data I have, and any help would be much appreciated. The data look like the following example:

> glimpse(test)
Rows: 559
Columns: 4
$ Host.H <chr> "Human", "Human", "Human", "Human", "Human", "Human", "Human", "Human", "Human", "Human", "Human", "Human", "Hu…
$ Host.I <chr> NA, "Intermediate", "Intermediate", "Intermediate", "Intermediate", "Intermediate", "Intermediate", "Intermedia…
$ Host.B <chr> NA, "Bat", "Bat", "Bat", "Bat", "Bat", "Bat", "Bat", "Bat", "Bat", "Bat", "Bat", "Bat", "Bat", NA, "Bat", "Bat"…
$ Host.C <chr> NA, "Consensus", "Consensus", "Consensus", "Consensus", "Consensus", "Consensus", "Consensus", "Consensus", "Co…

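These data correspond to organisms derived from bats, intermediate hosts, humans, and a consensus (Host.B, Host.I, Host.H and Host.C). As you can see, not all cells are filled in; some values are unavailable and appear as NA. My goal is therefore the following: if all four variables contain data (Host.B = Bat, Host.I = Intermediate, Host.H = Human and Host.C = Consensus), the row is labelled "Conserved" in a new column called "Type"; if data are missing in some of the variables (Host.B = NA, Host.I = Intermediate, Host.H = NA and Host.C = Consensus), the row is labelled "Shared"; and if only one column contains data (Host.B = Bat, Host.I = NA, Host.H = NA and Host.C = NA), the row is labelled "Unique".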

To do this I wrote the following script:

test <- data %>%
  rowwise() %>%
  mutate(Type = case_when(
    all_eq(c(Host.H = Human, Host.C = Consensus, Host.B = Bat, Host.I = Intermediate), na.rm = T ~ "Conserved",
    all_neq(c(Host.H = Human, Host.C = Consensus, Host.B = Bat, Host.I = Intermediate), na.rm = T)) ~ "Unique",
    TRUE ~ "Shared"
  )) %>%
  ungroup()

Unfortunately, it does not do what I need. If you know a more workable way to do this, I would greatly appreciate it.

Thanks.

【Comments】:

Please provide your data by pasting the output of dput(data).

Tags: r dplyr tidyverse

【Solution 1】:

You can use rowSums to count the number of non-NA values in each row of the data frame. Based on that count, you can assign the Type column.

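library(dplyr)

test <- test %>%
  # count the non-NA values per row across the four host columns only
  mutate(count = rowSums(!is.na(.[c('Host.H', 'Host.I', 'Host.B', 'Host.C')])),
         # rows where all four columns are NA match no condition and get Type = NA
         Type = case_when(count == 4 ~ 'Conserved',
                          count > 1 ~ 'Shared',
                          count == 1 ~ 'Unique'))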

You can drop the count column from the output by adding %>% select(-count) to the pipeline.
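
To check how the counting rule behaves, here is a minimal sketch on a small made-up data frame. The object names toy, host_cols and result, the extra column Other, and all the values are assumptions for illustration only, not the asker's data. It applies the same rowSums plus case_when idea, restricted to the four host columns, so NA values in any other column are ignored.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration (not the asker's data set)
toy <- tibble(
  Host.H = c("Human", "Human", NA, NA),
  Host.I = c("Intermediate", NA, NA, NA),
  Host.B = c("Bat", "Bat", "Bat", NA),
  Host.C = c("Consensus", NA, NA, NA),
  Other  = c(NA, "x", "y", "z")  # extra column that must not be counted
)

# Only these columns take part in the count
host_cols <- c("Host.H", "Host.I", "Host.B", "Host.C")

result <- toy %>%
  # number of non-NA values per row, restricted to the host columns
  mutate(count = rowSums(!is.na(.[host_cols])),
         Type = case_when(count == length(host_cols) ~ "Conserved",
                          count > 1 ~ "Shared",
                          count == 1 ~ "Unique")) %>%
  # drop the helper column once Type has been assigned
  select(-count)

print(result)
```

In this sketch the last row has no data in any host column, so no case_when condition matches and its Type is NA. If such rows should get a label of their own, an extra condition such as count == 0 could be added.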

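【Comments】:

Hi Ronak, how can I count only the Host.H, Host.B, Host.I and Host.C columns without using the "select" function? Your example code counts the NAs in all columns, which is not what I want.
@DiegoMunoz Only 4 columns were shown in the glimpse output, so I did not select columns specifically. Please check the updated answer and see if it helps.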