【Posted】: 2021-07-01 10:19:04
【Question】:
I need a way to analyse the data I have, and any help would be much appreciated. The data look like this:
> glimpse(test)
Rows: 559
Columns: 4
$ Host.H <chr> "Human", "Human", "Human", "Human", "Human", "Human", "Human", "Human", "Human", "Human", "Human", "Human", "Hu…
$ Host.I <chr> NA, "Intermediate", "Intermediate", "Intermediate", "Intermediate", "Intermediate", "Intermediate", "Intermedia…
$ Host.B <chr> NA, "Bat", "Bat", "Bat", "Bat", "Bat", "Bat", "Bat", "Bat", "Bat", "Bat", "Bat", "Bat", "Bat", NA, "Bat", "Bat"…
$ Host.C <chr> NA, "Consensus", "Consensus", "Consensus", "Consensus", "Consensus", "Consensus", "Consensus", "Consensus", "Co…
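These data correspond to organisms originating from bats, intermediate hosts, humans, and a consensus (Host.B, Host.I, Host.H and Host.C). As you can see, not every cell is filled; some values are missing (NA). My goal is therefore the following: if all four variables have data (Host.B = Bat, Host.I = Intermediate, Host.H = Human and Host.C = Consensus), the row is labelled "Conserved" in a new column called "type"; if some of the variables are missing (e.g. Host.B = NA, Host.I = Intermediate, Host.H = NA and Host.C = Consensus), it is labelled "Shared"; and if only one column has data (e.g. Host.B = Bat, Host.I = NA, Host.H = NA and Host.C = NA), it is labelled "Unique".
To do this, I wrote the following script: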
test <- data %>%
  rowwise() %>%
  mutate(Type = case_when(
    all_eq(c(Host.H = Human, Host.C = Consensus, Host.B = Bat, Host.I = Intermediate), na.rm = T ~ "Conserved",
    all_neq(c(Host.H = Human, Host.C = Consensus, Host.B = Bat, Host.I = Intermediate), na.rm = T)) ~ "Unique",
    TRUE ~ "Shared"
  )) %>%
  ungroup()
Unfortunately, it does not do what I need. If you know a more workable way to do this, I would greatly appreciate it.
Thanks.
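One possible reading of the rule described in the question is that the label depends only on how many of the four host columns are non-missing in a row. The sketch below follows that assumption. The small `data` frame is a made-up stand-in for the real 559-row data, and the helper column `n_hosts` and the vector `host_cols` are illustrative names, not part of the original script. Rows with no host at all are not covered by the question, so they are left as NA here.

```r
library(dplyr)

# Hypothetical sample standing in for the real data from the question
data <- data.frame(
  Host.H = c("Human", "Human", NA, NA),
  Host.I = c("Intermediate", "Intermediate", NA, "Intermediate"),
  Host.B = c("Bat", NA, "Bat", NA),
  Host.C = c("Consensus", "Consensus", NA, "Consensus")
)

host_cols <- c("Host.H", "Host.I", "Host.B", "Host.C")

test <- data %>%
  mutate(
    # Number of host columns that are not NA in each row
    n_hosts = rowSums(!is.na(data[host_cols])),
    Type = case_when(
      n_hosts == length(host_cols) ~ "Conserved",  # every host present
      n_hosts == 1                 ~ "Unique",     # exactly one host present
      n_hosts > 1                  ~ "Shared",     # two or three hosts present
      TRUE                         ~ NA_character_ # no host present (assumption)
    )
  )

print(test)
```

Counting non-missing values with `rowSums()` works on the whole table at once, so `rowwise()` and `ungroup()` are not needed under this reading.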
【Discussion】:
- Please provide your data by pasting the output of dput(data)
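As a rough illustration of that request, `dput()` prints an R expression that rebuilds the object, so others can copy it straight into their session. The `data` frame below is a made-up three-row stand-in, and taking `head()` first is only a suggestion for keeping the pasted text short.

```r
# Hypothetical stand-in for the real data
data <- data.frame(
  Host.H = c("Human", "Human", "Human"),
  Host.I = c(NA, "Intermediate", "Intermediate"),
  Host.B = c(NA, "Bat", NA),
  Host.C = c(NA, "Consensus", "Consensus")
)

# Print a copy-pasteable definition of the first rows
dput(head(data, 20))

# Anyone can rebuild the same object from the printed text
dput_text <- capture.output(dput(head(data, 20)))
rebuilt <- eval(parse(text = dput_text))
```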