
【Question title】: How to create a new column with conditional logic, based on several values found in multiple columns?
【Posted】: 2019-04-03 15:25:33
【Question description】:

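I have a birth defects dataset (test) in which each row is a case with a different 5-way combination of defects. The first five columns of the dataset (Defect_A, Defect_B, Defect_C, Defect_D, Defect_E) are the defect numbers that make up this combination.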

I would like to create a new column called "comments" that outputs a comment based on the following conditional logic:

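If a case/row has any of the following defects (1, 2, 3, 4) in columns 1:5, then comments = "conjoined"
If a case has any two of the following defects (5, 6, 7, 8) in columns 1:5, then comments = "spina bifida"
If a case has one of the following defects (5, 6, 7, 8) and one of the following defects (9, 10, 11, 12, 13) in columns 1:5, then comments = "heterodaxy"
If a case has three of the following defects (14, 15, 16, 17, 18) in columns 1:5, then comments = "vacterl"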
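Each rule comes down to counting how many of a row's five defect numbers fall in a given group. A minimal sketch of that counting step for a single case, using the values of case9 from the table below: `x %in% group` gives one TRUE/FALSE per defect and `sum()` counts the TRUEs. The group vectors follow the question's sample code, except that `any` is renamed `any_1` here (an illustrative choice) so it does not mask base R's `any()`.

```r
# Defect numbers for one case (case9 in the printed table)
x <- c(18, 10, 14, 16, 8)

# Defect groups from the question; `any` renamed to `any_1` for this sketch
any_1      <- c(1, 2, 3, 4)          # rule 1
any_2      <- c(5, 6, 7, 8)          # rules 2 and 3
any_2_plus <- c(9, 10, 11, 12, 13)   # rule 3
any_3      <- c(14, 15, 16, 17, 18)  # rule 4

# x %in% group is TRUE for each defect that belongs to the group;
# sum() counts those TRUE values
counts <- c(
  rule1      = sum(x %in% any_1),
  rule2_3    = sum(x %in% any_2),
  rule3_plus = sum(x %in% any_2_plus),
  rule4      = sum(x %in% any_3)
)
counts
```

For case9 the counts are 0, 1, 1 and 3, so the case satisfies both rule 3 (one defect from 5-8 plus one from 9-13) and rule 4 (three defects from 14-18). The rules as stated do not say which label wins when several match; an if/else chain checked in the listed order would return "heterodaxy".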

       Defect_A Defect_B Defect_C Defect_D Defect_E
case1        12        3       13       17        9
case2        20       13        6        7        3
case3        11       10        4       20       12
case4        13        7        2       18        3
case5         5        2       15       11       13
case6         8        1       15       19        4
case7        11        7       19       10        1
case8         9       14       15       11       16
case9        18       10       14       16        8
case10       19        7        8       10        2

How can I do this? I have included sample code below.

[EDIT]

# Sample data set 
set.seed(99)
case1 = sample(1:20, 5, replace=FALSE)  
case2 = sample(1:20, 5, replace=FALSE)  
case3 = sample(1:20, 5, replace=FALSE)  
case4 = sample(1:20, 5, replace=FALSE)  
case5 = sample(1:20, 5, replace=FALSE)  
case6 = sample(1:20, 5, replace=FALSE)  
case7 = sample(1:20, 5, replace=FALSE)  
case8 = sample(1:20, 5, replace=FALSE)  
case9 = sample(1:20, 5, replace=FALSE)  
case10 = sample(1:20, 5, replace=FALSE) 
test<-data.frame(rbind(case1, case2, case3, case4, case5, case6, case7, case8, case9, case10))
colnames(test)<- c("Defect_A", "Defect_B", "Defect_C", "Defect_D", "Defect_E")
test

# Conditions
any <- c(1,2,3,4) # for condition 1  
any_2 <- c(5,6,7,8) # for conditions 2 and 3  
any_2_plus <- c(9,10,11,12,13) # for condition 3  
any_3 <- c(14,15,16,17,18) # for condition 4
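Because `ifelse()` is vectorized, the rules can also be applied to all rows at once without `apply()`. A minimal base-R sketch, assuming the five defect columns are numeric: compare the whole matrix against each group with `%in%`, reshape the result back into a matrix, and count matches per row with `rowSums()`. To keep the result independent of the random number generator, the data is typed in from the printed table; the helper `count_hits` is hypothetical, and `any` is again renamed `any_1`.

```r
# The printed table from the question, typed in directly
test <- data.frame(
  Defect_A = c(12, 20, 11, 13,  5,  8, 11,  9, 18, 19),
  Defect_B = c( 3, 13, 10,  7,  2,  1,  7, 14, 10,  7),
  Defect_C = c(13,  6,  4,  2, 15, 15, 19, 15, 14,  8),
  Defect_D = c(17,  7, 20, 18, 11, 19, 10, 11, 16, 10),
  Defect_E = c( 9,  3, 12,  3, 13,  4,  1, 16,  8,  2),
  row.names = paste0("case", 1:10)
)

any_1      <- c(1, 2, 3, 4)
any_2      <- c(5, 6, 7, 8)
any_2_plus <- c(9, 10, 11, 12, 13)
any_3      <- c(14, 15, 16, 17, 18)

m <- as.matrix(test[, 1:5])

# Hypothetical helper: number of defects in each row that belong to `group`.
# m %in% group returns a plain logical vector in column-major order, so it is
# reshaped to the dimensions of m before summing across each row.
count_hits <- function(m, group) rowSums(matrix(m %in% group, nrow = nrow(m)))

n1      <- count_hits(m, any_1)
n2      <- count_hits(m, any_2)
n2_plus <- count_hits(m, any_2_plus)
n3      <- count_hits(m, any_3)

# Rules are checked in the order listed; the first one that matches wins
test$comments <- ifelse(n1 >= 1, "conjoined",
                 ifelse(n2 >= 2, "spina bifida",
                 ifelse(n2 >= 1 & n2_plus >= 1, "heterodaxy",
                 ifelse(n3 >= 3, "vacterl", NA_character_))))
test
```

With this table, seven cases come out "conjoined" (rule 1 is checked first and most rows contain one of 1-4), case8 is "vacterl" and case9 is "heterodaxy". Note the `&` rather than `&&` in the third condition: inside a vectorized `ifelse()` the comparison has to be element-wise.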

【Question discussion】:

Please specify the seed with set.seed.

Tags: r function if-statement

【Solution 1】:

With this data frame:

# Sample data set
df = data.frame(Defect_A = sample(1:30, 10, replace=TRUE),
                Defect_B = sample(1:30, 10, replace=TRUE),
                Defect_C = sample(1:30, 10, replace=TRUE), 
                Defect_D = sample(1:30, 10, replace=TRUE),
                Defect_E = sample(1:30, 10, replace=TRUE))

# Conditions
any <- c(1,2,3,4) # for condition 1  
any_2 <- c(5,6,7,8) # for conditions 2 and 3  
any_2_plus <- c(9,10,11,12,13) # for condition 3  
any_3 <- c(14,15,16,17,18) # for condition 4

You can use nested ifelse calls:

df$comments = apply(df, 1, function(x) {
  # x is one row; sum(x %in% group) counts how many of its defects are in group.
  # Rules are checked in order, so the first one that matches wins.
  ifelse(sum(x %in% any) >= 1, 'conjoined',
  ifelse(sum(x %in% any_2) >= 2, 'spina bifida',
  ifelse(sum(x %in% any_2) >= 1 && sum(x %in% any_2_plus) >= 1, 'heterodaxy',
  ifelse(sum(x %in% any_3) >= 3, 'vacterl', NA_character_))))
})

Adapt the conditions as necessary.
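The nested `ifelse()` returns only the first rule that matches, so a case satisfying several rules gets a single label. If every matching label is wanted instead (the question does not say which behavior is intended, so this is an assumption), one option is to evaluate each rule separately and paste the matching labels together. A sketch, where `label_all` is a hypothetical helper and `any` is renamed `any_1`:

```r
any_1      <- c(1, 2, 3, 4)
any_2      <- c(5, 6, 7, 8)
any_2_plus <- c(9, 10, 11, 12, 13)
any_3      <- c(14, 15, 16, 17, 18)

# Hypothetical helper: all labels whose rule is satisfied by one row `x`
label_all <- function(x) {
  hits <- c(
    "conjoined"    = sum(x %in% any_1) >= 1,
    "spina bifida" = sum(x %in% any_2) >= 2,
    "heterodaxy"   = sum(x %in% any_2) >= 1 && sum(x %in% any_2_plus) >= 1,
    "vacterl"      = sum(x %in% any_3) >= 3
  )
  # Join the names of the satisfied rules, or return NA if none match
  if (any(hits)) paste(names(hits)[hits], collapse = "; ") else NA_character_
}

# Two rows from the question's table: case8 and case9
rows <- rbind(case8 = c(9, 14, 15, 11, 16),
              case9 = c(18, 10, 14, 16, 8))
labels <- apply(rows, 1, label_all)
labels
```

Here case8 gets "vacterl" and case9 gets "heterodaxy; vacterl", whereas the first-match version above labels case9 only as "heterodaxy".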

【Discussion】: