【Question title】: How to create a new column with conditional logic, based on several values found in multiple columns?
【Posted】: 2019-04-03 15:25:33
【Question】:

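I have a birth defects data set (test) in which each row is a case with a different 5-way combination of defects. The first five columns of the data set (Defect_A, Defect_B, Defect_C, Defect_D, Defect_E) hold the defect numbers that make up this combination.

I want to create a new column called "comments" that outputs a comment based on the following conditional logic:

  1. If a case/row has ANY of the following defects (1, 2, 3, 4) in columns 1:5, then comments = "conjoined"
  2. If a case has any TWO of the following defects (5, 6, 7, 8) in columns 1:5, then comments = "spina bifida"
  3. If a case has ONE of the following defects (5, 6, 7, 8) AND ONE of the following defects (9, 10, 11, 12, 13) in columns 1:5, then comments = "heterodaxy"
  4. If a case has any THREE of the following defects (14, 15, 16, 17, 18) in columns 1:5, then comments = "vacterl"
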
       Defect_A Defect_B Defect_C Defect_D Defect_E
case1        12        3       13       17        9
case2        20       13        6        7        3
case3        11       10        4       20       12
case4        13        7        2       18        3
case5         5        2       15       11       13
case6         8        1       15       19        4
case7        11        7       19       10        1
case8         9       14       15       11       16
case9        18       10       14       16        8
case10       19        7        8       10        2

How can I do this? I have included sample code below.

[Edit]

# Sample data set 
set.seed(99)
case1 = sample(1:20, 5, replace=FALSE)  
case2 = sample(1:20, 5, replace=FALSE)  
case3 = sample(1:20, 5, replace=FALSE)  
case4 = sample(1:20, 5, replace=FALSE)  
case5 = sample(1:20, 5, replace=FALSE)  
case6 = sample(1:20, 5, replace=FALSE)  
case7 = sample(1:20, 5, replace=FALSE)  
case8 = sample(1:20, 5, replace=FALSE)  
case9 = sample(1:20, 5, replace=FALSE)  
case10 = sample(1:20, 5, replace=FALSE) 
test<-data.frame(rbind(case1, case2, case3, case4, case5, case6, case7, case8, case9, case10))
colnames(test)<- c("Defect_A", "Defect_B", "Defect_C", "Defect_D", "Defect_E")
test

# Conditions
any <- c(1,2,3,4) # for condition 1  
any_2 <- c(5,6,7,8) # for conditions 2 and 3  
any_2_plus <- c(9,10,11,12,13) # for condition 3  
any_3 <- c(14,15,16,17,18) # for condition 4  

【Comments】:

  • Please specify a seed with set.seed

Tags: r function if-statement


【Solution 1】:

With this data frame:

# Sample data set
df = data.frame(Defect_A = sample(1:30, 10, replace=TRUE),
                Defect_B = sample(1:30, 10, replace=TRUE),
                Defect_C = sample(1:30, 10, replace=TRUE), 
                Defect_D = sample(1:30, 10, replace=TRUE),
                Defect_E = sample(1:30, 10, replace=TRUE))

# Conditions
any <- c(1,2,3,4) # for condition 1  
any_2 <- c(5,6,7,8) # for conditions 2 and 3  
any_2_plus <- c(9,10,11,12,13) # for condition 3  
any_3 <- c(14,15,16,17,18) # for condition 4  

You can use several nested ifelse calls:

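# Only the five defect columns are passed to apply(), so the code can be re-run
# after the comments column exists. sum() counts how many values of the row
# fall in each group. Rows that match no condition get a real missing value
# (NA_character_) rather than the string 'NA'.
df$comments = apply(df[, 1:5], 1, function(x) {
  ifelse(sum(x %in% any) >= 1, 'conjoined',
  ifelse(sum(x %in% any_2) >= 2, 'spina bifida',
  ifelse(sum(x %in% any_2) >= 1 && sum(x %in% any_2_plus) >= 1, 'heterodaxy',
  ifelse(sum(x %in% any_3) >= 3, 'vacterl', NA_character_))))
})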
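
To see what each test inside the function works with, here is a small sketch that counts the matches for a single row. It uses row case9 from the table printed in the question; the `group_*` names and the `counts` vector are illustrative and not part of the original code.

```r
# Row case9, copied from the question's printed table
x <- c(18, 10, 14, 16, 8)

# The four defect groups from the conditions (illustrative names)
group_1 <- c(1, 2, 3, 4)
group_2 <- c(5, 6, 7, 8)
group_3 <- c(9, 10, 11, 12, 13)
group_4 <- c(14, 15, 16, 17, 18)

# %in% gives one TRUE/FALSE per value of x; only the last value (8) is in group_2
x %in% group_2

# sum() counts the TRUEs, i.e. how many defects of the row fall in each group
counts <- c(
  g1 = sum(x %in% group_1),
  g2 = sum(x %in% group_2),
  g3 = sum(x %in% group_3),
  g4 = sum(x %in% group_4)
)
counts  # g1 = 0, g2 = 1, g3 = 1, g4 = 3
```

For this row both the "heterodaxy" test (g2 >= 1 and g3 >= 1) and the "vacterl" test (g4 >= 3) hold. Because the tests are nested in order, the row is labelled "heterodaxy"; reorder the tests if a different priority is wanted.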

Adapt the conditions as needed.
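
As a possible alternative, the same logic can be written without a row-wise apply by counting matches per column and letting ifelse work on whole vectors. This is only a sketch in base R: `count_in` is a hypothetical helper introduced here, and the data frame is the question's printed table typed in by hand so that the example does not depend on the random number generator.

```r
# The question's printed table, entered literally (case1 .. case10)
test <- data.frame(
  Defect_A = c(12, 20, 11, 13, 5, 8, 11, 9, 18, 19),
  Defect_B = c(3, 13, 10, 7, 2, 1, 7, 14, 10, 7),
  Defect_C = c(13, 6, 4, 2, 15, 15, 19, 15, 14, 8),
  Defect_D = c(17, 7, 20, 18, 11, 19, 10, 11, 16, 10),
  Defect_E = c(9, 3, 12, 3, 13, 4, 1, 16, 8, 2),
  row.names = paste0("case", 1:10)
)

# Hypothetical helper: for every row, count the cells whose value is in `values`.
# Each column yields a logical vector; adding them up gives the per-row count.
count_in <- function(data, values) {
  Reduce(`+`, lapply(data, function(col) col %in% values))
}

defects <- test[, 1:5]
n1 <- count_in(defects, 1:4)    # condition 1 group
n2 <- count_in(defects, 5:8)    # conditions 2 and 3 group
n3 <- count_in(defects, 9:13)   # condition 3 group
n4 <- count_in(defects, 14:18)  # condition 4 group

# Same order of tests as in the answer; unmatched rows become NA
test$comments <- ifelse(n1 >= 1, "conjoined",
                 ifelse(n2 >= 2, "spina bifida",
                 ifelse(n2 >= 1 & n3 >= 1, "heterodaxy",
                 ifelse(n4 >= 3, "vacterl", NA_character_))))
test
```

On this table, case8 comes out as "vacterl", case9 as "heterodaxy", and every other case as "conjoined", since each of those rows contains at least one of the defects 1 to 4.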
