【Question title】: Create new column based on 2 reference string columns
【Posted】: 2021-01-29 00:19:53
【Question description】:

Problem

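I have two data frames: a reference data frame, ref_df, and a test data frame, test_df. The reference data frame consists of two string columns, reference_A and reference_B. I want to create a new column in test_df that says "Pass" if the pair of string columns test_A and test_B matches a reference_A and reference_B pair, and "Fail" otherwise.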


Example data

Reference data frame
ref_df <- data.frame(
  reference_A = c("ABC","HIJ","NOP","TUV"),
  reference_B = c("DEF","KLM","QRS","WXY")
)

ref_df

  reference_A reference_B
1         ABC         DEF
2         HIJ         KLM
3         NOP         QRS
4         TUV         WXY
test_df data frame
test_df <- data.frame(
  sample = c(1,2,3,4,5,6),
  test_A = c("ABC","HII","NOP","TUV","TUS","KJF"),
  test_B = c("DEF","KLM","QRR","WXY","WXZ", "KLM")
)

test_df

  sample test_A test_B
1      1    ABC    DEF
2      2    HII    KLM
3      3    NOP    QRR
4      4    TUV    WXY
5      5    TUS    WXZ
6      6    KJF    KLM

Desired output

test_qc

  sample test_A test_B status
1      1    ABC    DEF Pass
2      2    HII    KLM Fail
3      3    NOP    QRR Fail
4      4    TUV    WXY Pass
5      5    TUS    WXZ Fail
6      6    KJF    KLM Fail

Failed attempt

test_qc <- test_df %>% 
  select(test_A, test_B) %>% 
  mutate(status = 
           ifelse(test_A == ref_df$reference_A & test_B == ref_df$reference_B, 
                  "Pass", "Fail"))
Warning messages:
1: Problem with `mutate()` input `status`.
ℹ longer object length is not a multiple of shorter object length
ℹ Input `status` is `ifelse(...)`. 
2: In test_A == reference$reference_A :
  longer object length is not a multiple of shorter object length
3: Problem with `mutate()` input `status`.
ℹ longer object length is not a multiple of shorter object length
ℹ Input `status` is `ifelse(...)`. 
4: In test_B == reference$reference_B :
  longer object length is not a multiple of shorter object length

【Question comments】:

    Tags: r string if-statement dplyr


    【Solution 1】:

    You can try this:

    library(dplyr)
    
    ref_df$temp <- 1
     
    test_df %>%
      left_join(ref_df, by = c("test_A" = "reference_A", "test_B" = "reference_B")) %>%
      mutate(status = if_else(is.na(temp), "Fail", "Pass")) %>%
      select(-temp)
    
      sample test_A test_B status
    1      1    ABC    DEF   Pass
    2      2    HII    KLM   Fail
    3      3    NOP    QRR   Fail
    4      4    TUV    WXY   Pass
    5      5    TUS    WXZ   Fail
    6      6    KJF    KLM   Fail
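
The original attempt compared the columns position by position, recycling the 4 reference rows against the 6 test rows, which is what triggered the warnings. A join instead checks each test row against every reference pair. The following is a minimal sketch of the same join idea that leaves ref_df untouched. The names ref_lookup and matched are hypothetical, introduced here for illustration, and distinct() is an added precaution on the assumption that the reference table might contain repeated pairs, which would otherwise duplicate test rows in the join.

```r
library(dplyr)

ref_df <- data.frame(
  reference_A = c("ABC", "HIJ", "NOP", "TUV"),
  reference_B = c("DEF", "KLM", "QRS", "WXY")
)
test_df <- data.frame(
  sample = c(1, 2, 3, 4, 5, 6),
  test_A = c("ABC", "HII", "NOP", "TUV", "TUS", "KJF"),
  test_B = c("DEF", "KLM", "QRR", "WXY", "WXZ", "KLM")
)

# Build the lookup on the fly instead of adding a column to ref_df.
# `matched` (hypothetical name) stays NA for test rows with no reference pair.
ref_lookup <- ref_df %>%
  distinct(reference_A, reference_B) %>%
  mutate(matched = TRUE)

test_qc <- test_df %>%
  left_join(ref_lookup,
            by = c("test_A" = "reference_A", "test_B" = "reference_B")) %>%
  mutate(status = if_else(is.na(matched), "Fail", "Pass")) %>%
  select(-matched)

test_qc
```

Because the flag column lives only in the temporary lookup, ref_df keeps its original two columns after the pipeline runs.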
    
    

    【Comments】:

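      【Solution 2】:

      You can paste the key columns together, check whether the combined value appears among the reference pairs, and assign 'Pass' or 'Fail' accordingly: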

      transform(test_df, status = ifelse(paste(test_A, test_B) %in% 
                       paste(ref_df$reference_A, ref_df$reference_B), 'Pass', 'Fail'))
      
      #  sample test_A test_B status
      #1      1    ABC    DEF   Pass
      #2      2    HII    KLM   Fail
      #3      3    NOP    QRR   Fail
      #4      4    TUV    WXY   Pass
      #5      5    TUS    WXZ   Fail
      #6      6    KJF    KLM   Fail
      

      This can also be written with dplyr:

      library(dplyr)
      test_df %>%
        mutate(status = if_else(paste(test_A, test_B) %in% 
                   paste(ref_df$reference_A, ref_df$reference_B), 'Pass', 'Fail'))
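
One caveat with pasted keys: with the default space separator, two different pairs can collapse to the same string if the values themselves contain spaces. That does not happen with the sample data here, so the following is only a hedged sketch for messier inputs. It assumes the ASCII unit-separator character never occurs in the data, and sep_chr, test_key and ref_key are hypothetical names used for illustration.

```r
ref_df <- data.frame(
  reference_A = c("ABC", "HIJ", "NOP", "TUV"),
  reference_B = c("DEF", "KLM", "QRS", "WXY")
)
test_df <- data.frame(
  sample = c(1, 2, 3, 4, 5, 6),
  test_A = c("ABC", "HII", "NOP", "TUV", "TUS", "KJF"),
  test_B = c("DEF", "KLM", "QRR", "WXY", "WXZ", "KLM")
)

# With a space separator, two different pairs can produce the same key:
collision <- paste("A B", "C") == paste("A", "B C")  # both are "A B C"

# Assumption: the unit-separator control character is absent from the data.
sep_chr <- "\037"
test_key <- paste(test_df$test_A, test_df$test_B, sep = sep_chr)
ref_key  <- paste(ref_df$reference_A, ref_df$reference_B, sep = sep_chr)

test_qc <- transform(test_df,
                     status = ifelse(test_key %in% ref_key, "Pass", "Fail"))
test_qc
```

For the sample data this gives the same statuses as the space-separated version. The separator only matters when the key columns can contain the separator character themselves.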
      

      【Comments】:

      • Always elegant, @Ronak Shah!