【Question title】: Find the difference between two column elements in R
【Posted】: 2021-04-19 22:28:42
【Question description】:

How can I put the elements that differ between factor_Nov and factor_Jan into a new column named diff?

df=data.frame(id=c("1","2","3"),
              factor_Nov=c("A|B|C","E","F|H|G"),
              factor_Jan=c("B|H|E","E","X|Y|Z"))

The output should be

df=data.frame(id=c("1","2","3"),
              factor_Nov=c("A|B|C","E","F|H|G"),
              factor_Jan=c("B|H|E","E","X|Y|Z"),
              diff=c("A|C|H|E",NA,"X|Y|Z|F|H|G"))

I tried setdiff, but it did not work.

【Question comments】:

    Tags: r set-difference


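    【Solution 1】:

    One option is to split both columns with strsplit on the delimiter |, use Map to keep the elements that are not in the intersect of the two pieces, and paste them back together with collapse = "|".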

    df$diff <- unlist(Map(function(x, y) paste(setdiff(union(x, y), 
       intersect(x, y)), collapse="|"),
       strsplit(as.character(df$factor_Nov), "|", fixed = TRUE),
       strsplit(as.character(df$factor_Jan), "|", fixed = TRUE)))
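
    For a row with no differences (id 2 here), paste returns an empty string rather than the NA shown in the question's expected output. A minimal sketch, assuming NA is what is wanted in that case, wraps the same set logic in a helper. The name sym_diff is hypothetical and not part of the original answer:

```r
# Example data from the question
df <- data.frame(id = c("1", "2", "3"),
                 factor_Nov = c("A|B|C", "E", "F|H|G"),
                 factor_Jan = c("B|H|E", "E", "X|Y|Z"))

# Hypothetical helper: symmetric difference of two character vectors,
# collapsed with "|"; returns NA when nothing differs
sym_diff <- function(x, y) {
  d <- setdiff(union(x, y), intersect(x, y))
  if (length(d) == 0) NA_character_ else paste(d, collapse = "|")
}

# Split each column on the literal "|" and apply the helper row by row
df$diff <- unlist(Map(sym_diff,
                      strsplit(as.character(df$factor_Nov), "|", fixed = TRUE),
                      strsplit(as.character(df$factor_Jan), "|", fixed = TRUE)))
```

    The elements come out in union order (first column first), so row 3 reads F|H|G|X|Y|Z rather than the X|Y|Z|F|H|G written in the question.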
    

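    【Comments】:

    • Thanks. When I try the code I get this error: Error in strsplit(df$factor_Nov, "|", fixed = TRUE) : non-character argument
    • @Mel Updated the post. As of R 4.0 the default is stringsAsFactors = FALSE, so I had assumed the columns were of class character.
    • Can I also count the number of differences?
    • @Mel Yes, you can. Just change the function to function(x, y) {diff1 <- setdiff(union(x, y), intersect(x, y)); data.frame(n = length(diff1), diff = paste(diff1, collapse="|"))} and, outside the Map, use do.call(rbind, Map(...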
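
    The call in the last comment is cut off. One way it might be completed, assuming the same strsplit arguments as in the answer, is sketched below. The names diff_with_count and res are hypothetical:

```r
# Example data from the question
df <- data.frame(id = c("1", "2", "3"),
                 factor_Nov = c("A|B|C", "E", "F|H|G"),
                 factor_Jan = c("B|H|E", "E", "X|Y|Z"))

# Return the number of differing elements and the collapsed string
# as a one-row data frame
diff_with_count <- function(x, y) {
  diff1 <- setdiff(union(x, y), intersect(x, y))
  data.frame(n = length(diff1), diff = paste(diff1, collapse = "|"))
}

# Map yields one data frame per row; rbind stacks them
res <- do.call(rbind, Map(diff_with_count,
                          strsplit(as.character(df$factor_Nov), "|", fixed = TRUE),
                          strsplit(as.character(df$factor_Jan), "|", fixed = TRUE)))

# Attach the two new columns to the original data
df <- cbind(df, res)
```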
    【Solution 2】:

    tidyverse:

    library(dplyr)
    library(tidyr)
    #Code
    new <- df %>% left_join(
      df %>% separate_rows(c(factor_Nov,factor_Jan)) %>%
      pivot_longer(-id) %>%
      group_by(id,value) %>%
      filter(n() == 1) %>%
      ungroup() %>% arrange(id,value) %>%
      group_by(id) %>%
      summarise(Diff=paste0(value,collapse = '|')))
    

    Output:

      id factor_Nov factor_Jan        Diff
    1  1      A|B|C      B|H|E     A|C|E|H
    2  2          E          E        <NA>
    3  3      F|H|G      X|Y|Z F|G|H|X|Y|Z
    

    【Comments】:

    • Awesome! It seems you did not use any set operations to achieve this. Upvoted, and I will study it.
    【Solution 3】:
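
    The idea behind filter(n() == 1) can be sketched in base R for a single row: pool the values from both columns and keep those that occur exactly once. This is an illustration rather than the answer's code, and it assumes no value is repeated within one column. The names nov, jan, counts, once and set_based are hypothetical:

```r
# Row 1 of the question's data, already split on "|"
nov <- c("A", "B", "C")
jan <- c("B", "H", "E")

# Pool the two sets and count how often each value occurs
counts <- table(c(nov, jan))

# Values counted exactly once belong to only one of the two sets
once <- names(counts)[counts == 1]

# The set-based symmetric difference gives the same elements,
# possibly in a different order
set_based <- setdiff(union(nov, jan), intersect(nov, jan))
```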

    A data.table option

    library(data.table)
    setDT(df)[
      ,
      diff := do.call(
        Map,
        c(
          function(...) paste0(setdiff(union(...), intersect(...)), collapse = "|"),
          unname(lapply(.SD, strsplit, split = "\\|"))
        )
      ),
      .SDcols = patterns("^factor_")
    ]
    

    which gives

    > df
       id factor_Nov factor_Jan        diff
    1:  1      A|B|C      B|H|E     A|C|H|E
    2:  2          E          E
    3:  3      F|H|G      X|Y|Z F|H|G|X|Y|Z
    
