【Question title】: Column overlap in two binary R data frames and calculating overlap/non-overlap for each column
【Posted】: 2020-06-18 19:19:19
【Question】:

My two data frames are as follows:

df1 <- structure(list(
  species = structure(1:4, .Label = c("a", "b", "c", "d"), class = "factor"),
  sample1 = c(1L, 1L, 1L, 1L),
  sample2 = c(0L, 0L, 1L, 1L)
), class = "data.frame", row.names = c(NA, -4L))

df2 <- structure(list(
  species = structure(c(1L, 5L, 6L, 7L, 2L, 3L, 4L),
                      .Label = c("a", "b", "c", "d", "x", "y", "z"), class = "factor"),
  sample1 = c(1L, 1L, 0L, 1L, 0L, 1L, 1L),
  sample2 = c(1L, 1L, 1L, 0L, 1L, 1L, 1L)
), class = "data.frame", row.names = c(NA, -7L))

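1/0 indicate presence and absence.

Now I want to match each column of df1 against the corresponding column in df2 and save the result of the comparison as two values (for each column in df1):

  1. TP - for each column, the number of non-zero df1 values that match the corresponding non-zero df2 value, and

  2. FP - for each column, the number of non-zero df1 values that do not match the corresponding non-zero df2 value.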
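
As a minimal sketch of these two definitions, the snippet below counts TP and FP for sample1 only. The vectors in_df1 and in_df2 are hypothetical names, filled in by hand from df1 and df2 after lining the rows up by species (a, b, c, d); they are not part of the original data.

```r
# sample1 values for species a, b, c, d, aligned by hand (illustrative only)
in_df1 <- c(a = 1L, b = 1L, c = 1L, d = 1L)
in_df2 <- c(a = 1L, b = 0L, c = 1L, d = 1L)

# TP: present in df1 and also present in df2
tp <- sum(in_df1 == 1 & in_df2 == 1)
# FP: present in df1 but absent in df2
fp <- sum(in_df1 == 1 & in_df2 == 0)

tp
# [1] 3
fp
# [1] 1
```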

The output data frame (df3) should be:

df3 <- structure(list(
  species = structure(c(1L, 2L, 3L, 4L, 6L, 5L),
                      .Label = c("a", "b", "c", "d", "FP", "TP"), class = "factor"),
  sample1 = c(1L, 1L, 1L, 1L, 3L, 1L),
  sample2 = c(0L, 0L, 1L, 1L, 2L, 0L)
), class = "data.frame", row.names = c(NA, -6L))

I tried using setdiff to get the differences in df1:

overlap <- for ( i in 1:colnames(df1)){
     data.frame(setdiff(df1[,i], df2[,i]) >0)
  }

But obviously this is not the right approach.
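
For reference, a small sketch of why the attempt above falls short, using the sample1 values copied from df1 and df2 (the names x, y and d are illustrative only). setdiff compares the sets of values, not the rows, so it cannot tell which species disagree. Separately, 1:colnames(df1) is not a valid index range because colnames() returns character names; seq_along(df1) would be the usual way to loop over column positions.

```r
x <- c(1L, 1L, 1L, 1L)              # sample1 in df1 (species a, b, c, d)
y <- c(1L, 1L, 0L, 1L, 0L, 1L, 1L)  # sample1 in df2 (species a, x, y, z, b, c, d)

# Every value in x (only 1) also occurs somewhere in y, so the set difference
# is empty even though species b is 1 in df1 and 0 in df2.
d <- setdiff(x, y)
length(d)
# [1] 0
```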

Thanks for your help!

【Comments】:

  • Hi, you're right, I've changed df3 now

Tags: r


【Solution 1】:

Something like this?

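# Row positions in df2 that correspond to each species in df1
i <- match(df1$species, df2$species)

# TP: df1 is 1 and df2 agrees; FP: df1 is 1 and df2 disagrees
TP <- colSums((df2[i, -1] == df1[-1]) & (df1[-1] == 1))
FP <- colSums((df2[i, -1] != df1[-1]) & (df1[-1] == 1))

# Turn the named count vectors into one-row data frames and append them to df1
TP <- cbind.data.frame(species = 'TP', t(TP))
FP <- cbind.data.frame(species = 'FP', t(FP))
res <- rbind(df1, TP, FP)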

res
#  species sample1 sample2
#1       a       1       0
#2       b       1       0
#3       c       1       1
#4       d       1       1
#5      TP       3       2
#6      FP       1       0
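
One case the code above does not cover is a species that appears in df1 but not in df2: match() then returns NA and the column sums become NA. The following is a hedged variant, not the answerer's method, that treats such species as absent in df2. That treatment is an assumption, and the helper names a and b are illustrative. The data frames are rebuilt here with plain character species so the block runs on its own.

```r
# Same values as df1 and df2 in the question, with character species
df1 <- data.frame(species = c("a", "b", "c", "d"),
                  sample1 = c(1L, 1L, 1L, 1L),
                  sample2 = c(0L, 0L, 1L, 1L))
df2 <- data.frame(species = c("a", "x", "y", "z", "b", "c", "d"),
                  sample1 = c(1L, 1L, 0L, 1L, 0L, 1L, 1L),
                  sample2 = c(1L, 1L, 1L, 0L, 1L, 1L, 1L))

# Align df2 to df1 by species; a species missing from df2 yields a row of NA
i <- match(df1$species, df2$species)
a <- as.matrix(df1[-1])
b <- as.matrix(df2[i, names(df1)[-1]])
b[is.na(b)] <- 0L  # assumption: missing in df2 counts as absent

TP <- colSums(a == 1 & b == 1)  # present in both
FP <- colSums(a == 1 & b == 0)  # present in df1 only

TP
# sample1 sample2
#       3       2
FP
# sample1 sample2
#       1       0
```

Selecting the df2 columns by name (names(df1)[-1]) also guards against the two data frames listing their sample columns in a different order.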

【Comments】:

  • Thanks for your answer, but this is not the result I want. Please see the updated df3 (output).