Iterating row comparisons over a data frame in R

【Question title】: Iterate row comparison over a dataframe in R
【Posted】: 2015-03-20 14:00:45
【Question】:

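I am trying to compare how similar students' test answers are. For students A, B, C and D, I want to count, for every possible pair of students, how many questions they answered identically. For example, A and B gave the same answer on 5 of 7 questions, A and C on 4 of 7, and so on. In the end I want a single column in which each row corresponds to one unique pair.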

Here is a sample data frame:

      Student Q1 Q2 Q3 Q4 Q5
      A       1  3  2  4  1
      B       1  2  4  1  1
      C       2  4  4  2  1
      D       3  1  2  3  4
      E       3  3  1  2  1

So far I have used combn to set up the pairs:

    test<-combn(Book1$Student,2)
    compare<-lapply(1:ncol(test), function(x) rbind(Book1[Book1$Student==test[1,x], ],
                                   Book1[Book1$Student==test[2,x], ]))

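This produces a list with one element per unique comparison, but I can't figure out how to sum the identical responses across the two rows. Any suggestions?

【Comments】:

Tags: r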

【Solution 1】:

You could use combn, passing it a function that is applied to each pair of row indices:

    combn(1:nrow(Book1), 2, function(indices){
      sum(Book1[indices[1], 2:6] == Book1[indices[2], 2:6])
    })
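As a minimal, self-contained sketch of the same idea, the example below rebuilds the sample data from the question and labels each count with its pair of students. The names `answers`, `pairs`, `matches` and `result` are made up for illustration and are not part of the original answer; the answers are converted to a matrix first so that each row comparison is a plain vector comparison.

```r
# Rebuild the sample data shown in the question.
Book1 <- data.frame(
  Student = c("A", "B", "C", "D", "E"),
  Q1 = c(1, 1, 2, 3, 3),
  Q2 = c(3, 2, 4, 1, 3),
  Q3 = c(2, 4, 4, 2, 1),
  Q4 = c(4, 1, 2, 3, 2),
  Q5 = c(1, 1, 1, 4, 1),
  stringsAsFactors = FALSE
)

# Answers only, as a matrix (one row per student, one column per question).
answers <- as.matrix(Book1[, -1])

# All unique pairs of row indices; each column of `pairs` is one pair.
pairs <- combn(nrow(Book1), 2)

# For each pair, count the questions where both students gave the same answer.
matches <- apply(pairs, 2, function(idx) {
  sum(answers[idx[1], ] == answers[idx[2], ])
})

# Tie each count to the pair of students it belongs to.
result <- data.frame(
  Student1 = Book1$Student[pairs[1, ]],
  Student2 = Book1$Student[pairs[2, ]],
  Matches  = matches
)
print(result)
```

With five students this gives ten rows, one per unique pair, including pairs with zero identical answers.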

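【Comments】:

When I try that code I get the following output: [1] 0 1 0 1 1 0. If it is iterating over the combinations of rows, I would expect to see 2 1 1 2 2 0 1 2 1. What does the indices function do in the code you provided?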

【Solution 2】:

After reshaping Book1 from wide to long format, this can be solved with a self-join:

    library(data.table)
    long <- melt(setDT(Book1)[
      , Student := ordered(Student)], id.vars = "Student")
    long[long, on = .(variable, value)][
      , .N, by = .(Student, i.Student)][
        Student < i.Student][
          order(Student, i.Student)]

       Student i.Student N
    1:       A         B 2
    2:       A         C 1
    3:       A         D 1
    4:       A         E 2
    5:       B         C 2
    6:       B         E 1
    7:       C         E 2
    8:       D         E 1

Alternatively, you can return a symmetric matrix containing the number of identical answers for any two students:

    long <- melt(setDT(Book1), id.vars = "Student")
    dcast(long[long, on = .(variable, value)][, .N, by = .(Student, i.Student)],
          Student ~ i.Student, fill = 0)

       Student A B C D E
    1:       A 5 2 1 1 2
    2:       B 2 5 2 0 1
    3:       C 1 2 5 0 2
    4:       D 1 0 0 5 1
    5:       E 2 1 2 1 5
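For comparison, the same kind of symmetric matrix can be sketched in base R without data.table. This is an alternative technique, not the method used in the answer above, and the names `answers`, `students` and `same` are made up for illustration. It compares every student's row of answers with every other student's row directly.

```r
# Rebuild the sample data shown in the question.
Book1 <- data.frame(
  Student = c("A", "B", "C", "D", "E"),
  Q1 = c(1, 1, 2, 3, 3),
  Q2 = c(3, 2, 4, 1, 3),
  Q3 = c(2, 4, 4, 2, 1),
  Q4 = c(4, 1, 2, 3, 2),
  Q5 = c(1, 1, 1, 4, 1),
  stringsAsFactors = FALSE
)

# Answers as a matrix whose rows are named after the students.
answers <- as.matrix(Book1[, -1])
rownames(answers) <- Book1$Student
students <- Book1$Student

# For every ordered pair (b, a), count the questions answered identically.
# The inner sapply builds one column; the outer sapply binds the columns.
same <- sapply(students, function(a) {
  sapply(students, function(b) sum(answers[a, ] == answers[b, ]))
})
print(same)
```

The diagonal holds the number of questions (each student matches themselves on all five), and pairs with no identical answers, such as B and D, appear as explicit zeros.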
