Fill a matrix more efficiently: answers

【Question title】: Fill matrix more efficiently
【Posted】: 2015-01-24 09:17:46
【Question description】:

I have a data.frame DF as follows:

u <- c(14381,  20547,  17172,  17753,  667,    17753,  914,    10802,  3346,   17753, 
667,    11113,  914,    914,    17753,  11113,  10802,  20547,  14381,  11113, 
139,    17753,  17172,  10802,  14381,  20547,  139,    14381,  17753,  10802, 
10802,  139,    11113,  10802,  11113,  3346,   11113,  11113,  11113,  10802, 
17172,  20547,  914,    17172,  3346,   139,    11113,  139,    914,    10802, 
14381,  10802,  17172,  10802,  3346,   17172,  10802,  20547,  15679,  17753, 
11113,  11113,  667,    15679,  667,    1204,   355,    1204,   400,    14351, 
16405,  12760,  16405,  12760,  11072,  1204,   14351,  265,    16405,  4993,  
400,    355,    16405,  4993,   355,    14351,  14351,  14351,  400,    11021, 
11072,  1204,   12760,  265,    12760,  265,    400,    265,    1204,   12760, 
16405,  11072,  16405,  1204,   11072,  11021,  265,    11072,  18309,  11021, 
18309,  4993,   12760,  1204,   11021,  18309,  18309,  265,    14351,  14351, 
12759,  12759,  4993,   11038,  12759,  12759,  11038,  12759,  18309,  18309, 
1,      4,      4,      3,      6,      1,      1,      2,      10,     11,    
1,      2,      1,      7,      1,      2,      1,      1,      1,      1,     
5,      1,      2,      3,      2,      2,      2,      2,      1,      1,     
5,      1,      7,      2,      1,      2,      2,      2,      2,      1,     
2,      2,      1,      4,      1,      3,      1,      1,      2,      3,     
2,      3,      1,      1,      2,      1,      1,      1,      1,      1,     
1,      2,      2,      1,      1)

DF <- as.data.frame(matrix(u, ncol = 3, nrow = 65, byrow = FALSE))

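Now I need to construct a matrix MAT as follows:

The first column of DF contains the row names of MAT
The second column of DF contains the column names of MAT
The third column of DF contains the cell values of MAT. So MAT["14381", "1204"] = 1 and MAT["20547", "355"] = 4, etc.
All other cells should be 0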

The question is: how can I efficiently construct this matrix from the data frame above? My current approach is:

DF[, 1] <- as.character(DF[, 1])  # turn into characters
DF[, 2] <- as.character(DF[, 2])  # turn into characters
rows <- unique(DF[,1])  # get the row names
cols <- unique(DF[,2])  # get the column names
MAT <- matrix(0, nrow = length(rows), ncol = length(cols)) # prefill with 0's
dimnames(MAT) <- list(rows, cols)
for (i in 1:nrow(DF)) {
  MAT[DF[i, 1], DF[i, 2]] <- DF[i, 3]
}

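This works, but it does not seem efficient. Since I need to repeat this task about 10K times, efficiency really matters. How can I get rid of the loop (which keeps copying MAT) and do this more efficiently? I was thinking of dplyr or data.table, but I really don't know how to do this with those packages. Can anyone help?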
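One base-R way to drop the loop, sketched here as an assumption rather than the poster's method, is to translate the row and column names into integer positions with `match()` and then fill every cell in a single indexed assignment. It keeps the same `unique()` ordering as the loop. The helper name `fill_matrix` and the small `toy` data frame are hypothetical and only for illustration; they are not the question's data.

```r
# Hypothetical helper: build MAT with one vectorized assignment instead of a loop.
# Assumes DF has columns V1 (row key), V2 (column key), V3 (value), as in the question.
fill_matrix <- function(DF) {
  rows <- unique(as.character(DF$V1))   # row names, in order of first appearance
  cols <- unique(as.character(DF$V2))   # column names, in order of first appearance
  MAT <- matrix(0, nrow = length(rows), ncol = length(cols),
                dimnames = list(rows, cols))
  # Two-column integer matrix: each row is the (row, column) position of one cell
  idx <- cbind(match(as.character(DF$V1), rows),
               match(as.character(DF$V2), cols))
  MAT[idx] <- DF$V3   # fills all listed cells at once; every other cell stays 0
  MAT
}

# Small made-up example (not the question's data)
toy <- data.frame(V1 = c(14381, 20547, 14381),
                  V2 = c(1204, 355, 355),
                  V3 = c(1, 4, 2))
fill_matrix(toy)
```

Here `MAT["14381", "1204"]` is 1, `MAT["20547", "355"]` is 4, and the pair that never occurs, `MAT["20547", "1204"]`, stays 0.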

【Question discussion】:

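Instead of the loop you can use matrix indexing: MAT[as.matrix(DF[1:2])] = DF$V3 (this relies on V1 and V2 already being character, as in the question's code, so that the index is matched against the dimnames). Another option is xtabs(V3 ~ V1 + V2, DF)
Thanks @alexis_laz, that is indeed more efficient.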
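The two suggestions in the comments can be sketched side by side on a small made-up data frame (not the question's data). This is an illustration under stated assumptions: character matrix indexing needs V1/V2 as character, and `xtabs()` sums duplicate (V1, V2) pairs and sorts the row and column labels, so its ordering can differ from `unique()`. Converting the `xtabs` table to a plain matrix with `matrix(...)` is my own choice, not part of the comment.

```r
# Made-up example data (not the question's data)
DF <- data.frame(V1 = c(14381, 20547, 14381),
                 V2 = c(1204, 355, 355),
                 V3 = c(1, 4, 2))

# 1) Character matrix indexing. V1/V2 must be character; a numeric
#    as.matrix(DF[1:2]) would be read as row/column positions instead.
DF$V1 <- as.character(DF$V1)
DF$V2 <- as.character(DF$V2)
rows <- unique(DF$V1)
cols <- unique(DF$V2)
MAT <- matrix(0, nrow = length(rows), ncol = length(cols),
              dimnames = list(rows, cols))
MAT[as.matrix(DF[1:2])] <- DF$V3   # each (V1, V2) pair is matched by name

# 2) xtabs builds the zero-filled cross table directly.
#    Duplicate (V1, V2) pairs would be summed; labels come out sorted.
tab <- xtabs(V3 ~ V1 + V2, data = DF)
MAT2 <- matrix(tab, nrow = nrow(tab), dimnames = dimnames(tab))  # plain matrix
```

Reordering `MAT2` by name, e.g. `MAT2[rownames(MAT), colnames(MAT)]`, should give the same values as `MAT`.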

Tags: r data.table dplyr

【Solution 1】:

Using tidyr:

library(tidyr)
spread(DF, V2, V3, fill = 0)
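Note that `spread()` returns a data frame whose first column is V1, not a matrix with row names like MAT. A minimal base-R conversion is sketched below; the helper `wide_to_matrix` is hypothetical, and the `wide` data frame is a hand-built, made-up stand-in for that shape of output rather than something produced by tidyr here.

```r
# Hypothetical helper: turn a wide data frame whose first column holds the
# row keys (the shape spread() returns here) into a plain numeric matrix.
wide_to_matrix <- function(wide) {
  m <- as.matrix(wide[-1])                 # drop the key column, keep the values
  rownames(m) <- as.character(wide[[1]])   # row keys become row names
  m
}

# Made-up stand-in for a spread() result (not the question's data)
wide <- data.frame(V1 = c(14381, 20547),
                   `1204` = c(1, 0),
                   `355` = c(2, 4),
                   check.names = FALSE)
wide_to_matrix(wide)
```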

【Discussion】:

Or an option with data.table: dcast.data.table(setDT(DF), V1~V2, value.var='V3', fill=0)