【问题标题】:When is it advisable to use sparse matrices in R?什么时候最好在 R 中使用稀疏矩阵?
【发布时间】:2014-07-10 09:12:45
【问题描述】:

我最近一直在修改功率模拟,我有以下代码:

library(MASS)
library(Matrix)

simdat <- data.frame(mmm = rep(rep(factor(1:2,
                                          labels=c("m1", "m2")),
                                   each = 2),
                               times = 2800),
                 ttt = rep(factor(1:2,
                                  labels = c("t1", "t2")),
                           times = 5600),
                 sss = rep(factor(1:70),
                            each = 160),
                 iii = rep(rep(factor(1:40),
                               each = 4),
                           times = 70))

beta <- c(1, 2)

X1 <- model.matrix(~ mmm,
                   data = simdat)

Z1 <- model.matrix(~ ttt,
                   data = simdat)

X1Z111200x2 矩阵。在 Stackoverflow 的帮助下,我设法使我的计算比以前更有效率:

funab <- function(){
    ran_sub <- mvrnorm(70, mu = c(0,0), Sigma = matrix(c(10, 3, 3, 2), ncol = 2))

    ran_ite <- mvrnorm(40, mu = c(0,0), Sigma = matrix(c(10, 3, 3, 2), ncol = 2))

    Mb <- as.vector(X1 %*% beta)

    M1 <- rowSums(Z1 * ran_sub[rep(1:70,
                                        each = 160),])

    M2 <- rowSums(Z1 * ran_ite[rep(rep(1:40, each = 4),
                                        times = 70),])

    Mout <- Mb + M1 + M2

    Y <- as.vector(Mout) + rnorm(length(Mout), mean = 0 , sd = 0.27)
}

Y 将是一个长度为11200 的向量。然后我多次复制这个函数(比如1000 次):

sim <- replicate(n        = 1000,
                 expr     = funab()},
                 simplify = FALSE)

sim 将是一个11200x1000 列表。鉴于我想更多地执行此操作并可能在funab() 中包含更多代码,我想知道在funab() 的计算中是否建议在funab() 中使用稀疏矩阵作为现在的@ 987654335@? /p>

【问题讨论】:

  • 你熟悉microbenchmark吗?您可以使用它来比较函数之间的性能,也就是。基准测试。简单地说,install.packages(c("microbenchmark"), dependencies = TRUE)require(microbenchmark)example(microbenchmark),你知道这个练习。我在this SO answer 中使用过microbenchmark
  • 直到现在我才知道。 :) 我今天会检查一下!
  • 如果您确实进行了测试,将其添加到您的问题中会很有趣。
  • 我会这样做的。我希望我这个周末能解决它!

标签: r matrix sparse-matrix


【解决方案1】:

好的,我已经尝试按照 cmets 中给出的建议来解决我的问题,并使用 microbenchmark 包进行了测试。为了使复制和粘贴更容易,我将重复上面的代码:

library(MASS)
library(Matrix)

simdat <- data.frame(mmm = rep(rep(factor(1:2,
                                          labels=c("m1", "m2")),
                                   each = 2),
                               times = 2800),
                 ttt = rep(factor(1:2,
                                  labels = c("t1", "t2")),
                           times = 5600),
                 sss = rep(factor(1:70),
                            each = 160),
                 iii = rep(rep(factor(1:40),
                               each = 4),
                           times = 70))

beta <- c(1, 2)

X1 <- model.matrix(~ mmm,
                   data = simdat)

Z1 <- model.matrix(~ ttt,
                   data = simdat)

我现在创建与稀疏矩阵相同的矩阵:

sparseX1 <- sparse.model.matrix(~ mmm,
                                data = simdat)

sparseZ1 <- sparse.model.matrix(~ ttt,
                                data = simdat)

然后我设置了两个函数:

funab_sparse <- function(){
    ran_sub <- mvrnorm(70, mu = c(0,0), Sigma = matrix(c(10, 3, 3, 2), ncol = 2))

    ran_ite <- mvrnorm(40, mu = c(0,0), Sigma = matrix(c(10, 3, 3, 2), ncol = 2))

    Mb <- as.vector(sparseX1 %*% beta)

    M1 <- Matrix::rowSums(sparseZ1 * ran_sub[rep(1:70,
                                        each = 160),])

    M2 <- Matrix::rowSums(sparseZ1 * ran_ite[rep(rep(1:40, each = 4),
                                        times = 70),])

    Mout <- Mb + M1 + M2

    Y <- as.vector(Mout) + rnorm(length(Mout), mean = 0 , sd = 0.27)
}

funab <- function(){
    ran_sub <- mvrnorm(70, mu = c(0,0), Sigma = matrix(c(10, 3, 3, 2), ncol = 2))

    ran_ite <- mvrnorm(40, mu = c(0,0), Sigma = matrix(c(10, 3, 3, 2), ncol = 2))

    Mb <- as.vector(X1 %*% beta)

    M1 <- rowSums(Z1 * ran_sub[rep(1:70,
                                        each = 160),])

    M2 <- rowSums(Z1 * ran_ite[rep(rep(1:40, each = 4),
                                        times = 70),])

    Mout <- Mb + M1 + M2

    Y <- as.vector(Mout) + rnorm(length(Mout), mean = 0 , sd = 0.27)
}

library(microbenchmark)
res <- microbenchmark(funab(), funab_sparse(), times = 1000)

并得到结果:

> res <- microbenchmark(funab(), funab_sparse(), times = 1000)
> res
Unit: milliseconds
           expr      min       lq   median       uq      max neval
        funab() 2.200342 2.277006 2.309587 2.481627 69.99895  1000
 funab_sparse() 8.419564 8.568157 9.666248 9.874024 75.88907  1000

假设我没有犯任何重大错误,我可以得出结论,使用稀疏矩阵进行计算的这种特殊方式不会加速我的代码。

【讨论】:

    猜你喜欢
    • 1970-01-01
    • 1970-01-01
    • 2016-01-13
    • 2010-11-13
    • 2018-12-14
    • 1970-01-01
    • 2011-06-24
    • 2013-11-29
    • 2013-06-26
    相关资源
    最近更新 更多