来自大稀疏矩阵的 Rcpp submat答案

【问题标题】：Rcpp submat from a big sparse matrix来自大稀疏矩阵的 Rcpp submat
【发布时间】：2019-11-03 01:59:27
【问题描述】：

我正在尝试将 vec 乘以一个非常大的稀疏矩阵的子集（如后面的脚本），但在使用 sourceCpp 时它无法编译，它报告 error: no matching function for call to ‘arma::SpMat<double>::submat(arma::uvec&, arma::uvec&)，如果有人会非常感激可以帮我一个忙。

#include <RcppArmadillo.h>

// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::export]]
double myf(sp_mat X, vec g, uvec xi){
    double u = g(xi).t() * X.submat(xi, xi) * g(xi);
    return u;
}

【问题讨论】：

根据documentation 稀疏矩阵只支持连续子矩阵视图。
非常感谢，还有其他方法可以实现吗？
如何将g 中未在xi 中提及的所有元素设置为零？还是通过两个循环“手工”制作产品？
感谢您的建议。如果没有现成的功能，也许两个循环是最好的选择。

标签： r sparse-matrix rcpp armadillo rcpparmadillo

【解决方案1】：

所以，正如@RalfStubner 提到的，matrix access for sparse matrices is continuous only。也就是说，所采用的访问方法对于实际的稀疏矩阵是对称的，因为使用了相同的索引。因此，在这种情况下，恢复到 (x,y) 的标准元素访问器是有意义的。因此，求和减少可以用一个循环来完成。

#include <RcppArmadillo.h>

// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::export]]
double submat_multiply(const arma::sp_mat& X, 
                       const arma::vec& g, const arma::uvec& xi){

  // Add an assertion
  if(X.n_rows != g.n_elem) {
    Rcpp::stop("Mismatched row and column dimensions of X (%s) and g (%s).",
               X.n_rows, g.n_elem);
  }

  // Reduction
  double summed = 0; 

  for (unsigned int i = 0; i < xi.n_elem; ++i) {
    // Retrieve indexing element
    arma::uword index_at_i = xi(i);
    // Add components together
    summed += g(index_at_i) * X(index_at_i, index_at_i) * g(index_at_i);
  }

  // Return result
  return summed;
}

另一种方法，但可能成本更高，是提取稀疏矩阵的对角线并将其转换为密集向量。从那里应用逐元素乘法和求和。

#include <RcppArmadillo.h>

// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::export]]
double submat_multiply_v2(const arma::sp_mat& X, 
                          const arma::vec& g, const arma::uvec& xi){

  // Add an assertion
  if(X.n_rows != g.n_elem) {
    Rcpp::stop("Mismatched row and column dimensions of X (%s) and g (%s).",
               X.n_rows, g.n_elem);
  }

  // Copy sparse diagonal to dense vector
  arma::vec x_diag(X.diag());
  // Obtain the subset
  arma::vec g_sub = g.elem(xi); 

  // Perform element-wise multiplication and then sum.
  double summed = arma::sum(g_sub % x_diag.elem(xi) % g_sub);

  // Return result
  return summed;
}

测试代码：

# Sparse matrix
library(Matrix)
i <- c(1,4:8,10); j <- c(2, 9, 6:10); x <- 7 * (1:7)
X <- sparseMatrix(i, j, x = x)  

X
# 10 x 10 sparse Matrix of class "dgCMatrix"
#                               
#  [1,] . 7 . . .  .  .  .  .  .
#  [2,] . . . . .  .  .  .  .  .
#  [3,] . . . . .  .  .  .  .  .
#  [4,] . . . . .  .  .  . 14  .
#  [5,] . . . . . 21  .  .  .  .
#  [6,] . . . . .  . 28  .  .  .
#  [7,] . . . . .  .  . 35  .  .
#  [8,] . . . . .  .  .  . 42  .
#  [9,] . . . . .  .  .  .  .  .
# [10,] . . . . .  .  .  .  . 49

# Vector
g <- 1:10

# Indices
xi <- c(0, 3, 4, 9)

# Above function
submat_multiply(X, g, xi)
# [1] 4900

submat_multiply_v2(X, g, xi)
# [1] 4900

【讨论】：

很好。也许还断言g 和X 的（行）维度？
非常感谢。我会认为在 RcppArmadillo 中可能还有另一个现成的功能。根据您的指导，我编写了一个两个循环函数来实现它double spsubmat_multiply(const sp_mat X, const vec g, const uvec windxi){ vec gmtpX(windxi.n_elem); double summed, varw; for(int i = 0; i < windxi.n_elem; i++){ summed = 0; for(int j = 0; j < windxi.n_elem; j++){ if(g[windxi[j]]){summed += g[windxi[j]] * X(windxi[j], windxi[i]);} } gmtpX[i] = summed; } varw = sum(gmtpX % g(windxi)); return varw; }
该评论无法阅读。考虑将您的代码段添加为对您的问题的编辑。