Writing a C++ iterator for a sparse matrix class: answers

【Question title】: Writing a C++ iterator for a sparse matrix class
【Posted】: 2021-07-10 12:17:16
【Question description】:

I am trying to get a basic constant forward iterator working in C++.

namespace Rcpp {
    class SparseMatrix {
    public:
        IntegerVector i, p;
        NumericVector x;
   
        int begin_col(int j) { return p[j]; };
        int end_col(int j) { return p[j + 1]; };
        
        class iterator {
        public:
            int index;
            iterator(SparseMatrix& g) : parent(g) {}
            iterator(int ind) { index = ind; };                       // ERROR!
            bool operator!=(int x) const { return index != x; };
            iterator operator++(int) { ++index; return (*this); };
            int row() { return parent.i[index]; };
            double value() { return parent.x[index]; };
        private:
            SparseMatrix& parent;
        };
    };    
}

My intention is to use the iterator in a context like the following:

// sum of values in column 7
Rcpp::SparseMatrix A(nrow, ncol, fill::random);
double sum = 0;
for(Rcpp::SparseMatrix::iterator it = A.begin_col(7); it != A.end_col(7); it++)
    sum += it.value();

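Two questions:

The compiler throws an error on the line marked above: uninitialized reference member in 'class Rcpp::SparseMatrix&' [-fpermissive]. How can this be fixed?
How can double value() { return parent.x[index]; }; be redesigned to return a pointer to the value rather than a copy of it?

Some context on the SparseMatrix class: just like a dgCMatrix in R, an object of class SparseMatrix consists of three vectors:

i holds the row index of each element in x
p gives the indices into i that correspond to the start of each column
x contains the non-zero values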
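
To make the i/p/x layout concrete, here is a minimal sketch that stores a small 3 x 3 matrix in that form and sums one column. It uses std::vector instead of the Rcpp vector types so that it compiles on its own; CscMatrix, example_matrix and column_sum are hypothetical names introduced only for this illustration.

```cpp
#include <vector>

// Plain-vector stand-in for the i/p/x layout described above (hypothetical
// type; the class in the question uses Rcpp::IntegerVector/NumericVector).
struct CscMatrix {
    std::vector<int> i;     // row index of each stored value
    std::vector<int> p;     // p[j] .. p[j + 1] delimit column j within i and x
    std::vector<double> x;  // non-zero values, stored column by column
};

// The 3 x 3 matrix
//   1 0 4
//   0 3 0
//   2 0 5
// written out column by column.
inline CscMatrix example_matrix() {
    return CscMatrix{{0, 2, 1, 0, 2}, {0, 2, 3, 5}, {1.0, 2.0, 3.0, 4.0, 5.0}};
}

// Sum the stored values of column j by walking x[p[j]] .. x[p[j + 1] - 1].
inline double column_sum(const CscMatrix& m, int j) {
    double sum = 0.0;
    for (int k = m.p[j]; k < m.p[j + 1]; ++k) sum += m.x[k];
    return sum;
}
```

Note that p has one more entry than there are columns, so p[j + 1] is always valid for the last column.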

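【Comments】:

What do you expect iterator(int ind) { index = ind; }; to do? It is a constructor, so it has to initialize parent. Maybe you need something like an iterator(SparseMatrix&, int) ctor?
@Evg Thanks, iterator(dgCMatrix& g, int ind) : parent(g) { index = ind;} compiles; I am now checking whether it works.
What else is the problem? An iterator always has to know the matrix it iterates over.
begin_col and end_col should return an iterator, not an int.
For value(), you can use double& value(). Also note that operator++ should return iterator&, not iterator.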
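
The two points raised in these comments can be shown in isolation. The sketch below is not the question's class; Owner and View are hypothetical names. It illustrates that a reference member has to be bound in the initializer list of every constructor, and that returning a reference lets the caller modify the stored value instead of a copy.

```cpp
// Hypothetical container that owns the data.
struct Owner {
    int data[3];
};

// Hypothetical view that refers back to its owner, like the iterator's parent.
class View {
public:
    // The reference member is bound here, in the initializer list.
    View(Owner& o, int idx) : owner(o), index(idx) {}

    // A constructor such as
    //     View(int idx) { index = idx; }
    // would be rejected by the compiler: 'owner' is a reference and would
    // be left unbound, which is the error reported in the question.

    // Returning int& exposes the stored element itself, not a copy of it.
    int& get() { return owner.data[index]; }

private:
    Owner& owner;
    int index;
};
```

A pointer return type (int*) would also avoid the copy, but a reference keeps the call site free of dereferencing.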

Tags: c++ iterator sparse-matrix const-iterator

【Solution 1】:

Thanks to @Evg, here is the solution:

namespace Rcpp {
    class SparseMatrix {
    public:
        IntegerVector i, p;
        NumericVector x;
   
        class iterator {
        public:
            int index;
            iterator(SparseMatrix& g, int ind) : parent(g) { index = ind; }
            bool operator!=(iterator x) const { return index != x.index; };
            iterator& operator++() { ++index; return (*this); };
            int row() { return parent.i[index]; };
            double& value() { return parent.x[index]; };
        private:
            SparseMatrix& parent;
        };

        iterator begin_col(int j) { return iterator(*this, p[j]); };
        iterator end_col(int j) { return iterator(*this, p[j + 1]); };
    };    
}

It can be used as follows, for example, to compute colSums:

//[[Rcpp::export]]
Rcpp::NumericVector Rcpp_colSums(Rcpp::SparseMatrix& A) {
    Rcpp::NumericVector sums(A.cols());
    for (int i = 0; i < A.cols(); ++i)
        for (Rcpp::SparseMatrix::iterator it = A.begin_col(i); it != A.end_col(i); ++it)  // the iterator above defines only the prefix form
            sums(i) += it.value();
    return sums;
}

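Moreover, when microbenchmarked from R, the function above is faster than the equivalent RcppArmadillo, RcppEigen, and R::Matrix functions!

Edit:

The syntax above was inspired by Armadillo. I have come to realize that a slightly different syntax (involving fewer constructs) gives an iterator similar to Eigen's: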

class col_iterator {
    public:
      col_iterator(SparseMatrix& ptr, int col) : ptr(ptr) { indx = ptr.p[col]; max_index = ptr.p[col + 1]; }
      operator bool() const { return (indx != max_index); }
      col_iterator& operator++() { ++indx; return *this; }
      const double& value() const { return ptr.x[indx]; }
      int row() const { return ptr.i[indx]; }
    private:
      SparseMatrix& ptr;
      int indx, max_index;
    };

It can then be used like this:

int col = 0;
for (Rcpp::SparseMatrix::col_iterator it(A, col); it; ++it)
     Rprintf("row: %3d, value: %10.2e\n", it.row(), it.value());
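
For readers who want to try the Eigen-style pattern without Rcpp, the following is a minimal sketch over plain std::vector storage. CscMatrix, ColIterator and col_sums are hypothetical names, and the conversion operator is marked explicit here, which is a deliberate deviation from the code above: it still works in a loop condition but prevents accidental conversions to int.

```cpp
#include <vector>

// Hypothetical plain-vector stand-in for the SparseMatrix class.
struct CscMatrix {
    std::vector<int> i, p;  // row indices and column start offsets
    std::vector<double> x;  // non-zero values
};

// Iterator that knows where its column ends, so no end iterator is needed.
class ColIterator {
public:
    ColIterator(const CscMatrix& m, int col)
        : mat(m), indx(m.p[col]), max_index(m.p[col + 1]) {}

    // True while there are stored values left in this column.
    explicit operator bool() const { return indx != max_index; }

    ColIterator& operator++() { ++indx; return *this; }
    double value() const { return mat.x[indx]; }
    int row() const { return mat.i[indx]; }

private:
    const CscMatrix& mat;
    int indx, max_index;
};

// Column sums; assumes m.p is non-empty (one entry per column plus one).
inline std::vector<double> col_sums(const CscMatrix& m) {
    const int ncol = static_cast<int>(m.p.size()) - 1;
    std::vector<double> sums(ncol, 0.0);
    for (int j = 0; j < ncol; ++j)
        for (ColIterator it(m, j); it; ++it)
            sums[j] += it.value();
    return sums;
}
```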

【Discussion】:

Prefer the pre-increment iterator& operator++(). Post-increment should return an iterator by value and has different semantics: auto tmp(*this); ++(*this); return tmp;.
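
The difference between the two increment forms can be sketched as follows. Cursor is a hypothetical minimal class over a std::vector<double>, not the SparseMatrix iterator itself; it exists only to show both operators side by side.

```cpp
#include <vector>

// Hypothetical minimal cursor used only to contrast the two increment forms.
class Cursor {
public:
    Cursor(std::vector<double>& v, int idx) : values(v), index(idx) {}

    // Pre-increment: advance, then return this same object by reference.
    Cursor& operator++() { ++index; return *this; }

    // Post-increment: copy the current state, advance, return the old copy.
    Cursor operator++(int) {
        Cursor tmp(*this);
        ++(*this);
        return tmp;
    }

    bool operator!=(const Cursor& other) const { return index != other.index; }
    double& value() { return values[index]; }
    int position() const { return index; }

private:
    std::vector<double>& values;
    int index;
};
```

Because the post-increment form has to create a copy, ++it is the usual choice in loops where the old value is not needed.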