【Question title】: Apply function on array returning original number of dimensions
【Posted】: 2021-01-12 13:45:02
【Question】:

Take this array as an example:

set.seed(1)
rows <- 5
cols <- 4
dept <- 3
a <- array(sample(1:100, rows*cols*dept), dim = c(rows, cols, dept))

which returns

> a
, , 1

     [,1] [,2] [,3] [,4]
[1,]   68   43   85   73
[2,]   39   14   21   79
[3,]    1   82   54   37
[4,]   34   59   74   83
[5,]   87   51    7   97

, , 2

     [,1] [,2] [,3] [,4]
[1,]   44   96   72   99
[2,]   84   42   80   91
[3,]   33   38   40   75
[4,]   35   20   69    6
[5,]   70   28   25   24

, , 3

     [,1] [,2] [,3] [,4]
[1,]   32   22  100   50
[2,]   94   92   62   65
[3,]    2   90   23   11
[4,]   45   98   67   17
[5,]   18   64   49   36

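For each slice along the third ("dept") dimension, I want to obtain the sum over rows while keeping the array's original three dimensions. I tried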

b <- apply(a, c(2,3), sum)
> b
     [,1] [,2] [,3]
[1,]  229  266  191
[2,]  249  224  366
[3,]  241  286  301
[4,]  369  295  179

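It gives the correct result but reduces it to a 4 x 3 matrix, because the row dimension collapses to length 1 and is no longer strictly needed. For my computations, however, it is inconvenient that the interpretation of the dimensions changes every time I perform an operation, so I would like to obtain a 1 x 4 x 3 array: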

c <- array(b, dim = c(1, 4, 3))
> c
, , 1

     [,1] [,2] [,3] [,4]
[1,]  229  249  241  369

, , 2

     [,1] [,2] [,3] [,4]
[1,]  266  224  286  295

, , 3

     [,1] [,2] [,3] [,4]
[1,]  191  366  301  179

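This does what I want, but I find it a bit cumbersome, and I am not sure how to generalize it to different operations over an arbitrary number of dimensions. There must be a more compact way to perform these operations. I found the "rray" package, but it is not compatible with R 4.0.2. Note that my actual arrays are much larger than this example, and I will have to apply these kinds of operations many times within a numerical optimization problem, so computational efficiency matters.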

【Comments】:

    Tags: r arrays apply


    【Solution 1】:

    To generalize the computation and keep it on one line, you can do:

    array(apply(a, 2:3, sum), c(1, dim(a)[-1]))
    # , , 1
    # 
    # [,1] [,2] [,3] [,4]
    # [1,]  229  249  241  369
    # 
    # , , 2
    # 
    # [,1] [,2] [,3] [,4]
    # [1,]  266  224  286  295
    # 
    # , , 3
    # 
    # [,1] [,2] [,3] [,4]
    # [1,]  191  366  301  179
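
The one-liner above hard-codes a sum over the first dimension. As a rough sketch of how the same idea might extend to any set of dimensions and any scalar-valued function, the helper below collapses the dimensions listed in `over` and puts them back with length 1. The name `reduce_keepdims` and its arguments are hypothetical; they are not part of base R or of this answer.

```r
# Hypothetical helper: reduce over the dimensions in `over` with `f`,
# keeping every reduced dimension in the result with length 1.
# Assumes `f` returns a single value for each slice.
reduce_keepdims <- function(x, over, f = sum) {
  d <- dim(x)
  keep <- setdiff(seq_along(d), over)  # dimensions that survive, in ascending order
  # apply() returns the surviving dimensions in the order given by `keep`;
  # when nothing survives, f() is simply called on the whole array.
  res <- if (length(keep) == 0L) f(x) else apply(x, keep, f)
  # Re-insert the reduced dimensions as singletons; element order is unchanged.
  array(res, dim = replace(d, over, 1L))
}

a <- array(1:60, dim = c(5, 4, 3))
dim(reduce_keepdims(a, 1))             # 1 4 3
dim(reduce_keepdims(a, c(1, 3), max))  # 1 4 1
```

Because this still goes through `apply`, it is likely to be slow on large arrays; it is meant to show the reshaping step, not to be the fastest option.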
    

    Or use colSums, which is vectorized and therefore faster:

    array(colSums(a, dims=1), c(1, dim(a)[-1]))
    # , , 1
    # 
    # [,1] [,2] [,3] [,4]
    # [1,]  229  249  241  369
    # 
    # , , 2
    # 
    # [,1] [,2] [,3] [,4]
    # [1,]  266  224  286  295
    # 
    # , , 3
    # 
    # [,1] [,2] [,3] [,4]
    # [1,]  191  366  301  179
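
`colSums(x, dims = k)` only sums over the leading `k` dimensions. One possible way to sum over other dimensions while staying vectorized is to move them to the front with `aperm` first. The sketch below assumes that approach; the name `sum_keepdims` is hypothetical and not something base R or this answer provides.

```r
# Hypothetical helper: sum over the dimensions in `over`, keeping them
# in the result with length 1. Uses aperm() + colSums() instead of apply().
sum_keepdims <- function(x, over) {
  d <- dim(x)
  over <- sort(unique(over))
  keep <- setdiff(seq_along(d), over)
  if (length(keep) == 0L) {
    # colSums() cannot collapse every dimension, so fall back to sum().
    res <- sum(x)
  } else {
    # Bring the dimensions to be summed to the front, then collapse them.
    res <- colSums(aperm(x, c(over, keep)), dims = length(over))
  }
  # Re-insert the summed dimensions as singletons.
  array(res, dim = replace(d, over, 1L))
}

a <- array(1:60, dim = c(5, 4, 3))
dim(sum_keepdims(a, 2))        # 5 1 3
dim(sum_keepdims(a, c(1, 3)))  # 1 4 1
```

Note that `aperm` copies the array, so for dimensions other than the first the cost of that copy should be weighed against the speed of `colSums`.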
    

    Benchmark:

    set.seed(42)
    A <- array(rnorm(5e4*100*10), dim=c(5e4, 100, 10))
    
    library(rray)
    microbenchmark::microbenchmark(apply=array(apply(A, 2:3, sum), c(1, dim(A)[-1])),
                                   colSums=array(colSums(A, dims=1), c(1, dim(A)[-1])),
                                   rray_sum=rray_sum(A, 1))  ## rray: see other answer
    # Unit: milliseconds
    #     expr        min         lq       mean     median         uq        max neval cld
    #    apply 1273.51152 1381.72037 1416.33429 1395.84693 1433.72407 1848.88436   100   b
    #  colSums   72.07086   73.02890   73.85052   73.63013   74.38916   79.70227   100  a 
    # rray_sum   71.46261   72.50294   73.27564   73.00747   73.70348   80.36409   100  a 
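
Before comparing timings, it may be worth confirming that the candidates agree. The check below is a small sketch in base R only, on a smaller array than the benchmark (the sizes and the names `via_apply` and `via_colSums` are arbitrary choices for illustration). It uses `all.equal` instead of `identical` because the two approaches may add the values in a different order, which can change the last floating-point digits.

```r
set.seed(42)
A <- array(rnorm(50 * 10 * 4), dim = c(50, 10, 4))

via_apply   <- array(apply(A, 2:3, sum),    c(1, dim(A)[-1]))
via_colSums <- array(colSums(A, dims = 1),  c(1, dim(A)[-1]))

# Same shape, and equal up to floating-point tolerance.
stopifnot(identical(dim(via_apply), dim(via_colSums)))
stopifnot(isTRUE(all.equal(via_apply, via_colSums)))
```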
    

    【Comments】:

      【Solution 2】:

      I was able to use an R 4.0-compatible version of the rray package:

      remotes::install_github("r-lib/rray")
      

      and then obtained the desired result (faster) with:

      library(rray)

      # Increasing the array size for more realistic benchmarking
      rows <- 500
      cols <- 100
      dept <- 10
      
      draws <- rnorm(rows*cols*dept) # Standard normal draws instead of sampling from integers
      
      a <- rray(draws, dim = c(rows, cols, dept))
      b <- rray_sum(a, 1)
      

      Benchmark code:

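      library(microbenchmark)

      bm <- microbenchmark(
        base = {
          a <- array(draws, dim = c(rows, cols, dept))
          b <- apply(a, c(2,3), sum)
          # Reshape to 1 x cols x dept; the dims must match this array, not the 5 x 4 x 3 toy example
          c <- array(b, dim = c(1, cols, dept))
          c
        },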
        rray = {
          a <- rray(draws, dim = c(rows, cols, dept))
          b <- rray_sum(a, 1)
          b
        }, times = 100)
      
      > bm
      Unit: microseconds
       expr    min     lq     mean  median      uq     max neval
       base 8619.9 8763.9 9245.898 8832.05 8984.25 20968.5   100
       rray  838.6  939.6 1186.008 1103.50 1134.40 13580.8   100
      

      【Comments】:
