【Question title】:Dividing a summary table matrix into a few smaller tables in R
【Posted】:2017-12-22 12:02:23
【Question description】:

So I have a matrix with 17 columns and 1000 rows (all numeric). I summarized it with summary(matrix) and got this:

My question is: is there a way to split this summary table into several tables, like these?

              V1  V2  V3  V4  V5  V6
    Min
    1st Qu
    Median
    Mean
    3rd Qu
    Max

              V7  V8  V9  V10  V11  V12
    Min
    1st Qu
    Median
    Mean
    3rd Qu
    Max

              V13  V14  V15  V16  V17
    Min
    1st Qu
    Median
    Mean
    3rd Qu
    Max

I need to save space in my R Shiny app so that these matrices can be displayed without overlapping each other as they do now.

Note: sorry that I could only show the output as a picture.

【Question discussion】:

    Tags: r matrix summary


    【Solution 1】:

    1) read.dcf/unnest: The elements of the summary matrix are in DCF ("field:value") form, so we can use read.dcf and then unnest:

    library(tidyr)
    
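    s <- summary(mtcars)
    # each cell looks like "Min.   :10.40  "; read.dcf parses these "field:value" lines and,
    # with all = TRUE, keeps every repeated field (one per column of s) as a list column
    DF <- read.dcf(textConnection(s), all = TRUE)
    # unnest() expands the list columns (newer tidyr versions may ask for an explicit `cols`);
    # t() puts the statistics in rows and the variables in columns
    res <- setNames(data.frame(t(unnest(DF)), check.names = FALSE), trimws(colnames(s)))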
    

    giving:

    > res
              mpg   cyl  disp    hp  drat    wt  qsec     vs     am  gear  carb
    Min.    10.40 4.000  71.1  52.0 2.760 1.513 14.50 0.0000 0.0000 3.000 1.000
    1st Qu. 15.43 4.000 120.8  96.5 3.080 2.581 16.89 0.0000 0.0000 3.000 2.000
    Median  19.20 6.000 196.3 123.0 3.695 3.325 17.71 0.0000 0.0000 4.000 2.000
    Mean    20.09 6.188 230.7 146.7 3.597 3.217 17.85 0.4375 0.4062 3.688 2.812
    3rd Qu. 22.80 8.000 326.0 180.0 3.920 3.610 18.90 1.0000 1.0000 4.000 4.000
    Max.    33.90 8.000 472.0 335.0 4.930 5.424 22.90 1.0000 1.0000 5.000 8.000
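    To see why read.dcf() applies: every cell of summary() is a "label:value" string such as "Min.   :10.40  ". As a dependency-free sketch (a swapped-in base-R technique, not the answer's method; the name res_base is hypothetical), the same table can be rebuilt by splitting each cell at its first colon. The numbers are the rounded text that summary() prints, not full-precision statistics.

```r
# Base-R sketch: rebuild the summary() table by splitting each "label:value" cell
s <- summary(mtcars)      # class "table": a character matrix of cells like "Min.   :10.40  "
cells <- unclass(s)       # drop the table class, keep the character matrix

# Row labels: text before the first colon in the first column ("Min.", "1st Qu.", ...)
labels <- unname(trimws(sub(":.*$", "", cells[, 1])))
# Values: text after the first colon, converted to numbers (column-major order)
values <- as.numeric(trimws(sub("^[^:]*:", "", cells)))

res_base <- as.data.frame(matrix(values, nrow = nrow(cells),
                                 dimnames = list(labels, trimws(colnames(s)))))
res_base
```

    This assumes no column contains NA; otherwise summary() adds an "NA's" row that is empty for the other columns.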
    

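    2) Subset columns: To reduce the width, it can be split into res[1:6] and res[7:11] or, more generally, if there are n columns and we want k columns in each group (except possibly the last one):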

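    n <- ncol(res)
    k <- 6
    g <- droplevels(gl(n, k, n)) # grouping vector: 1 (k times), 2 (k times), ...
    # subset res by column positions so each piece keeps the row names (Min., 1st Qu., ...)
    lapply(split(seq_len(n), g), function(ix) res[ix])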
    

    giving:

    $`1`
              mpg   cyl  disp    hp  drat    wt
    Min.    10.40 4.000  71.1  52.0 2.760 1.513
    1st Qu. 15.43 4.000 120.8  96.5 3.080 2.581
    Median  19.20 6.000 196.3 123.0 3.695 3.325
    Mean    20.09 6.188 230.7 146.7 3.597 3.217
    3rd Qu. 22.80 8.000 326.0 180.0 3.920 3.610
    Max.    33.90 8.000 472.0 335.0 4.930 5.424
    
    $`2`
             qsec     vs     am  gear  carb
    Min.    14.50 0.0000 0.0000 3.000 1.000
    1st Qu. 16.89 0.0000 0.0000 3.000 2.000
    Median  17.71 0.0000 0.0000 4.000 2.000
    Mean    17.85 0.4375 0.4062 3.688 2.812
    3rd Qu. 18.90 1.0000 1.0000 4.000 4.000
    Max.    22.90 1.0000 1.0000 5.000 8.000
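    The same grouping can be wrapped in a small reusable function. This is a sketch under assumptions: split_cols is a hypothetical helper name, and ceiling(seq_len(n) / k) is used in place of gl(n, k, n) because it produces the same groups. split.default() on a data frame subsets columns with x[i], so each piece keeps the row names.

```r
# Hypothetical helper (not part of the answer): split the columns of a data frame
# into consecutive pieces of at most k columns.
split_cols <- function(x, k) {
  g <- ceiling(seq_len(ncol(x)) / k)  # 1 (k times), 2 (k times), ...; last piece may be shorter
  split.default(x, g)                 # for a data frame this takes x[i] per group, keeping row names
}

# Stand-in for the question's 1000 x 17 numeric matrix
mat <- matrix(1:17000, ncol = 17)
stats <- as.data.frame(sapply(as.data.frame(mat), summary))  # 6 statistics x 17 columns

chunks <- split_cols(stats, 6)
lapply(chunks, names)  # V1-V6, V7-V12, V13-V17
```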
    

    3) No transpose: Another option for reducing the width is not to transpose:

    data.frame(unnest(DF), row.names = trimws(colnames(s)), check.names = FALSE)
    

    giving:

         Min.    1st Qu. Median  Mean    3rd Qu. Max.   
    mpg    10.40   15.43   19.20   20.09   22.80   33.90
    cyl    4.000   4.000   6.000   6.188   8.000   8.000
    disp    71.1   120.8   196.3   230.7   326.0   472.0
    hp      52.0    96.5   123.0   146.7   180.0   335.0
    drat   2.760   3.080   3.695   3.597   3.920   4.930
    wt     1.513   2.581   3.325   3.217   3.610   5.424
    qsec   14.50   16.89   17.71   17.85   18.90   22.90
    vs    0.0000  0.0000  0.0000  0.4375  1.0000  1.0000
    am    0.0000  0.0000  0.0000  0.4062  1.0000  1.0000
    gear   3.000   3.000   4.000   3.688   4.000   5.000
    carb   1.000   2.000   2.000   2.812   4.000   8.000
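    For a numeric matrix like the question's, the same one-row-per-variable layout can be computed directly from the numbers instead of parsing text. A minimal base-R sketch (a different technique from the answer; the 1000 x 17 matrix is a stand-in): apply() runs summary() on each column, giving statistics in rows, and t() flips that so the table stays six columns wide however many variables there are.

```r
# Stand-in for the question's 1000 x 17 numeric matrix
mat <- matrix(1:17000, ncol = 17, dimnames = list(NULL, paste0("V", 1:17)))

# apply(..., 2, summary) gives a 6 x 17 matrix (statistics in rows);
# t() turns it into 17 rows (variables) x 6 columns (statistics)
by_var <- t(apply(mat, 2, summary))
by_var[1:3, ]
```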
    

    4) psych::describe: A simple alternative is to use psych::describe:

    library(psych)
    
    describe(mtcars)
    

    giving:

         vars  n   mean     sd median trimmed    mad   min    max  range  skew kurtosis    se
    mpg     1 32  20.09   6.03  19.20   19.70   5.41 10.40  33.90  23.50  0.61    -0.37  1.07
    cyl     2 32   6.19   1.79   6.00    6.23   2.97  4.00   8.00   4.00 -0.17    -1.76  0.32
    disp    3 32 230.72 123.94 196.30  222.52 140.48 71.10 472.00 400.90  0.38    -1.21 21.91
    hp      4 32 146.69  68.56 123.00  141.19  77.10 52.00 335.00 283.00  0.73    -0.14 12.12
    drat    5 32   3.60   0.53   3.70    3.58   0.70  2.76   4.93   2.17  0.27    -0.71  0.09
    wt      6 32   3.22   0.98   3.33    3.15   0.77  1.51   5.42   3.91  0.42    -0.02  0.17
    qsec    7 32  17.85   1.79  17.71   17.83   1.42 14.50  22.90   8.40  0.37     0.34  0.32
    vs      8 32   0.44   0.50   0.00    0.42   0.00  0.00   1.00   1.00  0.24    -2.00  0.09
    am      9 32   0.41   0.50   0.00    0.38   0.00  0.00   1.00   1.00  0.36    -1.92  0.09
    gear   10 32   3.69   0.74   4.00    3.62   1.48  3.00   5.00   2.00  0.53    -1.07  0.13
    carb   11 32   2.81   1.62   2.00    2.65   1.48  1.00   8.00   7.00  1.05     1.26  0.29
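    Some of these columns are easy to recompute by hand, which helps when reading the output. This sketch assumes, based on the printed values rather than on psych's source, that trimmed is a 10% trimmed mean and se is sd / sqrt(n); it is not psych's implementation. Rounded to two decimals it gives the mpg row above (32, 20.09, 19.70, 1.07).

```r
# Sketch: a few describe()-style statistics for one column, computed by hand.
# Assumed definitions: trimmed = 10% trimmed mean, se = sd / sqrt(n).
x <- mtcars$mpg
n <- sum(!is.na(x))                            # number of non-missing values
trimmed <- mean(x, trim = 0.1, na.rm = TRUE)   # drops floor(0.1 * n) values from each end (3 of 32 here)
se <- sd(x, na.rm = TRUE) / sqrt(n)            # standard error of the mean
round(c(n = n, mean = mean(x, na.rm = TRUE), trimmed = trimmed, se = se), 2)
```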
    

    5) Hmisc::describe: Hmisc also has a describe function:

    library(Hmisc)
    describe(mtcars)
    

    giving:

    mtcars 
    
     11  Variables      32  Observations
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
    mpg 
           n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95 
          32        0       25    0.999    20.09    6.796    12.00    14.34    15.43    19.20    22.80    30.09    31.30 
    
    lowest : 10.4 13.3 14.3 14.7 15.0, highest: 26.0 27.3 30.4 32.4 33.9
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
    cyl 
           n  missing distinct     Info     Mean      Gmd 
          32        0        3    0.866    6.188    1.948 
    
    Value          4     6     8
    Frequency     11     7    14
    Proportion 0.344 0.219 0.438
    
    ...etc...
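    The quantile columns and the distinct count can be reproduced in base R. This is a sketch, not Hmisc's code: it assumes R's default quantile method (type 7), which for mpg agrees with the .05 to .95 values above up to rounding, and with the 25 distinct values.

```r
# Sketch: the quantiles and distinct count shown by Hmisc::describe, using base R.
# quantile() uses R's default method (type 7); this is an assumption, not Hmisc's code.
x <- mtcars$mpg
probs <- c(0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95)
q <- quantile(x, probs, na.rm = TRUE)
distinct <- length(unique(x[!is.na(x)]))   # number of different values
round(q, 2)
distinct
```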
    

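    6) skimr::skim: This is a newer package (at the time of writing). It can produce spark graphics as part of the summary output; however, these depend on font support, which can be tricky, so we disable that part below. Note that skim requires a data frame as input, so if your input is a matrix use skim(as.data.frame(input)).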

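    library(skimr)
    skim_with(numeric = list(hist = NULL)) # omit spark histogram (skimr 1.x syntax)
    # in skimr 2.x, skim_with() instead returns a new function, e.g.
    # my_skim <- skim_with(numeric = sfl(hist = NULL)); my_skim(mtcars)
    skim(mtcars)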
    

    giving:

    Skim summary statistics
     n obs: 32 
     n variables: 11 
    
    Variable type: numeric 
       variable missing complete  n   mean     sd   min    p25 median    p75    max
    1        am       0       32 32   0.41   0.5   0      0      0      1      1   
    2      carb       0       32 32   2.81   1.62  1      2      2      4      8   
    3       cyl       0       32 32   6.19   1.79  4      4      6      8      8   
    4      disp       0       32 32 230.72 123.94 71.1  120.83 196.3  326    472   
    5      drat       0       32 32   3.6    0.53  2.76   3.08   3.7    3.92   4.93
    6      gear       0       32 32   3.69   0.74  3      3      4      4      5   
    7        hp       0       32 32 146.69  68.56 52     96.5  123    180    335   
    8       mpg       0       32 32  20.09   6.03 10.4   15.43  19.2   22.8   33.9 
    9      qsec       0       32 32  17.85   1.79 14.5   16.89  17.71  18.9   22.9 
    10       vs       0       32 32   0.44   0.5   0      0      0      1      1   
    11       wt       0       32 32   3.22   0.98  1.51   2.58   3.33   3.61   5.42
    

    If you want to try the spark graphics, see: Skimr - cant seem to produce the histograms
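    Because skim() needs a data frame, it may be useful to know that its missing, complete and n columns are plain per-column counts that can be taken on a matrix directly. A base-R sketch, assuming a stand-in 1000 x 17 matrix with three NAs added for illustration (completeness is a hypothetical name):

```r
# Stand-in for the question's numeric matrix, with three NAs in the last column
mat <- matrix(c(1:16997, NA, NA, NA), ncol = 17)
colnames(mat) <- paste0("V", 1:17)

completeness <- data.frame(
  variable  = colnames(mat),
  missing   = colSums(is.na(mat)),    # NAs per column
  complete  = colSums(!is.na(mat)),   # non-NA values per column
  n         = nrow(mat),
  row.names = NULL
)
tail(completeness, 2)
```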

    7) pastecs::stat.desc: The pastecs package also has a function that can be used:

    library(pastecs)
    stat.desc(mtcars)
    

    giving:

                         mpg         cyl         disp           hp         drat          wt        qsec          vs          am        gear       carb
    nbr.val       32.0000000  32.0000000 3.200000e+01   32.0000000  32.00000000  32.0000000  32.0000000 32.00000000 32.00000000  32.0000000 32.0000000
    nbr.null       0.0000000   0.0000000 0.000000e+00    0.0000000   0.00000000   0.0000000   0.0000000 18.00000000 19.00000000   0.0000000  0.0000000
    nbr.na         0.0000000   0.0000000 0.000000e+00    0.0000000   0.00000000   0.0000000   0.0000000  0.00000000  0.00000000   0.0000000  0.0000000
    min           10.4000000   4.0000000 7.110000e+01   52.0000000   2.76000000   1.5130000  14.5000000  0.00000000  0.00000000   3.0000000  1.0000000
    max           33.9000000   8.0000000 4.720000e+02  335.0000000   4.93000000   5.4240000  22.9000000  1.00000000  1.00000000   5.0000000  8.0000000
    range         23.5000000   4.0000000 4.009000e+02  283.0000000   2.17000000   3.9110000   8.4000000  1.00000000  1.00000000   2.0000000  7.0000000
    sum          642.9000000 198.0000000 7.383100e+03 4694.0000000 115.09000000 102.9520000 571.1600000 14.00000000 13.00000000 118.0000000 90.0000000
    median        19.2000000   6.0000000 1.963000e+02  123.0000000   3.69500000   3.3250000  17.7100000  0.00000000  0.00000000   4.0000000  2.0000000
    mean          20.0906250   6.1875000 2.307219e+02  146.6875000   3.59656250   3.2172500  17.8487500  0.43750000  0.40625000   3.6875000  2.8125000
    SE.mean        1.0654240   0.3157093 2.190947e+01   12.1203173   0.09451874   0.1729685   0.3158899  0.08909831  0.08820997   0.1304266  0.2855297
    CI.mean.0.95   2.1729465   0.6438934 4.468466e+01   24.7195501   0.19277224   0.3527715   0.6442617  0.18171719  0.17990541   0.2660067  0.5823417
    var           36.3241028   3.1895161 1.536080e+04 4700.8669355   0.28588135   0.9573790   3.1931661  0.25403226  0.24899194   0.5443548  2.6088710
    std.dev        6.0269481   1.7859216 1.239387e+02   68.5628685   0.53467874   0.9784574   1.7869432  0.50401613  0.49899092   0.7378041  1.6152000
    coef.var       0.2999881   0.2886338 5.371779e-01    0.4674077   0.14866382   0.3041285   0.1001159  1.15203687  1.22828533   0.2000825  0.5742933
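    The less familiar rows can be recomputed by hand. The formulas below are assumptions, not taken from pastecs' source, but they reproduce the mpg column above: SE.mean is sd / sqrt(n), CI.mean.0.95 is the half-width of a 95% t interval, and coef.var is sd / mean.

```r
# Sketch: three stat.desc() rows recomputed by hand for one column.
# Assumed formulas: SE.mean = sd / sqrt(n); CI.mean.0.95 = t quantile * SE.mean; coef.var = sd / mean.
x <- mtcars$mpg
n <- sum(!is.na(x))
se_mean  <- sd(x, na.rm = TRUE) / sqrt(n)
ci_half  <- qt(0.975, df = n - 1) * se_mean    # half-width of a 95% t interval for the mean
coef_var <- sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)
c(SE.mean = se_mean, CI.mean.0.95 = ci_half, coef.var = coef_var)
```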
    

    【Discussion】:

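    • Nah, 1 & 2 are what I was looking for! Incredible solution @G.Grothendieck! Just want to confirm: when running the statement lapply(split(as.list(res), g), data.frame), does the "data.frame" argument mean we have to define the class of res?
    • Great and comprehensive answer (as always), but it uses the mtcars data frame as the example while the OP asked for a matrix solution. I have run into some subtle differences between data frames and matrices.
    • I added a caveat to (6) to convert to a data frame first if the input is not one already (and filed this as an issue on the skimr GitHub issue tracker). The other 5 points all work with as.matrix(mtcars) as well as with mtcars.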
    【Solution 2】:

    Another possibility is to create the summary() piecewise:

    library(data.table)
    for (x in split(i <- seq_along(mtcars), i %/% 4)) 
      as.data.table(mtcars)[, print(summary(.SD)), .SDcols = x]
    
          mpg             cyl             disp      
     Min.   :10.40   Min.   :4.000   Min.   : 71.1  
     1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8  
     Median :19.20   Median :6.000   Median :196.3  
     Mean   :20.09   Mean   :6.188   Mean   :230.7  
     3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0  
     Max.   :33.90   Max.   :8.000   Max.   :472.0  
           hp             drat             wt             qsec      
     Min.   : 52.0   Min.   :2.760   Min.   :1.513   Min.   :14.50  
     1st Qu.: 96.5   1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89  
     Median :123.0   Median :3.695   Median :3.325   Median :17.71  
     Mean   :146.7   Mean   :3.597   Mean   :3.217   Mean   :17.85  
     3rd Qu.:180.0   3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90  
     Max.   :335.0   Max.   :4.930   Max.   :5.424   Max.   :22.90  
           vs               am              gear            carb      
     Min.   :0.0000   Min.   :0.0000   Min.   :3.000   Min.   :1.000  
     1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
     Median :0.0000   Median :0.0000   Median :4.000   Median :2.000  
     Mean   :0.4375   Mean   :0.4062   Mean   :3.688   Mean   :2.812  
     3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
     Max.   :1.0000   Max.   :1.0000   Max.   :5.000   Max.   :8.000
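    Since the pieces are meant for a Shiny app, it may help to keep each one as text lines instead of printing it to the console. A base-R sketch without data.table (the grouping by ceiling(i / 4) and the names groups and pieces are my own choices); how the text is then rendered in Shiny is not shown.

```r
# Sketch: keep each piece of the summary as a character vector of printed lines
groups <- split(seq_along(mtcars), ceiling(seq_along(mtcars) / 4))  # columns 1-4, 5-8, 9-11
pieces <- lapply(groups, function(ix) capture.output(print(summary(mtcars[ix]))))
cat(pieces[[1]], sep = "\n")   # first piece: mpg, cyl, disp, hp
```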
    

    Or, simulating the OP's matrix:

    # create dummy data
    mat <- matrix(1:17000, ncol = 17)
    # set column names
    colnames(mat) <- 1:17
    # print summary piecewise (column indices grouped by i %/% 6)
    for (x in split(i <- seq_len(ncol(mat)), i %/% 6)) 
      print(summary(mat[, x]))
    
           1                2              3              4              5       
     Min.   :   1.0   Min.   :1001   Min.   :2001   Min.   :3001   Min.   :4001  
     1st Qu.: 250.8   1st Qu.:1251   1st Qu.:2251   1st Qu.:3251   1st Qu.:4251  
     Median : 500.5   Median :1500   Median :2500   Median :3500   Median :4500  
     Mean   : 500.5   Mean   :1500   Mean   :2500   Mean   :3500   Mean   :4500  
     3rd Qu.: 750.2   3rd Qu.:1750   3rd Qu.:2750   3rd Qu.:3750   3rd Qu.:4750  
     Max.   :1000.0   Max.   :2000   Max.   :3000   Max.   :4000   Max.   :5000  
           6              7              8              9              10              11       
     Min.   :5001   Min.   :6001   Min.   :7001   Min.   :8001   Min.   : 9001   Min.   :10001  
     1st Qu.:5251   1st Qu.:6251   1st Qu.:7251   1st Qu.:8251   1st Qu.: 9251   1st Qu.:10251  
     Median :5500   Median :6500   Median :7500   Median :8500   Median : 9500   Median :10500  
     Mean   :5500   Mean   :6500   Mean   :7500   Mean   :8500   Mean   : 9500   Mean   :10500  
     3rd Qu.:5750   3rd Qu.:6750   3rd Qu.:7750   3rd Qu.:8750   3rd Qu.: 9750   3rd Qu.:10750  
     Max.   :6000   Max.   :7000   Max.   :8000   Max.   :9000   Max.   :10000   Max.   :11000  
           12              13              14              15              16              17       
     Min.   :11001   Min.   :12001   Min.   :13001   Min.   :14001   Min.   :15001   Min.   :16001  
     1st Qu.:11251   1st Qu.:12251   1st Qu.:13251   1st Qu.:14251   1st Qu.:15251   1st Qu.:16251  
     Median :11500   Median :12500   Median :13500   Median :14500   Median :15500   Median :16500  
     Mean   :11500   Mean   :12500   Mean   :13500   Mean   :14500   Mean   :15500   Mean   :16500  
     3rd Qu.:11750   3rd Qu.:12750   3rd Qu.:13750   3rd Qu.:14750   3rd Qu.:15750   3rd Qu.:16750  
     Max.   :12000   Max.   :13000   Max.   :14000   Max.   :15000   Max.   :16000   Max.   :17000
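    The grouping vector decides the piece sizes. With i %/% 6 the first piece has only 5 columns (6 %/% 6 is already 1), which is why the output above shows columns 1-5, 6-11 and 12-17. Shifting by one, (i - 1) %/% 6, gives the 6 + 6 + 5 split drawn in the question. A quick check:

```r
i <- seq_len(17)
table(i %/% 6)          # piece sizes 5, 6, 6
table((i - 1) %/% 6)    # piece sizes 6, 6, 5
```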
    

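    Note that in the matrix case it is recommended (in practice required) to set the column names explicitly. If the matrix has no column names, summary() falls back to default names that always start at V1, so every piece would be labeled V1, V2, ... and the pieces could not be told apart.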
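    To see the effect described in this note, compare the column headers of one piece (columns 7 to 9) with and without column names; unnamed and named are illustrative names.

```r
m <- matrix(1:17000, ncol = 17)                       # no column names
unnamed <- trimws(colnames(summary(m[, 7:9])))        # labels restart at V1 in every piece
colnames(m) <- paste0("V", 1:17)
named <- trimws(colnames(summary(m[, 7:9])))          # labels follow the original columns
unnamed
named
```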
