How to get the mean value of every top 5% step in a tibble?

【Question title】: How to get the mean value of every top 5% step in a tibble?
【Posted】: 2019-11-18 02:58:03
【Question】:

For example:

library(tidyverse)  # dplyr, plus stringr (str_c) and tibble (add_column) used below
data <- mtcars %>% select(wt, mpg, disp)

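I want the mean of every column for each top-5% step of disp. This should produce 20 rows: the column means over the top 5% of disp, over the top 10% of disp, over the top 15% of disp, and so on.

Like this: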

tops <- seq(.05, 1, by = 0.05)

# list to hold the summary tibble for each top fraction
all_tops <- vector("list", length(tops))
names(all_tops) <- str_c("top", tops)

for (top in tops) {
    all_tops[[str_c("top", top)]] <- summarise_all(top_frac(data, top, disp), mean) %>% add_column(Top = top, .before = 1)
}

bind_rows(all_tops)

I am looking for a tidier solution, preferably one that does not use for.

【Comments】:

Do you mean mtcars %>% group_by(group = findInterval(disp, quantile(disp,seq(0.05, 1, 0.05)))) %>% summarise_at(vars(wt, mpg, disp), mean) ?

Tags: r dplyr tidyverse tibble

【Solution 1】:

You can use cut to define which 5% band of disp each row belongs to:

mtcars %>%
  select(wt, mpg, disp) %>%
  mutate(dispcut = cut(disp, c(-Inf, quantile(disp, seq(0, 1, len=21))[-1]), labels = FALSE)) %>%
  group_by(dispcut) %>%
  summarize_all(~ mean(.))
# # A tibble: 17 x 4
#    dispcut    wt   mpg  disp
#      <int> <dbl> <dbl> <dbl>
#  1       1  1.72  32.2  73.4
#  2       2  2.07  29.8  78.8
#  3       3  1.51  30.4  95.1
#  4       4  2.39  22.2 114. 
#  5       5  2.14  26   120. 
#  6       6  2.96  22.1 131. 
#  7       7  2.77  19.7 145  
#  8       8  2.90  22.1 156. 
#  9      10  3.44  18.5 168. 
# 10      11  3.34  19.8 242. 
# 11      12  3.86  16.3 276. 
# 12      14  3.57  15   301  
# 13      15  3.48  15.4 311  
# 14      16  3.84  13.3 350  
# 15      17  3.39  16.3 357  
# 16      19  4.60  17.0 420  
# 17      20  5.34  10.4 466
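
To make the banding step easier to follow, here is a minimal sketch of the same cut/quantile idea on a made-up vector x with four bands instead of twenty (x and the band count are illustrative assumptions, not part of the answer). It uses only base R. Note that this approach averages within each band rather than cumulatively over the top x%, and a band that receives no rows simply disappears from the grouped result, which appears to be why bands 9, 13 and 18 are absent from the 17-row output above.

```r
# Hypothetical toy data: ten evenly spaced values
x <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)

# Upper edges of four equal-probability bands; dropping the 0% quantile and
# prepending -Inf makes the lowest value fall inside the first band
breaks <- c(-Inf, quantile(x, seq(0, 1, length.out = 5))[-1])

# labels = FALSE returns the integer band index for each value
band <- cut(x, breaks, labels = FALSE)
band

# Mean of x within each band (not cumulative)
band_means <- tapply(x, band, mean)
band_means
```

With real data that contains ties, several quantiles can be close together, so some band indices may never occur.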

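【Discussion】:

【Solution 2】:

We use imap_dfr (from the purrr package, part of the tidyverse) to loop over each disp percentile cutoff, which we compute with quantile. For each cutoff we calculate the required means, and the results are returned as a single data frame.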

library(tidyverse)

quantile(mtcars$disp, seq(0,0.95,0.05)) %>% 
  imap_dfr(
    ~bind_cols(
      disp.min.percentile=.y,
      disp.min=.x, 
      mtcars %>% 
        select(disp, wt, mpg) %>% 
        filter(disp >= .x) %>% 
        mutate(n = n()) %>% 
        group_by(n) %>% 
        summarise_all(list(mean=mean)) %>% 
        ungroup()
    )) %>% 
  arrange(desc(disp.min))

   disp.min.percentile disp.min     n disp_mean wt_mean mpg_mean
 1 95%                    449       2      466     5.34     10.4
 2 90%                    396.      4      443     4.97     13.7
 3 85%                    360       6      415.    4.48     14.6
 4 80%                    351.      7      406.    4.29     14.8
 5 75%                    326       8      399.    4.24     14.6
 6 70%                    303.     10      382.    4.08     14.8
 7 65%                    280.     11      374.    4.04     14.8
 8 60%                    276.     14      353.    4.00     15.1
 9 55%                    259.     14      353.    4.00     15.1
10 50%                    196.     16      339.    3.92     15.7
11 45%                    167.     18      320.    3.86     16.0
12 40%                    160      20      304.    3.75     16.5
13 35%                    146.     21      297.    3.73     16.9
14 30%                    142.     22      290.    3.68     17  
15 25%                    121.     24      276.    3.62     17.4
16 20%                    120.     25      270.    3.56     17.8
17 15%                    103.     27      259.    3.48     18.1
18 10%                     80.6    28      253.    3.41     18.5
19 5%                      77.4    30      241.    3.32     19.3
20 0%                      71.1    32      231.    3.22     20.1
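
For comparison, the loop from the question can also be written without for by mapping over the fractions directly. This is a minimal sketch, assuming dplyr 1.0 or later (for across and the .before argument of mutate) and purrr; it keeps the question's own top_frac call, so its row counts can differ slightly from the quantile filter used above. The name result is illustrative.

```r
library(dplyr)
library(purrr)

data <- mtcars %>% select(wt, mpg, disp)
tops <- seq(0.05, 1, by = 0.05)

# One summary row per fraction: keep the top `top` share of rows by disp,
# then average every column and record which fraction was used
result <- map(tops, function(top) {
  data %>%
    top_frac(top, disp) %>%
    summarise(across(everything(), mean)) %>%
    mutate(Top = top, .before = 1)
}) %>%
  bind_rows()

result
```

Since top_frac is superseded in current dplyr, slice_max(disp, prop = top) may be a reasonable substitute, though it is not guaranteed to select exactly the same number of rows.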

【Discussion】: