How to use group_by with summarise and summarise_all?

【Question title】: How to use group_by with summarise and summarise_all?
【Posted】: 2019-11-03 18:09:33
【Question description】:

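The above is part of the input.

Assume it also has a bunch of other columns.

I want to:

group_by x
summarise y by its sum
for all the other columns, summarise_all by taking only the first value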

【Question discussion】:

Tags: r group-by dplyr tidyverse

【Solution 1】:

Here is one way to do it: split the task into two separate summaries and then join them:

library(dplyr)
left_join(
  # Summarise column y on its own: sum it within each group of x
  df %>%
    group_by(x) %>%
    summarize(sum_y = sum(y)),
  # Exclude y and take the first value of every remaining column
  df %>%
    group_by(x) %>%
    select(-y) %>%
    summarise_all(first),
  by = "x"  # join the two summaries on the grouping column
)

# A tibble: 5 x 3
      x sum_y     z
  <int> <int> <int>
1     1    20     1
2     2    16     3
3     3    17     2
4     4    18     2
5     5     7     3

Sample data:

df <- read.table(
  header = TRUE,
  stringsAsFactors = FALSE,
  text="x  y z
        1  1 1
        3  2 2
        2  3 3
        3  4 4
        2  5 1
        4  6 2
        5  7 3
        2  8 4
        1  9 1
        1 10 2
        3 11 3
        4 12 4")
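The same result can probably be had in a single summarise call. This is only a minimal sketch, and it assumes dplyr 1.0.0 or later, where across() is available (the original answer predates it). across() goes first so that it does not pick up the newly created sum_y column, and y is still the original column when sum(y) is evaluated.

```r
library(dplyr)

# Same sample data as in this answer
df <- read.table(
  header = TRUE,
  text = "x  y z
          1  1 1
          3  2 2
          2  3 3
          3  4 4
          2  5 1
          4  6 2
          5  7 3
          2  8 4
          1  9 1
          1 10 2
          3 11 3
          4 12 4")

result <- df %>%
  group_by(x) %>%
  summarise(
    across(-y, first),  # first value of every non-grouping column except y
    sum_y = sum(y)      # y has not been touched by across(), so this is the group total
  )

result
```

The column order differs from the join version (z comes before sum_y), but the values should be the same.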

【Discussion】:

【Solution 2】:

library(dplyr)

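df1 %>%
  group_by(x) %>%
  # average every column except y (swap `mean` for `first` to take the first value instead)
  summarise_at(vars(-y), list(avg = mean)) %>%
  bind_cols(.,{df1 %>%
                group_by(x) %>%
                # sum y per group; drop x so bind_cols does not duplicate it
                summarise_at(vars(y), sum) %>%
                select(-x)
               })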

#> # A tibble: 5 x 4
#>       x r_avg r.1_avg     y
#>   <int> <dbl>   <dbl> <int>
#> 1     1  6.67    6.67    20
#> 2     2  5.33    5.33    16
#> 3     3  5.67    5.67    17
#> 4     4  9       9       18
#> 5     5  7       7        7

Created on 2019-06-20 by the reprex package (v0.3.0)

Data:

df1 <- read.table(text="
r   x  y
1  1  1
2  3  2
3  2  3
4  3  4
5  2  5
6  4  6
7  5  7
8  2  8
9  1  9
10 1 10
11 3 11
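12 4 12", header = TRUE)

# reorder to x, y, r and duplicate r (the copy is named r.1) so there are several "other" columns
df1 <- df1[, c(2, 3, 1, 1)]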

【Discussion】:

【Solution 3】:

library(tidyverse)
df <- tribble(~x, ~y,  # making a sample data frame
 1,  1,
 3,  2,
 2,  3,
 3,  4,
 2,  5,
 4,  6,
 5,  7,
 2,  8,
 1,  9,
 1, 10,
 3, 11,
 4, 12)

df <- df %>% 
  add_column(z = sample(1:nrow(df))) #add another column for the example

df


# If there is only one additional column and you need the first value
df %>% 
  group_by(x) %>% 
  summarise(sum_y = sum(y), z_1st = z[1])


# otherwise use summarise_at to address all the other columns
# (y is excluded here, so this result has no y column; its sum has to be computed separately)
f <- function(x){x[1]} # function to extract the first value
df %>% 
  group_by(x) %>% 
  summarise_at(.vars = vars(-y), .funs = f)  # apply f to every column except y
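One possible way to keep the sum of y in the same pipeline as summarise_all is to overwrite y with its group total first, and only then take the first value of every column. This is a sketch rather than part of the original answer; the fixed z column below is an assumption made so that the result is reproducible (the answer above uses a random one).

```r
library(dplyr)

# Sample data with a fixed z column (assumed values, for reproducibility)
df <- data.frame(
  x = c(1, 3, 2, 3, 2, 4, 5, 2, 1, 1, 3, 4),
  y = 1:12,
  z = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4)
)

result <- df %>%
  group_by(x) %>%
  mutate(y = sum(y)) %>%  # every row in a group now carries the group total of y
  summarise_all(first)    # so the first value of y is that total

result
```

Here y keeps its name but holds the per-group sum; rename it afterwards if a name like sum_y is preferred.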

【Discussion】: