【Question title】: How to use group_by with summarise and summarise_all?
【Posted】: 2019-11-03 18:09:33
【Question】:
   x  y
1  1  1
2  3  2
3  2  3
4  3  4
5  2  5
6  4  6
7  5  7
8  2  8
9  1  9
10 1 10
11 3 11
12 4 12

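The above is part of the input.

Assume it also has a number of other columns.

I want to:

  1. group_by x
  2. summarise y by its sum
  3. for all the other columns, summarise_all by taking only the first value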

【Question discussion】:

    Tags: r group-by dplyr tidyverse


    【Answer 1】:

    Here is one approach that splits this into two problems and then combines the results:

    library(dplyr)
    left_join(
      # Here we want to treat column y specially
      df %>%
        group_by(x) %>%
        summarize(sum_y = sum(y)),
      # Here we exclude y and take the first value of each remaining column
      df %>%
        group_by(x) %>%
        select(-y) %>%
        summarise_all(first),
      by = "x"  # join the two summaries on the grouping column
    )
    
    # A tibble: 5 x 3
          x sum_y     z
      <int> <int> <int>
    1     1    20     1
    2     2    16     3
    3     3    17     2
    4     4    18     2
    5     5     7     3
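With dplyr >= 1.0.0 (where `across()` was introduced), the two halves joined above can also be computed in a single `summarise()` call, so no join is needed. A minimal sketch, assuming that dplyr version and rebuilding the sample data of this answer as a plain data frame; `across()` is placed before `sum_y` because `across()` can see columns created earlier in the same `summarise()` and would otherwise summarise `sum_y` too:

```r
library(dplyr)

# Same values as this answer's sample data: x, y and one extra column z
df <- data.frame(
  x = c(1, 3, 2, 3, 2, 4, 5, 2, 1, 1, 3, 4),
  y = 1:12,
  z = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4)
)

res <- df %>%
  group_by(x) %>%
  summarise(
    # first value of every non-grouping column except y
    across(-y, first),
    # y still refers to the group's original rows here, so this is the group total
    sum_y = sum(y),
    .groups = "drop"
  )
res
```

The columns come out as x, z, sum_y; add `relocate(sum_y, .after = x)` if the order of the joined version matters.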
    

    Sample data:

    df <- read.table(
      header = TRUE,
      stringsAsFactors = FALSE,
      text="x  y z
            1  1 1
            3  2 2
            2  3 3
            3  4 4
            2  5 1
            4  6 2
            5  7 3
            2  8 4
            1  9 1
            1 10 2
            3 11 3
            4 12 4")
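The same split-then-join idea also works in base R without dplyr. This swaps in base R functions and is not what the answer above uses; it is a sketch assuming the same sample data: one `aggregate()` sums y per x, a second takes the first value of every remaining column, and `merge()` joins the two on x. Note that the formula interface of `aggregate()` drops rows containing `NA` by default.

```r
# Same values as the sample data above
df <- data.frame(
  x = c(1, 3, 2, 3, 2, 4, 5, 2, 1, 1, 3, 4),
  y = 1:12,
  z = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4)
)

# Part 1: total of y for each x
sums <- aggregate(y ~ x, data = df, FUN = sum)

# Part 2: first value of every other column for each x (y removed beforehand)
firsts <- aggregate(. ~ x, data = df[names(df) != "y"], FUN = function(v) v[1])

# Join the two parts on x and rename the total
res <- merge(sums, firsts, by = "x")
names(res)[names(res) == "y"] <- "sum_y"
res
```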
    

    【Discussion】:

      【Answer 2】:
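      library(dplyr)
      
      df1 %>%
        group_by(x) %>%
        # mean of every column except y (the grouping column x is skipped automatically)
        summarise_at(vars(-y), list(avg = mean)) %>%
        # attach the per-group total of y; both summaries are sorted by x, so rows line up
        bind_cols(., {df1 %>%
                       group_by(x) %>%
                       summarise(y = sum(y)) %>%
                       select(-x)
                     })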
      
      #> # A tibble: 5 x 4
      #>       x r_avg r.1_avg     y
      #>   <int> <dbl>   <dbl> <int>
      #> 1     1  6.67    6.67    20
      #> 2     2  5.33    5.33    16
      #> 3     3  5.67    5.67    17
      #> 4     4  9       9       18
      #> 5     5  7       7        7
      

      Created on 2019-06-20 by the reprex package (v0.3.0)

      Data:

      df1 <- read.table(text="
      r   x  y
      1  1  1
      2  3  2
      3  2  3
      4  3  4
      5  2  5
      6  4  6
      7  5  7
      8  2  8
      9  1  9
      10 1 10
      11 3 11
      12 4 12", header=T)
      
      # Reorder to x, y, r and add a second copy of r (named r.1) to stand in for extra columns
      df1 <- df1[, c(2, 3, 1, 1)]
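With dplyr >= 1.0.0 the two summaries in this answer can be computed in one `summarise()` call, which also avoids relying on `bind_cols()` lining up rows from two separate pipelines. A sketch, assuming that dplyr version and the `df1` layout above (x, y, r, r.1); a named list of functions in `across()` produces the same `r_avg` / `r.1_avg` names by default:

```r
library(dplyr)

# Same layout as df1 above: x, y and two copies of the row index r
df1 <- data.frame(
  x   = c(1, 3, 2, 3, 2, 4, 5, 2, 1, 1, 3, 4),
  y   = 1:12,
  r   = 1:12,
  r.1 = 1:12
)

res <- df1 %>%
  group_by(x) %>%
  summarise(
    # mean of every other column, named <column>_avg (r_avg, r.1_avg)
    across(-y, list(avg = mean)),
    # y summed separately; across() runs first, so it never sees this new column
    y = sum(y),
    .groups = "drop"
  )
res
```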
      

      【Discussion】:

        【Answer 3】:
        library(tidyverse)
        df <- tribble(~x, ~y,  # making a sample data frame
         1,  1,
         3,  2,
         2,  3,
         3,  4,
         2,  5,
         4,  6,
         5,  7,
         2,  8,
         1,  9,
         1, 10,
         3, 11,
         4, 12)
        
        set.seed(1)  # make the random extra column reproducible
        df <- df %>%
          add_column(z = sample(1:nrow(df)))  # add another column for the example
        
        df
        
        
        # If there is only one additional column and you need the first value
        df %>% 
          group_by(x) %>% 
          summarise(sum_y = sum(y), z_1st = z[1])
        
        
        # otherwise use summarise_at to address all the other columns
        f <- function(x){x[1]} # function to extract the first value
        df %>% 
          group_by(x) %>% 
          summarise_at(.vars = vars(-y), .funs = f)  # exclude column y from the calculations
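The `summarise_at()` call above returns the first value of every column but drops y entirely. A sketch of one way to also get the y total in the same pipeline, staying with the scoped verbs used in this answer: overwrite y with its group sum via `mutate()`, so that taking the first value of every column returns the total for y. It uses a fixed z instead of `sample()` so the result is reproducible:

```r
library(dplyr)

df <- data.frame(
  x = c(1, 3, 2, 3, 2, 4, 5, 2, 1, 1, 3, 4),
  y = 1:12,
  z = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4)  # fixed stand-in for the random z
)

f <- function(x) x[1]  # first value, as in the answer above

res <- df %>%
  group_by(x) %>%
  mutate(y = sum(y)) %>%  # every row now holds its group's y total
  summarise_all(f)        # first value of y (the total) and of every other column
res
```

The cost is that `mutate()` writes the total into every row before the rows are collapsed, which only matters for very large groups.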
        

        【Discussion】: