【Question Title】: Group rows into a new row and sum in R
【Posted】: 2017-11-16 21:59:09
【Question】:

My data looks like this:

 Week        Total Amount        Person
   1            $5                 A
   1            $5                 B
   1            $4                 C
   1            $2                 D
   1            $1                 E
   2            $5                 A
   2            $1                 B
   2            $1                 H
   2            $3                 G
   2            $5                 C
   2            $5                 F

How can I show the top 3 for each week and sum all the other amounts into "Others"? I would like it to display:

 Week        Total Amount        Person
   1            $5                 A
   1            $5                 B
   1            $4                 C
   1            $3                 Others
   2            $5                 A
   2            $5                 C
   2            $5                 F
   2            $5                 Others

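Note that the amounts outside the top 3 are added together into a new total, and the solution has to cope with a varying number of rows per week (for example, week 1 has 5 people in total but week 2 has 6, week 3 might have 8 or 10, and week 4 might have only 1); I want the same logic to apply to every week.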

【Comments】:

    Tags: r dataframe grouping


    【Solution 1】:

    This is easy to do with the tidyverse. Say the data is in a data frame named df.

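    library(tidyverse)
    
    df.new <- df %>%
      group_by(Week) %>%
      # sort descending within each week so rows 1-3 are the top 3
      arrange(desc(`Total Amount`), .by_group = TRUE) %>%
      # relabel everything after the third row of each week
      mutate(Person = ifelse(row_number() > 3, "Others", Person)) %>%
      group_by(Week, Person) %>%
      summarize(`Total Amount` = sum(`Total Amount`))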
    

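    If the column contains "$" (that is, it is a character column), you need to convert it to numeric first, before the sorting and summarize steps will work. You can use a function such as parse_number() to do this.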
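
As a minimal sketch of that conversion, assuming the amounts are stored as text with a leading "$": the sample data frame below is hypothetical and only illustrates the call to `readr::parse_number()`.

```r
library(readr)

# Hypothetical sample: amounts stored as text with a leading "$"
df <- data.frame(
  Week = c(1L, 1L, 2L),
  `Total Amount` = c("$5", "$4", "$10"),
  Person = c("A", "B", "A"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# parse_number() drops the non-numeric characters and returns a numeric vector
df$`Total Amount` <- parse_number(df$`Total Amount`)

str(df$`Total Amount`)
```

Converting before sorting matters once amounts reach two digits, because as text "$10" sorts before "$5".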

    【Comments】:

    • Nice idea using row_number()... what I came up with is a bit more roundabout: testi<-data.frame(Week=c(1,1,1,1,1,2,2,2,2,2,2), Total_Amount=c(5,5,4,2,1,5,1,1,3,5,5), Person=c("A","B","C","D","E","A","B","H","G","C","F"), stringsAsFactors = FALSE) testi_1<-testi%>%group_by(Week)%>%top_n(n = 3,wt=Total_Amount) testi_2<-setdiff(testi,testi_1)%>%group_by(Week)%>% summarise(Total_Amount=sum(Total_Amount),Person="Other") final<-bind_rows(testi_1,testi_2)
    【Solution 2】:

    Base R

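    # rank(-x) gives each row's position within its week, largest amount first;
    # ties.method = "first" keeps exactly three rows when amounts tie
    df$Person[ave(df$`Total Amount`, df$Week, FUN = function(x)
        rank(-x, ties.method = "first")) > 3] = "Others"
    # sum the amounts per week and person (all "Others" rows collapse into one)
    df2 = aggregate(df["Total Amount"], df[c("Week", "Person")], sum)
    df2[order(df2$Week, df2$Person),]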
    #  Week Person Total Amount
    #1    1      A            5
    #3    1      B            5
    #4    1      C            4
    #7    1 Others            3
    #2    2      A            5
    #5    2      C            5
    #6    2      F            5
    #8    2 Others            5
    

    Data

    df = structure(list(Week = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 
    2L), `Total Amount` = c(5L, 5L, 4L, 2L, 1L, 5L, 1L, 1L, 3L, 5L, 
    5L), Person = c("A", "B", "C", "D", "E", "A", "B", "H", "G", 
    "C", "F")), .Names = c("Week", "Total Amount", "Person"), class = "data.frame",
    row.names = c(NA, -11L))
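
A point worth checking in the per-week ranking step of the base R approach: `order()` returns the positions of the sorted values, while `rank()` returns each element's rank, and the two only agree for some inputs. The sketch below uses a hypothetical vector of amounts for a single week (not taken from the question) to show the difference.

```r
# Hypothetical amounts for one week; the smallest value comes first
x <- c(1, 5, 5, 5)

# order() answers "which position holds the k-th largest value?"
ord <- order(x, decreasing = TRUE)

# rank() answers "what rank does each element have?"; negating x ranks
# the largest value first, and ties.method = "first" breaks ties by position
rnk <- rank(-x, ties.method = "first")

# Elements outside the top 3, judged by each vector
x[ord > 3]  # picks one of the 5s
x[rnk > 3]  # picks the 1
```

With the sample data in this answer both functions happen to flag the same rows, so the difference only shows up when the rows are not already close to descending order.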
    

    【Comments】:

      【Solution 3】:

      Here is one way you can do it:

      library(tidyverse)
      
      df <- df %>% 
        group_by(Week) %>% 
        # sort on the numeric value; as text, "$10" would sort before "$5"
        arrange(desc(as.numeric(gsub("\\$", "", Total_Amount))), .by_group = TRUE) %>% 
        mutate(id = row_number()) %>% 
        mutate(Person = case_when(id > 3 ~ "Others",
                                  TRUE ~ as.character(Person)))
      

      Then remove the $ sign so that we can sum Total_Amount:

      df$Total_Amount <- as.numeric(gsub("\\$", "", df$Total_Amount))
      

      Finally, sum Total_Amount by group and add the $ sign back to restore the original format:

      df %>% 
        group_by(Week, Person) %>% 
        summarise(Total_Amount = sum(Total_Amount)) %>% 
        mutate(Total_Amount = paste0("$", Total_Amount)) %>% 
        select(Week, Total_Amount, Person)
      

      Returns:

      # A tibble: 8 x 3
      # Groups:   Week [2]
         Week Total_Amount Person
        <int>        <chr>  <chr>
      1     1           $5      A
      2     1           $5      B
      3     1           $4      C
      4     1           $3 Others
      5     2           $5      A
      6     2           $5      C
      7     2           $5      F
      8     2           $5 Others
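
The question notes that the number of rows per week varies, down to a single row. The following is a minimal sketch of the same steps as one pipeline, assuming "$"-prefixed text amounts; the sample data (weeks 3 and 4) and the helper column `Amount` are hypothetical and not part of the original question.

```r
library(dplyr)

# Hypothetical sample: week 3 has a single row, week 4 has a two-digit amount
df <- data.frame(
  Week = c(3L, 4L, 4L, 4L, 4L, 4L),
  Total_Amount = c("$7", "$10", "$9", "$2", "$1", "$3"),
  Person = c("A", "A", "B", "C", "D", "E"),
  stringsAsFactors = FALSE
)

result <- df %>%
  # convert to numeric first so that sorting and summing use real numbers
  mutate(Amount = as.numeric(gsub("\\$", "", Total_Amount))) %>%
  group_by(Week) %>%
  arrange(desc(Amount), .by_group = TRUE) %>%
  # rows after the third within each week become "Others"
  mutate(Person = if_else(row_number() > 3, "Others", Person)) %>%
  group_by(Week, Person) %>%
  summarise(Amount = sum(Amount), .groups = "drop") %>%
  # restore the "$" prefix for display
  mutate(Total_Amount = paste0("$", Amount)) %>%
  select(Week, Total_Amount, Person)

print(result)
```

A week with three or fewer rows simply produces no "Others" row, since `row_number() > 3` is never true for it.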
      

      【Comments】:
