【Question Title】: R: sum values for a non-numeric column with duplicates
【Posted】: 2021-04-21 09:22:26
【Question】:

I have the following dataset:

         Mark       Model      Sold
1      Toyota       Yaris      7739
2       Dacia      Duster      5798
3      Toyota     Corolla      4010
4      Toyota        RAV4      3258
5       Skoda       Fabia      3197
6        Fiat        Tipo      3157
7       Skoda     Octavia      3017

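I need a formula that groups the duplicated Marks, counts the models for each one and totals the Sold column, to get a result like this: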

         Mark       Model      Sold
1      Toyota           3     15007
2       Dacia           1      5798
3       Skoda           2      6214
4        Fiat           1      3157

Can anyone help me with this?
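For reference, here is a minimal base-R sketch of the requested aggregation, assuming the data frame is named `df` and `Sold` holds integers. Unlike functions that sort the groups alphabetically, it keeps each Mark in the order of its first appearance, so the rows match the expected table above. The helper name `mark_f` is purely illustrative.

```r
# Sample data from the question
df <- data.frame(
  Mark  = c("Toyota", "Dacia", "Toyota", "Toyota", "Skoda", "Fiat", "Skoda"),
  Model = c("Yaris", "Duster", "Corolla", "RAV4", "Fabia", "Tipo", "Octavia"),
  Sold  = c(7739L, 5798L, 4010L, 3258L, 3197L, 3157L, 3017L)
)

# Hypothetical helper: a factor whose levels follow the order in which
# each Mark first appears, so the result is not re-sorted alphabetically
mark_f <- factor(df$Mark, levels = unique(df$Mark))

result <- data.frame(
  Mark  = levels(mark_f),
  Model = as.vector(table(mark_f)),                # number of rows (models) per Mark
  Sold  = as.vector(tapply(df$Sold, mark_f, sum))  # total units sold per Mark
)
result
```

`table()` and `tapply()` both return their results in factor-level order, which is why fixing the levels up front is enough to control the row order.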

【Discussion】:

    Tags: r sum duplicates


    【Solution 1】:
    library(dplyr)
    
    # One row per Mark: Model = number of rows, Sold = total units sold
    df %>% group_by(Mark) %>% summarise(Model = n(), Sold = sum(Sold))
    

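    where df is your dataset. Note that the summarised rows come out sorted alphabetically by Mark (Dacia, Fiat, Skoda, Toyota), not in the order shown in the question.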

    【Discussion】:

      【Solution 2】:

      Using base R:

      do.call(data.frame, aggregate(Sold ~ Mark, df, function(x)
                     c(Model = length(x), Sold = sum(x))))
      

      Data:

      df <- structure(list(Mark = c("Toyota", "Dacia", "Toyota", "Toyota", 
      "Skoda", "Fiat", "Skoda"), Model = c("Yaris", "Duster", "Corolla", 
      "RAV4", "Fabia", "Tipo", "Octavia"), Sold = c(7739L, 5798L, 4010L, 
      3258L, 3197L, 3157L, 3017L)), class = "data.frame", row.names = c("1", 
      "2", "3", "4", "5", "6", "7"))
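Because the aggregating function returns a named vector, `aggregate()` stores both results as a two-column matrix inside the `Sold` column, and `do.call(data.frame, ...)` flattens that matrix into columns named `Sold.Model` and `Sold.Sold`, with the Marks sorted alphabetically. A sketch, assuming you want the exact column names and row order from the question, that renames the columns by position and reorders the rows by first appearance:

```r
df <- data.frame(
  Mark  = c("Toyota", "Dacia", "Toyota", "Toyota", "Skoda", "Fiat", "Skoda"),
  Model = c("Yaris", "Duster", "Corolla", "RAV4", "Fabia", "Tipo", "Octavia"),
  Sold  = c(7739L, 5798L, 4010L, 3258L, 3197L, 3157L, 3017L)
)

res <- do.call(data.frame, aggregate(Sold ~ Mark, df, function(x)
  c(Model = length(x), Sold = sum(x))))

# The matrix column was flattened to "Sold.Model" and "Sold.Sold"; rename by position
names(res) <- c("Mark", "Model", "Sold")

# Optional: restore the order in which each Mark first appears in df
res <- res[match(unique(df$Mark), res$Mark), ]
rownames(res) <- NULL
res
```

Skip the reordering step if alphabetical order is acceptable.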
      

      【Discussion】:

        【Solution 3】:

        A data.table option:

        > setDT(df)[, .(Model = .N, Sold = sum(Sold)), Mark]
             Mark Model  Sold
        1: Toyota     3 15007
        2:  Dacia     1  5798
        3:  Skoda     2  6214
        4:   Fiat     1  3157
        

        【Discussion】:
