Summarizing data in R (pivot) with distinct sum

【Question title】: Summarizing data in R (pivot) with distinct sum
【Posted】: 2019-07-29 15:07:03
【Question description】:

I am trying to summarize my data table, similar to what we do with a pivot table in Excel, but here I want to use count distinct and sum distinct.

df <- data.frame(order_date =c("15-07-2019","15-07-2019","15-07-2019",
                                 "15-07-2019","15-07-2019","15-07-2019",
                                 "15-07-2019","15-07-2019"), 
                   hour = c(1,1,1,1,1,1,2,2), 
                   Country = c("KSA","KSA","UAE","UAE",
                               "UAE","KSA","KW","KW"), 
                   Order_language = c("English","English","English",
                                      "English","English","English",
                                      "English","English"),
                   order_no = c(400130191,400130191,500239645,500239645,
                                500239645,400158425,600009114,600009114), 
                   item_number = c(1365453,1365454,1365463,1365464,1365465,
                                   1365457,1365537,1365538),
                   item_total = c(100,120,100,50,145,214,1,4) , 
                   order_total = c(234,234,359,359,359,234,5.142,5.142))

I want to summarize the data frame in pivot-table format, so that the result looks like this:

summary <- data.frame(hour =c(1,2),
                        Total_order = c(3,1), 
                        Total_Item =c(6,2),
                        gross_Sales = c(827,10.4),
                        KSA_order = c(2,0),
                        KSA_item = c(3,0) ,
                        KSA_gross_sales = c(468,0))

Here,

Total_order = distinct_count(order_no) for that hour
Total_Item = distinct_count(item_number) for that hour
gross_Sales = distinct_sum_per_order(order_total) for that hour
KSA_order = distinct_count(order_no) for that hour for KSA country filter
KSA_item = distinct_count(item_number) for that hour for KSA country filter
KSA_gross_sales = distinct_sum_per_order(order_total)  for that hour for KSA country filter
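To make the "distinct sum per order" definition concrete, here is a small base R sketch for hour 1 only. The values are copied from the sample data frame above; the names `hour1`, `one_row_per_order`, `distinct_orders` and `gross_sales` are illustrative and not part of the question.

```r
# Rows of the sample data for hour 1 (order_no and order_total only).
hour1 <- data.frame(
  order_no    = c(400130191, 400130191, 500239645,
                  500239645, 500239645, 400158425),
  order_total = c(234, 234, 359, 359, 359, 234)
)

# Keep the first row of each order so every order_total is counted once,
# even when two different orders happen to share the same total.
one_row_per_order <- hour1[!duplicated(hour1$order_no), ]

distinct_orders <- nrow(one_row_per_order)         # three distinct orders
gross_sales     <- sum(one_row_per_order$order_total)  # 234 + 359 + 234
```

The point of deduplicating on order_no rather than on order_total is that the two 234 orders are both kept.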

I tried using group_by and summarise, but I am stuck on the gross_sales calculation, because order_total must be summed only once per distinct order.

summary <- df %>% 
            group_by(hour) %>% 
            summarise(KSA_order_cnt = n_distinct(order_no[Country == "KSA"]), 
                      KSA_item_cnt = n_distinct(item_number[Country == "KSA"]),
                      KSA_net_sales = sum(order_total[Country == "KSA"]))

【Question discussion】:

Tags: r dplyr pivot-table reshape2

【Solution 1】:

Perhaps we need something like this:

library(dplyr)
df %>%
   group_by(hour) %>%
   summarise(Total_order = n_distinct(order_no),
             Total_Item = n(),
             gross_Sales = sum(unique(order_total)), 
             KSA_order = n_distinct(order_no[Country == 'KSA']), 
             KSA_item = sum(Country == 'KSA'),
             KSA_gross_sales = sum(unique(order_total[Country == 'KSA'])))

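【Discussion】:

But this breaks when two orders have the same order value. I deliberately included this case in the data frame: order 400158425 has the same order_total as order 400130191, so the output shows 234 instead of 468.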
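
One possible way around this objection is to pick one row per order before summing, instead of deduplicating the totals themselves. The following is a minimal sketch, not the answerer's method. It assumes that each order_no belongs to a single Country and carries the same order_total on every row. It rebuilds the sample df (dropping order_date and Order_language for brevity) so that it runs on its own, and `result` is just an illustrative name.

```r
library(dplyr)

# Sample data from the question (two unused columns omitted).
df <- data.frame(
  hour = c(1, 1, 1, 1, 1, 1, 2, 2),
  Country = c("KSA", "KSA", "UAE", "UAE", "UAE", "KSA", "KW", "KW"),
  order_no = c(400130191, 400130191, 500239645, 500239645,
               500239645, 400158425, 600009114, 600009114),
  item_number = c(1365453, 1365454, 1365463, 1365464, 1365465,
                  1365457, 1365537, 1365538),
  order_total = c(234, 234, 359, 359, 359, 234, 5.142, 5.142)
)

result <- df %>%
  group_by(hour) %>%
  summarise(
    Total_order = n_distinct(order_no),
    Total_Item  = n_distinct(item_number),
    # !duplicated(order_no) flags the first row of each order within the hour,
    # so equal totals coming from different orders are both kept.
    gross_Sales = sum(order_total[!duplicated(order_no)]),
    KSA_order   = n_distinct(order_no[Country == "KSA"]),
    KSA_item    = n_distinct(item_number[Country == "KSA"]),
    # Assumes one Country per order, so the first row is representative.
    KSA_gross_sales = sum(order_total[Country == "KSA" & !duplicated(order_no)])
  )
```

On the sample data this gives gross_Sales of 827 and KSA_gross_sales of 468 for hour 1. For hour 2 it gives 5.142 rather than the 10.4 listed in the question's expected table, because order 600009114 is counted only once.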