【Question title】:Summarizing data in R (pivot) with distinct sum
【Posted】:2019-07-29 15:07:03
【Question】:

I am trying to summarize my data table, much like a pivot table in Excel, except that I want to use count distinct and sum distinct.

df <- data.frame(order_date =c("15-07-2019","15-07-2019","15-07-2019",
                                 "15-07-2019","15-07-2019","15-07-2019",
                                 "15-07-2019","15-07-2019"), 
                   hour = c(1,1,1,1,1,1,2,2), 
                   Country = c("KSA","KSA","UAE","UAE",
                               "UAE","KSA","KW","KW"), 
                   Order_language = c("English","English","English",
                                      "English","English","English",
                                      "English","English"),
                   order_no = c(400130191,400130191,500239645,500239645,
                                500239645,400158425,600009114,600009114), 
                   item_number = c(1365453,1365454,1365463,1365464,1365465,
                                   1365457,1365537,1365538),
                   item_total = c(100,120,100,50,145,214,1,4) , 
                   order_total = c(234,234,359,359,359,234,5.142,5.142))

I want to summarize the data frame in a pivot-table layout (something like this):

summary <- data.frame(hour =c(1,2),
                        Total_order = c(3,1), 
                        Total_Item =c(6,2),
                        gross_Sales = c(827,10.4),
                        KSA_order = c(2,0),
                        KSA_item = c(3,0) ,
                        KSA_gross_sales = c(468,0))

Here,

Total_order = distinct_count(order_no) for that hour
Total_Item = distinct_count(item_number) for that hour
gross_Sales = distinct_sum_per_order(order_total) for that hour
KSA_order = distinct_count(order_no) for that hour, filtered to Country == "KSA"
KSA_item = distinct_count(item_number) for that hour, filtered to Country == "KSA"
KSA_gross_sales = distinct_sum_per_order(order_total) for that hour, filtered to Country == "KSA"

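I tried group_by and summarise, but I am stuck on the gross_Sales calculation, because order_total is repeated on every item row of an order and must be counted only once per order.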

summary <- df %>% 
            group_by(hour) %>% 
            summarise(KSA_order_cnt = n_distinct(order_no[Country == "KSA"]), 
                      KSA_item_cnt = n_distinct(item_number[Country == "KSA"]),
                      KSA_net_sales = sum(order_total[Country == "KSA"]))

【Comments】:

    Tags: r dplyr pivot-table reshape2


    【Solution 1】:

    Perhaps we need

    library(dplyr)
    df %>%
       group_by(hour) %>%
       summarise(Total_order = n_distinct(order_no),
                 Total_Item = n(),
                 gross_Sales = sum(unique(order_total)), 
                 KSA_order = n_distinct(order_no[Country == 'KSA']), 
                 KSA_item = sum(Country == 'KSA'),
                 KSA_gross_sales = sum(unique(order_total[Country == 'KSA'])))
    

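    【Discussion】:

    • But this breaks when two orders have the same order value. I deliberately included that case in the data frame: order number 400158425 has the same order value as 400130191, so the output shows 234 instead of 468.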
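
One way to address the case raised in the comment is to deduplicate on order_no rather than on the value of order_total, so that two different orders with the same total are both counted. The sketch below is not part of the original answer. It assumes order_total is constant within each order_no, and sum_per_order is a hypothetical helper name introduced only for illustration. It also uses n_distinct(item_number) for the item counts, as the question defines them.

```r
library(dplyr)

# Sample data from the question, reduced to the columns used here.
df <- data.frame(
  hour        = c(1, 1, 1, 1, 1, 1, 2, 2),
  Country     = c("KSA", "KSA", "UAE", "UAE", "UAE", "KSA", "KW", "KW"),
  order_no    = c(400130191, 400130191, 500239645, 500239645,
                  500239645, 400158425, 600009114, 600009114),
  item_number = c(1365453, 1365454, 1365463, 1365464, 1365465,
                  1365457, 1365537, 1365538),
  order_total = c(234, 234, 359, 359, 359, 234, 5.142, 5.142)
)

# Hypothetical helper (not from dplyr): keep the first row of each order,
# then sum, so each order contributes its total exactly once.
sum_per_order <- function(total, order) sum(total[!duplicated(order)])

result <- df %>%
  group_by(hour) %>%
  summarise(
    Total_order     = n_distinct(order_no),
    Total_Item      = n_distinct(item_number),
    gross_Sales     = sum_per_order(order_total, order_no),
    KSA_order       = n_distinct(order_no[Country == "KSA"]),
    KSA_item        = n_distinct(item_number[Country == "KSA"]),
    KSA_gross_sales = sum_per_order(order_total[Country == "KSA"],
                                    order_no[Country == "KSA"])
  )

print(result)
```

On the sample data, hour 1 gives gross_Sales = 827 and KSA_gross_sales = 468, because orders 400130191 and 400158425 are kept as separate orders even though both total 234. Hour 2 contains a single KW order, so its gross_Sales is 5.142 and the KSA columns are 0.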