【Posted】: 2019-07-29 15:07:03
【Question】:
I am trying to summarise my data table, similar to what a pivot table does in Excel, but here I want to use count distinct and sum distinct.
df <- data.frame(order_date = c("15-07-2019", "15-07-2019", "15-07-2019",
                                "15-07-2019", "15-07-2019", "15-07-2019",
                                "15-07-2019", "15-07-2019"),
                 hour = c(1, 1, 1, 1, 1, 1, 2, 2),
                 Country = c("KSA", "KSA", "UAE", "UAE",
                             "UAE", "KSA", "KW", "KW"),
                 Order_language = c("English", "English", "English",
                                    "English", "English", "English",
                                    "English", "English"),
                 order_no = c(400130191, 400130191, 500239645, 500239645,
                              500239645, 400158425, 600009114, 600009114),
                 item_number = c(1365453, 1365454, 1365463, 1365464, 1365465,
                                 1365457, 1365537, 1365538),
                 item_total = c(100, 120, 100, 50, 145, 214, 1, 4),
                 order_total = c(234, 234, 359, 359, 359, 234, 5.142, 5.142))
I want to summarise the data frame in a pivot-table layout, looking something like this:
summary <- data.frame(hour = c(1, 2),
                      Total_order = c(3, 1),
                      Total_Item = c(6, 2),
                      gross_Sales = c(827, 10.4),
                      KSA_order = c(2, 0),
                      KSA_item = c(3, 0),
                      KSA_gross_sales = c(468, 0))
Here,
Total_order = distinct_count(order_no) for that hour
Total_Item = distinct_count(item_number) for that hour
gross_Sales = distinct_sum_per_order(order_total) for that hour
KSA_order = distinct_count(order_no) for that hour, filtered to Country KSA
KSA_item = distinct_count(item_number) for that hour, filtered to Country KSA
KSA_gross_sales = distinct_sum_per_order(order_total) for that hour, filtered to Country KSA
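To make "distinct sum per order" concrete, here is a minimal base R sketch. The helper name `distinct_sum_per_order` is hypothetical (it just mirrors the pseudo-function used in the definitions), and it assumes `order_total` is the same on every row of a given order, so that keeping the first row per `order_no` is enough.

```r
# Hypothetical helper: add each order's total exactly once.
# Assumes `total` is constant across all rows sharing the same `id`.
distinct_sum_per_order <- function(total, id) {
  sum(total[!duplicated(id)])
}

# The six hour-1 rows from the example data.
order_no    <- c(400130191, 400130191, 500239645, 500239645,
                 500239645, 400158425)
order_total <- c(234, 234, 359, 359, 359, 234)

distinct_sum_per_order(order_total, order_no)  # 234 + 359 + 234 = 827
length(unique(order_no))                       # 3 distinct orders
```

Deduplicating on `order_no` rather than on `order_total` matters here: two different orders share the total 234, and both should be counted.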
I tried group_by and summarise, but I am stuck on the gross_Sales calculation, because order_total has to be summed only once per distinct order:
library(dplyr)

summary <- df %>%
  group_by(hour) %>%
  summarise(KSA_order_cnt = n_distinct(order_no[Country == "KSA"]),
            KSA_item_cnt = n_distinct(item_number[Country == "KSA"]),
            # Problem: this adds order_total once per item row, not once per order
            KSA_net_sales = sum(order_total[Country == "KSA"]))
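One possible way around the double counting, offered as a sketch rather than a definitive answer, is to keep only the first row of each order inside `summarise` by indexing with `!duplicated(order_no)`. This assumes that `order_total` is constant within an order and that an order belongs to a single country and hour; the name `result` is illustrative, and only the columns needed for the summary are rebuilt here.

```r
library(dplyr)

# Example data from the question (columns unused by the summary are omitted).
df <- data.frame(
  hour        = c(1, 1, 1, 1, 1, 1, 2, 2),
  Country     = c("KSA", "KSA", "UAE", "UAE", "UAE", "KSA", "KW", "KW"),
  order_no    = c(400130191, 400130191, 500239645, 500239645,
                  500239645, 400158425, 600009114, 600009114),
  item_number = c(1365453, 1365454, 1365463, 1365464, 1365465,
                  1365457, 1365537, 1365538),
  order_total = c(234, 234, 359, 359, 359, 234, 5.142, 5.142)
)

result <- df %>%
  group_by(hour) %>%
  summarise(
    Total_order = n_distinct(order_no),
    Total_Item  = n_distinct(item_number),
    # !duplicated() is evaluated within each hour, so every order
    # contributes its total exactly once.
    gross_Sales = sum(order_total[!duplicated(order_no)]),
    KSA_order   = n_distinct(order_no[Country == "KSA"]),
    KSA_item    = n_distinct(item_number[Country == "KSA"]),
    # Assumption: an order never spans two countries.
    KSA_gross_sales = sum(order_total[Country == "KSA" & !duplicated(order_no)])
  )

print(result)
```

For hour 1 this gives 3 orders, 6 items, gross sales of 827, and KSA figures of 2, 3 and 468. For hour 2 the single order contributes 5.142 to gross_Sales, which is what the stated definition yields for this data.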
【Discussion】:
Tags: r dplyr pivot-table reshape2