Segregating top spending customers: answers

【Question title】: Segregating top spending customers
【Posted】: 2014-12-11 00:45:25
【Question description】:

I have some data in the following format:

custno  TrainingType    TrainingDate    1   2   3   4   5   6
100     Presentation    2013-11-26    29.85  49.75  146.70  122.70  59.70   29.85
100     Presentation    2014-02-25    122.70 49.75  39.80   109.45  218.90  89.55
100     Training        2012-10-08    0.00   0.00   9.95    0.00    0.00    0.00
100     Training        2013-03-06    0.00   9.95   44.95   29.85   137.50  59.70

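This is only sample data; I have data like this for thousands of customers with different custno values. The columns 1 through 6 hold the monthly spend for months 1 through 6. I want to isolate the top 100 spending customers. In other words, I want the 100 customers who spent the most across all months combined.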

Here is the result of dput(head(df)):

structure(list(custno = c(100L, 100L, 100L, 100L, 100L, 100L), 
    TrainingType = structure(c(2L, 2L, 4L, 4L, 4L, 4L), .Label = c("Demo", 
    "Presentation", "Tradeshow", "Training"), class = "factor"), 
    TrainingDate = structure(c(1385452800, 1393315200, 1349679600, 
    1362556800, 1366095600, 1372748400), class = c("POSIXct", 
    "POSIXt"), tzone = ""), `1` = c(29.85, 122.7, 0, 0, 9.95, 
    137.5), `2` = c(49.75, 49.75, 0, 9.95, 64.85, 49.75), `3` = c(146.7, 
    39.8, 9.95, 44.95, 97.7, 89.55), `4` = c(122.7, 109.45, 0, 
    29.85, 69.65, 99.5), `5` = c(59.7, 218.9, 0, 137.5, 69.65, 
    119.4), `6` = c(29.85, 89.55, 0, 59.7, 69.65, 29.85)), .Names = c("custno", 
"TrainingType", "TrainingDate", "1", "2", "3", "4", "5", "6"), row.names = c(2L, 
3L, 5L, 6L, 7L, 8L), class = "data.frame")

Does anyone happen to know a smart way to do this?

Any help would be greatly appreciated.

【Comments】:

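So you want us to tell you how to add up six columns and then aggregate the sums by 'custno'? This is surely something where you should have shown some effort, by searching SO or Google and by showing an initial coding attempt. It is also polite to post an example using dput(head(your.data.frame.name)).
I have updated the question with the result of dput(head(df)). Sorry for not including an initial coding attempt. This is more of an idea question: I just want to know how to approach it, and I am not necessarily looking for the exact code.
One idea I had was to cluster the data and try to pick out the top customers. I don't know whether that is the best approach.
Here are some hints: tapply(rowSums(df[,4:9]),df$custno,sum) will tell you how much each customer spent. Then you can order them to get the top 100.
@nicola: That looks like a good answer. I suggest you post it.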
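
The tapply hint in the comments can be sketched in base R roughly as follows. This is only an illustration under stated assumptions: the toy data frame, its values, and the names month_cols, n, and top_custnos are made up here and are not from the question. With the real data the month columns would be "1" through "6" and n would be 100.

```r
# Toy data in the same shape as the question (values invented for illustration).
df <- data.frame(
  custno = c(100L, 100L, 200L, 300L, 300L),
  TrainingType = c("Presentation", "Training", "Demo", "Training", "Demo"),
  `1` = c(10, 0, 5, 100, 20),
  `2` = c(20, 5, 5, 0, 30),
  check.names = FALSE  # keep the numeric column names "1", "2" as they are
)

month_cols <- c("1", "2")  # assumption: as.character(1:6) for the real data

# Spend per row across the month columns, then summed per customer.
total_by_cust <- tapply(rowSums(df[, month_cols]), df$custno, sum)

# Order customers by total spend, highest first, and keep the first n.
n <- 2  # assumption: 100 for the real data
ord <- order(total_by_cust, decreasing = TRUE)
top <- total_by_cust[head(ord, n)]

# The customer numbers are stored as the names of the tapply result.
top_custnos <- as.integer(names(top))
print(top)
```

Selecting by position avoids relying on the column order of the real data. Note that cutting at exactly n drops any customer tied with the last one kept.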

Tags: r

【Solution 1】:

I think this is what you want?

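library(dplyr)
library(tidyr)

# Reshape to long format: one row per customer record and month.
# Columns 4 to 9 are the monthly spend columns "1" through "6".
tidydf <- gather(yourdata, month, spent, 4:9)

# Total spend per customer, sorted from highest to lowest.
spendsum <- tidydf %>%
              group_by(custno) %>%
              summarise(
                totalspent = sum(spent)) %>%
              arrange(desc(totalspent))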

【Comments】:

I would guess that one more line, slice(c(1:100)), finishes it. With that you can select the top 100 customers.
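
Putting the answer and this comment together, the whole pipeline might look like the sketch below. The toy data frame and the names n and top_customers are assumptions made for illustration; with the real data the spend columns are 4:9 and n would be 100.

```r
library(dplyr)
library(tidyr)

# Toy data in the same shape as the question (values invented for illustration).
yourdata <- data.frame(
  custno = c(100L, 100L, 200L, 300L, 300L),
  TrainingType = c("Presentation", "Training", "Demo", "Training", "Demo"),
  `1` = c(10, 0, 5, 100, 20),
  `2` = c(20, 5, 5, 0, 30),
  check.names = FALSE
)

n <- 2  # assumption: 100 for the real data

top_customers <- yourdata %>%
  gather(month, spent, 3:4) %>%          # 4:9 in the real data
  group_by(custno) %>%
  summarise(totalspent = sum(spent)) %>%
  arrange(desc(totalspent)) %>%
  slice(1:n)                             # keep the first n rows after sorting

print(top_customers)
```

slice() keeps rows by position, so customers tied at the cutoff are dropped arbitrarily. If the monthly columns can contain NA, sum(spent, na.rm = TRUE) may be the safer choice.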