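【Posted】: 2017-04-03 04:40:19
【Question】:
I have a pandas DataFrame with product columns [a, b, c, d, e, f, j, h, i, j, k, l] for millions of customers. For each product, the data records whether the customer used it in a given month (1) or did not use it (0).
The original coding per customer is: 1 = used, 0 = not used.
I want to reclassify product usage into four categories:
S: used
M: maintained use (still used in the following months)
N: not used
D: remained unused (not used for consecutive months)
The raw data looks like this: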
+-------------+-------+---+---+---+---+---+---+---+---+---+---+---+---+
| Customer_ID | Month | a | b | c | d | e | f | j | h | i | j | k | l |
+-------------+-------+---+---+---+---+---+---+---+---+---+---+---+---+
| 19509 | Jan | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| 19509 | Feb | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
| 19509 | Mar | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| 19509 | Apr | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 19509 | May | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 19509 | Jun | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 19509 | Jul | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| 19509 | Aug | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 19509 | Sep | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 |
| 19510 | Jan | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| 19510 | Feb | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
| 19510 | Mar | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| 19510 | Apr | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 19510 | May | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 19510 | Jun | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 19510 | Jul | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| 19510 | Aug | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 19510 | Sep | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 |
| 19511 | Jan | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| 19511 | Feb | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
| 19511 | Mar | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| 19511 | Apr | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 19511 | May | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 19511 | Jun | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 19511 | Jul | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| 19511 | Aug | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 19511 | Sep | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 |
+-------------+-------+---+---+---+---+---+---+---+---+---+---+---+---+
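I want to reclassify customers into these four categories so that customers who keep using a product, or who go several months without using it, are taken into account.
The result should look like this: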
+-------------+-------+---+---+---+---+---+---+---+---+---+---+---+---+
| Customer_ID | Month | a | b | c | d | e | f | j | h | i | j | k | l |
+-------------+-------+---+---+---+---+---+---+---+---+---+---+---+---+
| 19509 | Jan | S | N | S | N | N | S | N | S | N | S | S | N |
| 19509 | Feb | M | N | N | D | D | M | D | M | D | N | M | D |
| 19509 | Mar | M | S | S | D | D | M | D | M | D | S | M | D |
| 19509 | Apr | N | M | N | S | D | M | D | M | D | N | N | D |
| 19509 | May | D | N | D | M | S | M | D | M | D | D | D | D |
| 19509 | Jun | D | D | D | M | N | M | D | M | D | D | D | D |
| 19509 | Jul | S | S | S | N | D | M | D | M | D | S | S | D |
| 19509 | Aug | N | M | N | D | D | M | D | N | D | N | N | D |
| 19509 | Sep | S | M | S | S | D | M | D | D | S | S | S | D |
| 19510 | Jan | S | N | S | N | N | S | N | S | N | S | S | N |
| 19510 | Feb | M | N | N | D | D | M | D | M | D | N | M | D |
| 19510 | Mar | M | S | S | D | D | M | D | M | D | S | M | D |
| 19510 | Apr | N | M | N | S | D | M | D | M | D | N | N | D |
| 19510 | May | D | N | D | M | S | M | D | M | D | D | D | D |
| 19510 | Jun | D | D | D | M | N | M | D | M | D | D | D | D |
| 19510 | Jul | S | S | S | N | D | M | D | M | D | S | S | D |
| 19510 | Aug | N | M | N | D | D | M | D | N | D | N | N | D |
| 19510 | Sep | S | M | S | S | D | M | D | D | S | S | S | D |
| 19511 | Jan | S | N | S | N | N | S | N | S | N | S | S | N |
| 19511 | Feb | M | N | N | D | D | M | D | M | D | N | M | D |
| 19511 | Mar | M | S | S | D | D | M | D | M | D | S | M | D |
| 19511 | Apr | N | M | N | S | D | M | D | M | D | N | N | D |
| 19511 | May | D | N | D | M | S | M | D | M | D | D | D | D |
| 19511 | Jun | D | D | D | M | N | M | D | M | D | D | D | D |
| 19511 | Jul | S | S | S | N | D | M | D | M | D | S | S | D |
| 19511 | Aug | N | M | N | D | D | M | D | N | D | N | N | D |
| 19511 | Sep | S | M | S | S | D | M | D | D | S | S | S | D |
+-------------+-------+---+---+---+---+---+---+---+---+---+---+---+---+
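The algorithm for this seems complicated, and I am still working out the right order of steps.
I want to do this for all customers and all products (columns). I think we could start like this:
    for i in df["Customer_ID"].unique():
        for j in df.columns:
            ...  # reclassify product j for customer i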
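Reading the expected table, the rule appears to be this: within each customer, a 1 that follows a 0 (or opens the series) is S, a 1 that follows a 1 is M, a 0 that follows a 1 (or opens the series) is N, and a 0 that follows a 0 is D. A few cells in the sample tables do not match this reading, so treat the rule itself as an assumption. Below is a minimal vectorized sketch of it, as an alternative to the nested loop. It assumes rows are already in chronological order within each customer and that product column names are unique. The function name `reclassify`, its parameters, and the `demo` frame are illustrative and not from the post.

```python
import numpy as np
import pandas as pd


def reclassify(df, id_col="Customer_ID", month_col="Month"):
    """Map 0/1 usage flags to S/M/N/D per customer (rows assumed in month order)."""
    products = [c for c in df.columns if c not in (id_col, month_col)]
    cur = df[products]
    # Previous month's flag within the same customer; NaN on each customer's first row.
    prev = df.groupby(id_col, sort=False)[products].shift(1)
    # NaN never equals 0 or 1, so a customer's first row always counts as a change.
    same = cur.eq(prev).to_numpy()
    used = cur.eq(1).to_numpy()
    labels = np.select(
        [used & ~same, used & same, ~used & ~same],
        ["S", "M", "N"],
        default="D",  # not used now and not used last month
    )
    out = df.copy()
    out[products] = pd.DataFrame(labels, index=df.index, columns=products)
    return out


# Hypothetical toy data: two customers, one product column "x".
demo = pd.DataFrame({
    "Customer_ID": [1, 1, 1, 1, 1, 2, 2],
    "Month": ["Jan", "Feb", "Mar", "Apr", "May", "Jan", "Feb"],
    "x": [1, 1, 0, 0, 1, 0, 1],
})
result = reclassify(demo)
```

Because the shift is done per `Customer_ID` group, the last month of one customer never leaks into the first month of the next, and no Python-level loop over customers or columns is needed.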
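Note: this is not simply a use / non-use case. It is join (1), cancel (0), stay idle (0), rejoin (1), and so on. A zero means the customer cancelled the service. If it stays zero for the next three months, he is not a customer during that time. He may then join again and cancel again, and we need to know how many times he cancelled the service. If we only compute totals, they will not tell us how many times the customer joined and how many times he cancelled a particular product or service.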
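The join and cancel counts described in this note can be sketched as counts of 0→1 and 1→0 transitions per customer and product. This is only one possible reading. It assumes a customer is treated as not subscribed before the first observed month (so an opening 1 counts as a join and an opening 0 does not count as a cancel), that rows are in chronological order within each customer, and that product column names are unique. The function name `count_joins_and_cancels` and the `demo` frame are hypothetical.

```python
import pandas as pd


def count_joins_and_cancels(df, id_col="Customer_ID", month_col="Month"):
    """Return (joins, cancels): per-customer transition counts for each product."""
    products = [c for c in df.columns if c not in (id_col, month_col)]
    # Previous month's flag per customer; the month before the first one is taken as 0.
    prev = df.groupby(id_col, sort=False)[products].shift(1).fillna(0)
    delta = df[products] - prev  # +1 = join, -1 = cancel, 0 = no change
    joins = delta.eq(1).groupby(df[id_col]).sum()
    cancels = delta.eq(-1).groupby(df[id_col]).sum()
    return joins, cancels


# Hypothetical toy data: two customers, one product column "x".
demo = pd.DataFrame({
    "Customer_ID": [1, 1, 1, 1, 1, 2, 2],
    "Month": ["Jan", "Feb", "Mar", "Apr", "May", "Jan", "Feb"],
    "x": [1, 1, 0, 0, 1, 0, 1],
})
joins, cancels = count_joins_and_cancels(demo)
```

Both results are indexed by `Customer_ID` with one column per product, so `joins.loc[1, "x"]` gives the number of times customer 1 joined product `x`.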
Any suggestions or ideas for solving this would be appreciated.
【Discussion】:
Tags: python algorithm dataframe subset