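[Posted]: 2022-01-04 06:51:05
[Question]:
I have a data frame containing customers' purchase information from May 2021 (202105) to October 2021 (202110). Customers do not buy every month, but I want to fill the months without a purchase with zeros. My data looks like this: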
soex <- data.frame(client_id = c("aaa","bbb","bbb","ccc","ccc","ddd","eee","eee","eee"),
v1 = c("xxx","xxx","xxx","yyy","yyy","xxx","yyy","xxx","yyy"),
first_buy = c("202105","202107","202107","202106","202106","202110","202107","202107","202107"),
sales_date = c("202105","202107","202109","202106","202110","202110","202107","202108","202109"),
qt_prod1 = c(10,60,30,2,45,11,14,167,145),
qt_prod2 = c(12,324,433,221,312,312,312,123,121))
  client_id  v1 first_buy sales_date qt_prod1 qt_prod2
1       aaa xxx    202105     202105       10       12
2       bbb xxx    202107     202107       60      324
3       bbb xxx    202107     202109       30      433
4       ccc yyy    202106     202106        2      221
5       ccc yyy    202106     202110       45      312
6       ddd xxx    202110     202110       11      312
7       eee yyy    202107     202107       14      312
8       eee xxx    202107     202108      167      123
9       eee yyy    202107     202109      145      121
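- client_id = customer ID
- v1 = a random variable
- first_buy = year and month of the first purchase
- sales_date = year and month of the purchase. The first one is always the same as first_buy
- qt_prod1 (and qt_prod2) = quantity of product purchased
What I need is a data frame like the following: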
ideal <- data.frame(client_id = c("aaa","aaa","aaa","aaa","aaa","aaa","bbb","bbb","bbb","bbb","ccc","ccc","ccc","ccc","ccc","ddd","eee","eee","eee","eee"),
v1 = c("xxx","xxx","xxx","xxx","xxx","xxx","xxx","xxx","xxx","xxx","yyy","yyy","yyy","yyy","yyy","xxx","yyy","xxx","yyy","yyy"),
first_buy = c("202105","202105","202105","202105","202105","202105","202107","202107","202107","202107","202106","202106","202106","202106","202106","202110","202107","202107","202107","202107"),
sales_date = c("202105","202106","202107","202108","202109","202110","202107","202108","202109","202110","202106","202107","202108","202109","202110","202110","202107","202108","202109","202110"),
qt_prod1 = c(10,0,0,0,0,0,60,0,0,30,2,0,0,0,45,11,14,167,145,0),
qt_prod2 = c(12,0,0,0,0,0,324,0,0,433,221,0,0,0,312,312,312,123,121,0))
   client_id  v1 first_buy sales_date qt_prod1 qt_prod2
 1       aaa xxx    202105     202105       10       12
 2       aaa xxx    202105     202106        0        0
 3       aaa xxx    202105     202107        0        0
 4       aaa xxx    202105     202108        0        0
 5       aaa xxx    202105     202109        0        0
 6       aaa xxx    202105     202110        0        0
 7       bbb xxx    202107     202107       60      324
 8       bbb xxx    202107     202108        0        0
 9       bbb xxx    202107     202109        0        0
10       bbb xxx    202107     202110       30      433
11       ccc yyy    202106     202106        2      221
12       ccc yyy    202106     202107        0        0
13       ccc yyy    202106     202108        0        0
14       ccc yyy    202106     202109        0        0
15       ccc yyy    202106     202110       45      312
16       ddd xxx    202110     202110       11      312
17       eee yyy    202107     202107       14      312
18       eee xxx    202107     202108      167      123
19       eee yyy    202107     202109      145      121
20       eee yyy    202107     202110        0        0
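My main difficulty is taking the first_buy variable into account. As you can see for client "bbb", I don't want the data to start in May 2021; I want it to start at the first_buy month and run through 202110.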
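To make the requirement concrete, here is a minimal base R sketch of one possible approach, not a definitive solution: build every client/month pair, keep only the months at or after that client's first_buy, then left-join the real purchases and zero-fill the gaps. The names all_months, clients, grid and filled are illustrative, and the window end (202110) is hard-coded as an assumption taken from the description above.

```r
# Sample data from the question (stringsAsFactors = FALSE keeps the codes as text)
soex <- data.frame(
  client_id  = c("aaa","bbb","bbb","ccc","ccc","ddd","eee","eee","eee"),
  v1         = c("xxx","xxx","xxx","yyy","yyy","xxx","yyy","xxx","yyy"),
  first_buy  = c("202105","202107","202107","202106","202106","202110","202107","202107","202107"),
  sales_date = c("202105","202107","202109","202106","202110","202110","202107","202108","202109"),
  qt_prod1   = c(10,60,30,2,45,11,14,167,145),
  qt_prod2   = c(12,324,433,221,312,312,312,123,121),
  stringsAsFactors = FALSE
)

# Every month in the observation window as "YYYYMM" text (assumed window: 202105 to 202110)
all_months <- format(seq(as.Date("2021-05-01"), as.Date("2021-10-01"), by = "month"), "%Y%m")

# Cross join clients with months, then drop the months before each client's first_buy.
# "YYYYMM" strings have a fixed width, so comparing them as text follows calendar order.
clients <- unique(soex[, c("client_id", "first_buy")])
grid <- merge(clients, data.frame(sales_date = all_months, stringsAsFactors = FALSE), by = NULL)
grid <- grid[grid$sales_date >= grid$first_buy, ]

# Left join the recorded purchases; months without a purchase come back as NA
filled <- merge(grid, soex, by = c("client_id", "first_buy", "sales_date"), all.x = TRUE)
filled <- filled[order(filled$client_id, filled$sales_date), ]
rownames(filled) <- NULL

# Months without a purchase get a quantity of zero
filled$qt_prod1[is.na(filled$qt_prod1)] <- 0
filled$qt_prod2[is.na(filled$qt_prod2)] <- 0
```

In this sketch each quantity stays on the month in which it was recorded (for "bbb", 30 and 433 remain at 202109), and v1 is still NA in the added months, so it has to be filled in a separate step.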
My other problem is filling in v1 for the added sales_date rows. Take client "eee", for example: she made no purchase in 202110, but in v1 she should carry the value from 202109.
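The carry-forward described here could be sketched in base R as follows. The data frame part is a small fragment of the month-completed data (clients "aaa" and "eee" only), and fill_down is a hypothetical helper written for this illustration, not a function from any package.

```r
# Fragment of the month-completed data: v1 is NA where no purchase was recorded
part <- data.frame(
  client_id  = c("aaa", "aaa", "aaa", "eee", "eee", "eee", "eee"),
  sales_date = c("202105", "202106", "202107", "202107", "202108", "202109", "202110"),
  v1         = c("xxx", NA, NA, "yyy", "xxx", "yyy", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace each NA by the last non-missing value before it
fill_down <- function(x) {
  # Position of the most recent non-NA element seen so far (0 if there is none yet)
  idx <- cummax(ifelse(is.na(x), 0L, seq_along(x)))
  idx[idx == 0] <- NA  # leading NAs have nothing to copy from and stay NA
  x[idx]
}

# Rows must be in month order within each client before carrying values forward
part <- part[order(part$client_id, part$sales_date), ]
part$v1 <- ave(part$v1, part$client_id, FUN = fill_down)
```

Because the fill is done per client_id, a value from one client is never carried into the next client's rows.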
Thanks,
[Comments]: