【发布时间】:2021-08-15 07:12:00
【问题描述】:
不确定这是做我想做的最好或正确的方法。
我有以下df:
df = pd.DataFrame(np.array([['1-1-2020', '123','How can I Help?', 'Delivered'], ['1-1-2020', '123','How can I Help?', 'Opened'], ['1-2-2021', '100','New Offer', 'Delivered'],['1-2-2021', '100','New Offer', 'Delivered'],['1-4-2021', '144','Last chance, buy now!', 'Delivered'],['1-4-2021', '144','Last chance, buy now!', 'Delivered'],['2-4-2021', '144','Last chance, buy now!', 'Opened']]),
columns=['Date', 'Customer_ID','Subject', 'Status'])
Date Customer_ID Subject Status
0 1-1-2020 123 How can I Help? Delivered
1 1-1-2020 123 How can I Help? Opened
2 1-2-2021 100 New Offer Delivered
3 1-2-2021 100 New Offer Delivered
4 1-4-2021 144 Last chance, buy now! Delivered
5 1-4-2021 144 Last chance, buy now! Delivered
6 2-4-2021 144 Last chance, buy now! Opened
在这个df中,客户123收到了一封电子邮件,然后在第二行打开了它。 客户 100 发送了两次电子邮件 并且客户 144 的电子邮件已发送两次,其中一个已打开。
我正在尝试跟踪每个客户的每封电子邮件的发送和打开状态以及最后操作日期。
因此,我创建了两个数据框:一个用于交付,一个用于打开,并将它们合并到交付的一个上以跟踪打开的内容。
df_del = df.loc[(df['Status'] == 'Delivered')]
df_open = df.loc[(df['Status'] == 'Opened')]
d = df_del.rename(columns={'Date': 'Date Delivered'})
o = df_open.rename(columns={'Date': 'last action date', 'Status': 'Open Status'})
w = d.merge(o, on=['Customer_ID','Subject'], how='left')
w
这表明:
Date Delivered Customer_ID Subject Status last action date Open Status
0 1-1-2020 123 How can I Help? Delivered 1-1-2020 Opened
1 1-2-2021 100 New Offer Delivered NaN NaN
2 1-2-2021 100 New Offer Delivered NaN NaN
3 1-4-2021 144 Last chance, buy now! Delivered 2-4-2021 Opened
4 1-4-2021 144 Last chance, buy now! Delivered 2-4-2021 Opened
我的期望:
Date Delivered Customer_ID Subject Status last action date Open Status
0 1-1-2020 123 How can I Help? Delivered 1-1-2020 Opened
1 1-2-2021 100 New Offer Delivered 1-2-2021 NaN
2 1-2-2021 100 New Offer Delivered 1-2-2021 NaN
3 1-4-2021 144 Last chance, buy now! Delivered 2-4-2021 Opened
4 1-4-2021 144 Last chance, buy now! Delivered 1-4-2021 NaN
【问题讨论】:
-
Subject/Customer_ID 组合不是唯一的。您没有唯一的消息标识符?
-
@JanWilamowski 不幸的是,电子邮件没有唯一标识符,这就是为什么我使用客户 ID 和电子邮件主题的组合来在某种程度上跟踪打开率。
-
也许这个答案有帮助:stackoverflow.com/questions/40575486/…
标签: python python-3.x pandas dataframe