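[Posted]: 2021-06-12 23:59:21
[Question]:
I know there are some similar questions here, but please read on: I have looked at the existing solutions and tried to adapt them, without any luck. I have a dataframe that holds data by year and quarter. In the scenario shown below, prevYearLeadCount is populated for Q1 2020. To be clear, prevYearLeadCount should always show the lead count for the same quarter of the previous year. The data below is just a sample to show how it is structured. Also, looking at the data below, since there is data for Q4 2019, I expect prevYearLeadCount for Q4 2020 to equal 236.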
[
{
"salesforceAccountId": 3148,
"accountName": "Account Name",
"year": 2017,
"quarter": 2,
"leadCount": 151,
"prevYearLeadCount": 0.0
},
{
"salesforceAccountId": 3148,
"accountName": "Account Name",
"year": 2018,
"quarter": 2,
"leadCount": 73,
"prevYearLeadCount": 151.0
},
{
"salesforceAccountId": 3148,
"accountName": "Account Name",
"year": 2018,
"quarter": 3,
"leadCount": 271,
"prevYearLeadCount": 0.0
},
{
"salesforceAccountId": 3148,
"accountName": "Account Name",
"year": 2018,
"quarter": 4,
"leadCount": 173,
"prevYearLeadCount": 0.0
},
{
"salesforceAccountId": 3148,
"accountName": "Account Name",
"year": 2019,
"quarter": 1,
"leadCount": 209,
"prevYearLeadCount": 0.0
},
{
"salesforceAccountId": 3148,
"accountName": "Account Name",
"year": 2019,
"quarter": 2,
"leadCount": 274,
"prevYearLeadCount": 0.0
},
{
"salesforceAccountId": 3148,
"accountName": "Account Name",
"year": 2019,
"quarter": 3,
"leadCount": 311,
"prevYearLeadCount": 0.0
},
{
"salesforceAccountId": 3148,
"accountName": "Account Name",
"year": 2019,
"quarter": 4,
"leadCount": 236,
"prevYearLeadCount": 0.0
},
{
"salesforceAccountId": 3148,
"accountName": "Account Name",
"year": 2020,
"quarter": 1,
"leadCount": 245,
"prevYearLeadCount": 209.0
},
{
"salesforceAccountId": 3148,
"accountName": "Account Name",
"year": 2020,
"quarter": 2,
"leadCount": 430,
"prevYearLeadCount": 0.0
},
{
"salesforceAccountId": 3148,
"accountName": "Account Name",
"year": 2020,
"quarter": 3,
"leadCount": 907,
"prevYearLeadCount": 0.0
},
{
"salesforceAccountId": 3148,
"accountName": "Account Name",
"year": 2020,
"quarter": 4,
"leadCount": 657,
"prevYearLeadCount": 0.0
},
{
"salesforceAccountId": 3148,
"accountName": "Account Name",
"year": 2021,
"quarter": 1,
"leadCount": 609,
"prevYearLeadCount": 245.0
}
]
Looking at the data above, I expect the rows from 2020 onward to look like this:
{
"salesforceAccountId": 3148,
"accountName": "Account Name",
"year": 2020,
"quarter": 1,
"leadCount": 245,
"prevYearLeadCount": 209.0
},
{
"salesforceAccountId": 3148,
"accountName": "Account Name",
"year": 2020,
"quarter": 2,
"leadCount": 430,
"prevYearLeadCount": 274
},
{
"salesforceAccountId": 3148,
"accountName": "Account Name",
"year": 2020,
"quarter": 3,
"leadCount": 907,
"prevYearLeadCount": 311
},
{
"salesforceAccountId": 3148,
"accountName": "Account Name",
"year": 2020,
"quarter": 4,
"leadCount": 657,
"prevYearLeadCount": 236
},
{
"salesforceAccountId": 3148,
"accountName": "Account Name",
"year": 2021,
"quarter": 1,
"leadCount": 609,
"prevYearLeadCount": 245.0
}
As seen here, I tried the following:
df['prev_year_lead_count'] = df.groupby("quarter").lead_count.shift()[ (df.year == df.year.shift() + 1) ]
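This is close: I get what I expect in some cases, but not all. In some frames I see 0 where data definitely exists for the same quarter of the previous year. I am trying to do exactly what is seen here, but with each year split into quarters.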
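One possible explanation for the zeros, assuming the frame is sorted by year and then quarter: the mask `df.year == df.year.shift() + 1` uses an ungrouped `shift()`, so it compares each row with the row directly above it rather than with the previous row for the same quarter. Only rows whose immediate predecessor belongs to the prior year pass the mask, which matches the pattern in the sample output. A minimal sketch that shifts both the count and the year inside the same group is shown below. The snake_case column names and the 0.0 fill value are assumptions taken from the snippets in this post.

```python
import pandas as pd

# Small sample built from the 2019 and 2020 rows of the data above.
df = pd.DataFrame({
    "salesforce_account_id": [3148] * 8,
    "year": [2019] * 4 + [2020] * 4,
    "quarter": [1, 2, 3, 4] * 2,
    "lead_count": [209, 274, 311, 236, 245, 430, 907, 657],
}).sort_values(["salesforce_account_id", "year", "quarter"])

# Group by account and quarter so that shift() looks at the previous
# row for the SAME quarter, not at the row directly above.
grouped = df.groupby(["salesforce_account_id", "quarter"])
prev_count = grouped["lead_count"].shift()
prev_year = grouped["year"].shift()

# Keep the shifted value only when it really comes from year - 1.
# Rows with no predecessor compare as False and fall back to 0.0.
is_prior_year = prev_year == df["year"] - 1
df["prev_year_lead_count"] = prev_count.where(is_prior_year, 0.0)

print(df[["year", "quarter", "lead_count", "prev_year_lead_count"]])
```

Shifting the year alongside the count means a gap in the history (for example, a quarter missing from 2019) produces 0.0 instead of silently picking up a value from two years back.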
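Another thing I tried was combining plain Python with pandas. The idea here is to loop over the years present in the frame and check the previous year to see whether each quarter exists. If it does, apply the pandas shift.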
qs = [1, 2, 3, 4]
for year in leads_df["year"].unique():
    df = leads_df[leads_df["year"] == year - 1]
    for q in qs:
        if q in df["quarter"]:
            leads_df["prev_year_lead_count"] = leads_df.groupby("quarter")["lead_count"].shift(+1)
            leads_df["prev_year_cost"] = leads_df.groupby("quarter")["cost"].shift(+1)
            leads_df["prev_year_ga_spent"] = leads_df.groupby("quarter")["ga_spent"].shift(+1)
            leads_df["prev_year_fb_spent"] = leads_df.groupby("quarter")["fb_spent"].shift(+1)
            leads_df["prev_year_monthly_package_cost"] = leads_df.groupby("quarter")[
                "monthly_package_cost"
            ].shift(+1)
            leads_df["prev_year_cpl"] = leads_df.groupby("quarter")["cpl"].shift(+1)
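Two things about this loop may be worth checking. First, `q in df["quarter"]` tests membership against the index labels of the Series, not against its values, so it does not answer "does this quarter exist in the previous year". Second, the assignments inside the `if` do not depend on `year` or `q`, so every pass recomputes the same grouped shift over the whole frame. The sketch below illustrates only the first point, using a hypothetical Series named `prev_year_quarters`.

```python
import pandas as pd

# Hypothetical previous-year slice: quarters 2, 3 and 4 exist, quarter 1 does not.
prev_year_quarters = pd.Series([2, 3, 4], index=[0, 1, 2], name="quarter")

# `in` on a Series looks at the index labels (0, 1, 2), not the values.
label_hit = 1 in prev_year_quarters    # True: 1 is an index label
value_miss = 4 in prev_year_quarters   # False: 4 is a value, not a label

# Value-based checks that do what the loop intended.
q1_exists = bool(prev_year_quarters.isin([1]).any())   # False
q4_exists = bool((prev_year_quarters == 4).any())      # True

print(label_hit, value_miss, q1_exists, q4_exists)
```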
[Comments]:
-
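The sample df you posted won't work for testing a solution, because it has only one row. shift pulls a value from the previous or the next row, so it needs more than one row to work.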
-
Sorry, that was just an example of the data structure.
-
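No worries. But if you could provide some more comprehensive sample data, it would be much easier to reproduce the problem and possibly offer a solution.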
-
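There you go! A bit more context: I will always pull 4 years of data. The problem is that customers come and go, so I will always be missing a quarter here and there.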
-
Also, looking at the data above, since there is data for Q4 2019, I expect prevYearLeadCount for Q4 2020 to equal 236.
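Since quarters can be missing, a lookup keyed on the exact (account, year, quarter) combination may be more robust than relying on row position. The following is a minimal sketch of a self-merge, not a confirmed solution from this thread. It assumes snake_case column names, at most one row per account, year and quarter, and that 0.0 is the desired fill for a missing prior-year quarter. The sample reuses values from the data above but deliberately omits Q3 2019 to simulate a gap.

```python
import pandas as pd

# Subset of the sample data, with Q3 2019 left out on purpose.
leads_df = pd.DataFrame({
    "salesforce_account_id": [3148] * 7,
    "year": [2018, 2018, 2019, 2019, 2020, 2020, 2020],
    "quarter": [3, 4, 1, 4, 1, 3, 4],
    "lead_count": [271, 173, 209, 236, 245, 907, 657],
})

keys = ["salesforce_account_id", "year", "quarter"]

# Copy the frame, move every row one year forward, and rename the value
# column, so each row becomes the "previous year" record of year + 1.
prev = leads_df[keys + ["lead_count"]].copy()
prev["year"] += 1
prev = prev.rename(columns={"lead_count": "prev_year_lead_count"})

# A left merge keeps every original row; unmatched rows get NaN, then 0.0.
result = leads_df.merge(prev, on=keys, how="left")
result["prev_year_lead_count"] = result["prev_year_lead_count"].fillna(0.0)

print(result)
```

In this sample, Q4 2020 picks up 236 from Q4 2019, while Q3 2020 gets 0.0 because Q3 2019 is absent, rather than inheriting the Q3 2018 value. The other columns mentioned in the question (cost, ga_spent and so on) could be carried through the same merge by including them in `prev` with their own renamed columns.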