【Posted】: 2020-11-16 04:22:14
【Question】:
I have a dataframe like the following:
MARKET PRODUCT TIMEPERIOD DATE VALUES
0 USA MARKET APPLE QUARTER 2020-06-01 100
1 USA MARKET APPLE YEARLY 2020-06-01 1000
2 USA MARKET PEAR QUARTER 2020-06-01 200
3 USA MARKET PEAR YEARLY 2020-06-01 5000
4 USA MARKET APPLE QUARTER 2019-06-01 300
5 USA MARKET APPLE YEARLY 2019-06-01 2000
6 USA MARKET PEAR QUARTER 2019-06-01 100
7 USA MARKET PEAR YEARLY 2019-06-01 3000
8 USA MARKET APPLE QUARTER 2018-06-01 300
9 USA MARKET APPLE YEARLY 2018-06-01 2000
10 USA MARKET PEAR QUARTER 2018-06-01 100
11 USA MARKET PEAR YEARLY 2018-06-01 3000
12 UK MARKET WATERMELON QUARTER 2020-06-01 200
13 UK MARKET WATERMELON YEARLY 2020-06-01 5000
14 UK MARKET GRAPE QUARTER 2020-06-01 200
15 UK MARKET GRAPE YEARLY 2020-06-01 5000
16 UK MARKET WATERMELON QUARTER 2019-06-01 500
17 UK MARKET WATERMELON YEARLY 2019-06-01 300
18 UK MARKET GRAPE QUARTER 2019-06-01 50
19 UK MARKET GRAPE YEARLY 2019-06-01 500
20 UK MARKET WATERMELON QUARTER 2018-06-01 500
21 UK MARKET WATERMELON YEARLY 2018-06-01 300
22 UK MARKET GRAPE QUARTER 2018-06-01 50
23 UK MARKET GRAPE YEARLY 2018-06-01 500
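I want to find the year-on-year difference for each product, for each time period, in each market (that's a mouthful!). For example, for the product APPLE in USA MARKET with TIMEPERIOD QUARTER, the growth rate at 2020-06-01 is simply (100 - 300) / 300 ≈ -66.7%, that is, the 2020-06-01 value minus the 2019-06-01 value, divided by the 2019-06-01 value.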
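To make the arithmetic concrete, here is a minimal sketch of the formula described above. The helper name `yoy_growth` is hypothetical and not part of the question's code; it simply encodes (current - previous) / previous.

```python
def yoy_growth(current: float, previous: float) -> float:
    """Return the year-on-year change as a fraction of the previous value."""
    return (current - previous) / previous


# USA MARKET / APPLE / QUARTER: 100 at 2020-06-01, 300 at 2019-06-01.
rate = yoy_growth(100, 300)
print(f"{rate:.1%}")  # -66.7%
```

The result is negative because the value fell from 300 to 100.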
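As you can see, the problem with the code below is that it only returns the growth rate of the latest year against the year before it, and ignores the earlier years 2019 and 2018. I tried several if-else blocks, but they all ended in errors, so I would appreciate any neat solution to this problem. In short, my growth_rate_prev is not used here (I did try to weave it in, but that failed).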
from dateutil.relativedelta import relativedelta

def year_on_year(df):
    try:
        # Sum of VALUES on the latest date, one year before it, and two years before it.
        curr_year_val = df[df['DATE'] == max(df['DATE'])]['VALUES'].sum()
        prev_year_val = df[df['DATE'] == (max(df['DATE']) - relativedelta(months=12))]['VALUES'].sum()
        prev_prev_year_val = df[df['DATE'] == (max(df['DATE']) - relativedelta(months=24))]['VALUES'].sum()
        growth_rate_curr = (curr_year_val - prev_year_val) / prev_year_val
        growth_rate_prev = (prev_year_val - prev_prev_year_val) / prev_prev_year_val
    except ZeroDivisionError:
        growth_rate_curr, growth_rate_prev = 0, 0
    # Only the latest year's rate is returned; growth_rate_prev is computed but unused.
    return growth_rate_curr

def product_growth(applied_group_df):
    # The single scalar is broadcast to every row of the group.
    applied_group_df['Year on Year difference'] = year_on_year(applied_group_df)
    return applied_group_df

growth_rate_df = df_2.groupby(["TIMEPERIOD", 'MARKET', 'PRODUCT']).apply(product_growth)
If anyone wants to reproduce this, you can create the df with the following code:
import pandas as pd

df_list_for_yoy = [['USA MARKET', 'APPLE', 'QUARTER', '2020-06-01', 100], ['USA MARKET', 'APPLE', 'YEARLY', '2020-06-01', 1000],
['USA MARKET', 'PEAR', 'QUARTER', '2020-06-01', 200], ['USA MARKET', 'PEAR', 'YEARLY', '2020-06-01', 5000],
['USA MARKET', 'APPLE', 'QUARTER', '2019-06-01', 300], ['USA MARKET', 'APPLE', 'YEARLY', '2019-06-01', 2000],
['USA MARKET', 'PEAR', 'QUARTER', '2019-06-01', 100], ['USA MARKET', 'PEAR', 'YEARLY', '2019-06-01', 3000],
['USA MARKET', 'APPLE', 'QUARTER', '2018-06-01', 300], ['USA MARKET', 'APPLE', 'YEARLY', '2018-06-01', 2000],
['USA MARKET', 'PEAR', 'QUARTER', '2018-06-01', 100], ['USA MARKET', 'PEAR', 'YEARLY', '2018-06-01', 3000],
['UK MARKET', 'WATERMELON', 'QUARTER', '2020-06-01', 200], ['UK MARKET', 'WATERMELON', 'YEARLY', '2020-06-01', 5000],
['UK MARKET', 'GRAPE', 'QUARTER', '2020-06-01', 200], ['UK MARKET', 'GRAPE', 'YEARLY', '2020-06-01', 5000],
['UK MARKET', 'WATERMELON', 'QUARTER', '2019-06-01', 500], ['UK MARKET', 'WATERMELON', 'YEARLY', '2019-06-01', 300],
['UK MARKET', 'GRAPE', 'QUARTER', '2019-06-01', 50], ['UK MARKET', 'GRAPE', 'YEARLY', '2019-06-01', 500],
['UK MARKET', 'WATERMELON', 'QUARTER', '2018-06-01', 500], ['UK MARKET', 'WATERMELON', 'YEARLY', '2018-06-01', 300],
['UK MARKET', 'GRAPE', 'QUARTER', '2018-06-01', 50], ['UK MARKET', 'GRAPE', 'YEARLY', '2018-06-01', 500]]
column_names = ['MARKET', 'PRODUCT', 'TIMEPERIOD', 'DATE', 'VALUES']
df_2 = pd.DataFrame(df_list_for_yoy, columns=column_names)
df_2['DATE'] = pd.to_datetime(df_2['DATE'])
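For illustration, one possible way to get a growth rate for every year, not just the latest one, is to sort each group by date and compare each row with the row before it. This is a sketch, not a confirmed solution from the thread. It swaps the hand-written year lookups for pandas' `pct_change`, and it assumes each (TIMEPERIOD, MARKET, PRODUCT) group has exactly one row per year with no missing years. The frame below is a small subset of the question's data.

```python
import pandas as pd

# Subset of the question's data: one product, two time periods, three years.
rows = [
    ["USA MARKET", "APPLE", "QUARTER", "2020-06-01", 100],
    ["USA MARKET", "APPLE", "YEARLY", "2020-06-01", 1000],
    ["USA MARKET", "APPLE", "QUARTER", "2019-06-01", 300],
    ["USA MARKET", "APPLE", "YEARLY", "2019-06-01", 2000],
    ["USA MARKET", "APPLE", "QUARTER", "2018-06-01", 300],
    ["USA MARKET", "APPLE", "YEARLY", "2018-06-01", 2000],
]
df = pd.DataFrame(rows, columns=["MARKET", "PRODUCT", "TIMEPERIOD", "DATE", "VALUES"])
df["DATE"] = pd.to_datetime(df["DATE"])

keys = ["TIMEPERIOD", "MARKET", "PRODUCT"]

# Sort so that, within each group, rows run from oldest to newest.
df = df.sort_values(keys + ["DATE"]).reset_index(drop=True)

# pct_change compares each row with the previous row of the same group.
# ASSUMPTION: one row per year and no gaps, so "previous row" means "previous year".
df["Year on Year difference"] = df.groupby(keys)["VALUES"].pct_change()

print(df)
```

The oldest row of each group has no earlier year to compare against, so its rate is NaN rather than 0.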
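【Comments】:
- Note: (100-300)/300 is about 66.7% *negative* growth.
- Should we assume the dataframe only has values for 2020, 2019 and 2018, or can there be more?
- @sharathnatraj It may have more; in my real data it goes back to 2013.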
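With data reaching back several years, a group may be missing a year, and comparing each row with the row before it would then silently span two years. A hedged sketch of one way around that is to join the frame to a copy of itself whose dates are shifted forward by one year, so a row is only matched with a value from exactly one year earlier. The data below is hypothetical (it has a deliberate gap at 2019), and the column name `PREV_VALUES` is introduced here only for illustration.

```python
import pandas as pd

# Hypothetical data with a gap: there is no 2019 row for this product.
df = pd.DataFrame(
    {
        "MARKET": ["USA MARKET"] * 3,
        "PRODUCT": ["APPLE"] * 3,
        "TIMEPERIOD": ["QUARTER"] * 3,
        "DATE": pd.to_datetime(["2017-06-01", "2018-06-01", "2020-06-01"]),
        "VALUES": [200, 300, 100],
    }
)

keys = ["TIMEPERIOD", "MARKET", "PRODUCT"]

# Copy of the data with every date moved forward one year, so the value
# recorded at 2017-06-01 lines up with the 2018-06-01 row it is compared to.
prev = df[keys + ["DATE", "VALUES"]].copy()
prev["DATE"] = prev["DATE"] + pd.DateOffset(years=1)
prev = prev.rename(columns={"VALUES": "PREV_VALUES"})

# Left merge: rows with no value exactly one year earlier get NaN.
out = df.merge(prev, on=keys + ["DATE"], how="left")
out["Year on Year difference"] = (out["VALUES"] - out["PREV_VALUES"]) / out["PREV_VALUES"]

print(out[["DATE", "VALUES", "PREV_VALUES", "Year on Year difference"]])
```

Here the 2018 row gets (300 - 200) / 200 = 0.5, while the 2017 and 2020 rows get NaN because no row exists exactly one year before them.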
Tags: python pandas dataframe group-by