【Posted】: 2022-03-12 10:53:02
【Problem description】:
import pandas as pd
import numpy as np
df = pd.DataFrame([['A', 201901, 10, 201801, 201801],
['B', 201902, 11, 201801, 201802],
['B', 201903, 13, 201801, 201803],
['B', 201905, 18, 201801, 201805],
['A', 201906, 80, 201801, 201806],
['A', 202001, 10, 201901, 201901],
['A', 202002, 11, 201901, 201902],
['A', 202003, 13, 201901, 201903],
['A', 202004, 18, 201901, 201904],
['B', 202005, 80, 201901, 201905],
['A', 202006, 80, 201901, 201906],
['B', 201901, 10, 201801, 201801],
['A', 201902, 11, 201801, 201802],
['A', 201903, 13, 201801, 201803],
['A', 201905, 18, 201801, 201805],
['B', 201906, 80, 201801, 201806],
['B', 202001, 10, 201901, 201901],
['B', 202002, 11, 201901, 201902],
['B', 202003, 13, 201901, 201903],
['B', 202004, 18, 201901, 201904],
['A', 202005, 80, 201901, 201905],
['B', 202006, 80, 201901, 201906]],
columns=['Store', 'yearweek', 'sales', 'Start_PY', 'PY'])
df
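Given the df above (note that week 201904 is missing), I want to add a column 'Sales_PY' to every row, holding that store's total sales for the previous year, from Start_PY through PY.
Something like this: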
| Store | yearweek | sales | Start_PY | PY | Sales_PY |
|---|---|---|---|---|---|
| A | 201901 | 100 | 201801 | 201801 | NaN |
| B | 201902 | 11 | 201801 | 201802 | NaN |
| B | 201903 | 13 | 201801 | 201803 | NaN |
| B | 201905 | 18 | 201801 | 201805 | NaN |
| A | 201906 | 800 | 201801 | 201806 | NaN |
| A | 202001 | 100 | 201901 | 201901 | 100.0 |
| A | 202002 | 110 | 201901 | 201902 | 210.0 |
| A | 202003 | 130 | 201901 | 201903 | 340.0 |
| A | 202004 | 180 | 201901 | 201904 | 340.0 |
| B | 202005 | 80 | 201901 | 201905 | 52.0 |
| A | 202006 | 800 | 201901 | 201906 | 1320.0 |
| B | 201901 | 10 | 201801 | 201801 | NaN |
| A | 201902 | 110 | 201801 | 201802 | NaN |
| A | 201903 | 130 | 201801 | 201803 | NaN |
| A | 201905 | 180 | 201801 | 201805 | NaN |
| B | 201906 | 80 | 201801 | 201806 | NaN |
| B | 202001 | 10 | 201901 | 201901 | 10.0 |
| B | 202002 | 11 | 201901 | 201902 | 21.0 |
| B | 202003 | 13 | 201901 | 201903 | 34.0 |
| B | 202004 | 18 | 201901 | 201904 | 34.0 |
| A | 202005 | 800 | 201901 | 201905 | 520.0 |
| B | 202006 | 80 | 201901 | 201906 | 132.0 |
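I assume Pandas must have an equivalent of Excel's SUMIF.
That is, Sales_PY for the last row would be the sum of sales WHERE store == 'B' AND yearweek >= 201901 AND yearweek <= 201906.
Because I cannot guarantee that my df is sorted by store/week, and it is sometimes missing a few weeks, I would rather not use shift() and/or cumsum().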
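To make the condition concrete, here is a minimal sketch of what a SUMIF-style calculation could look like. It is not an established Pandas idiom, just one way to express the WHERE clause as a boolean mask. The helper name `sumif_prior_year` and the reduced `sample` frame (a deliberately unsorted subset of the rows above) are assumptions for illustration only.

```python
import pandas as pd

# Reduced, deliberately unsorted sample built from rows of the question's df.
sample = pd.DataFrame(
    [
        ["B", 201906, 80, 201801, 201806],
        ["A", 201901, 10, 201801, 201801],
        ["B", 201901, 10, 201801, 201801],
        ["B", 201902, 11, 201801, 201802],
        ["B", 201903, 13, 201801, 201803],
        ["B", 201905, 18, 201801, 201805],
        ["B", 202004, 18, 201901, 201904],
        ["B", 202006, 80, 201901, 201906],
        ["A", 202001, 10, 201901, 201901],
    ],
    columns=["Store", "yearweek", "sales", "Start_PY", "PY"],
)


def sumif_prior_year(frame: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: per-row SUMIF over the same store's prior-year window."""

    def one_row(row):
        # Same store AND Start_PY <= yearweek <= PY, evaluated over the whole frame,
        # so neither row order nor missing weeks matter.
        mask = (
            (frame["Store"] == row["Store"])
            & (frame["yearweek"] >= row["Start_PY"])
            & (frame["yearweek"] <= row["PY"])
        )
        # min_count=1 makes an empty selection return NaN instead of 0.
        return frame.loc[mask, "sales"].sum(min_count=1)

    return frame.apply(one_row, axis=1)


sample["Sales_PY"] = sumif_prior_year(sample)
print(sample)
```

Each row is compared against the whole frame, so the cost grows quadratically with the number of rows; for a large df, merging the frame with itself on Store, filtering on the week window, and grouping might be worth trying instead. Also note that the expected table lists store A's sales at ten times the values in the DataFrame literal, so on the literal data store A's results would come out ten times smaller than the table shows.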
【Discussion】: