[Posted]: 2022-01-19 23:11:10
[Question]:
Here is my df:
| Site | Product | Period | Inflow | Outflow | Production | Opening Inventory | New Opening Inventory | Closing Inventory | Production Needed |
|---|---|---|---|---|---|---|---|---|---|
| California | Apples | 1 | 0 | 3226 | 4300 | 1213 | 1213 | 0 | 0 |
| California | Apples | 2 | 0 | 3279 | 3876 | 0 | 0 | 0 | 0 |
| California | Apples | 3 | 0 | 4390 | 4530 | 0 | 0 | 0 | 0 |
| California | Apples | 4 | 0 | 4281 | 3870 | 0 | 0 | 0 | 0 |
| California | Apples | 5 | 0 | 4421 | 4393 | 0 | 0 | 0 | 0 |
| California | Oranges | 1 | 0 | 505 | 400 | 0 | 0 | 0 | 0 |
| California | Oranges | 2 | 0 | 278 | 505 | 0 | 0 | 0 | 0 |
| California | Oranges | 3 | 0 | 167 | 278 | 0 | 0 | 0 | 0 |
| California | Oranges | 4 | 0 | 124 | 167 | 0 | 0 | 0 | 0 |
| California | Oranges | 5 | 0 | 106 | 124 | 0 | 0 | 0 | 0 |
| Montreal | Maple Syrup | 1 | 0 | 445 | 465 | 293 | 293 | 0 | 0 |
| Montreal | Maple Syrup | 2 | 0 | 82 | 398 | 0 | 0 | 0 | 0 |
| Montreal | Maple Syrup | 3 | 0 | 745 | 346 | 0 | 0 | 0 | 0 |
| Montreal | Maple Syrup | 4 | 0 | 241 | 363 | 0 | 0 | 0 | 0 |
| Montreal | Maple Syrup | 5 | 0 | 189 | 254 | 0 | 0 | 0 | 0 |
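As shown, grouping by `Site` and `Product` gives three groups. For each of the three groups, I want to do the following (for periods 2 through 5):
- Set `New Opening Inventory` to the previous period's `Closing Inventory`
- Calculate the next period's `Closing Inventory` using the formula `Closing Inventory = Production + Inflow + New Opening Inventory - Outflow`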
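To make the recurrence concrete, here is a minimal sketch in plain Python that applies the two steps above to the California / Apples rows from the table. The function name `roll_inventory` and the tuple layout are illustrative assumptions, not part of the original df.

```python
def roll_inventory(opening, rows):
    """Apply the inventory recurrence to a single group.

    opening: Opening Inventory of period 1.
    rows: list of (production, inflow, outflow) tuples, one per period.
    Returns a list of (new_opening, closing) pairs, one per period.
    """
    result = []
    new_opening = opening
    for production, inflow, outflow in rows:
        closing = production + inflow + new_opening - outflow
        result.append((new_opening, closing))
        # The next period opens with what this period closed with.
        new_opening = closing
    return result


# California / Apples, periods 1 to 5: (Production, Inflow, Outflow)
apples = [
    (4300, 0, 3226),
    (3876, 0, 3279),
    (4530, 0, 4390),
    (3870, 0, 4281),
    (4393, 0, 4421),
]

for period, (new_opening, closing) in enumerate(roll_inventory(1213, apples), start=1):
    print(period, new_opening, closing)
```

With these inputs, period 1 closes at 4300 + 0 + 1213 - 3226 = 2287, which then becomes the `New Opening Inventory` of period 2, and so on.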
I am trying to solve this with a combination of groupby and a for loop.
Here is what I have so far.
If df were a single group, I could simply do:
import numpy as np

# calculate closing inventory of period 1 (formula from the list above)
df['Closing Inventory'] = np.where(df['Period'] == 1, df['Production'] + df['Inflow'] + df['New Opening Inventory'] - df['Outflow'], 0)
# roll the inventory forward one period at a time
for i in range(1, len(df)):
    df.loc[i, 'New Opening Inventory'] = df.loc[i-1, 'Closing Inventory']
    df.loc[i, 'Closing Inventory'] = df.loc[i, 'Production'] + df.loc[i, 'Inflow'] + df.loc[i, 'New Opening Inventory'] - df.loc[i, 'Outflow']
When I nest this for loop inside a loop over the groups, I get:
import numpy as np
import pandas as pd

# calculate closing inventory of period 1 for all groups
df['Closing Inventory'] = np.where(df['Period'] == 1, df['Production'] + df['Inflow'] + df['New Opening Inventory'] - df['Outflow'], 0)
g = df.groupby(['Site', 'Product'])
alist = []
for k in g.groups.keys():
    # work on a copy of each group with a fresh 0-based index
    temp = g.get_group(k).reset_index(drop=True)
    for i in range(1, len(temp)):
        temp.loc[i, 'New Opening Inventory'] = temp.loc[i-1, 'Closing Inventory']
        temp.loc[i, 'Closing Inventory'] = temp.loc[i, 'Production'] + temp.loc[i, 'Inflow'] + temp.loc[i, 'New Opening Inventory'] - temp.loc[i, 'Outflow']
    alist.append(temp)
# stitch the per-group results back into one frame
df2 = pd.concat(alist, ignore_index=True)
df2
This solution works, but the nested loops seem very inefficient. Is there a better way to do this?
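One possible direction, offered as a sketch and not a definitive answer: since each period's `Closing Inventory` is the previous one plus `Production + Inflow - Outflow`, the recurrence unrolls into a per-group cumulative sum added to the group's period-1 opening inventory. The example below assumes rows can be sorted by `Period` within each group and that only period 1 carries a non-zero `Opening Inventory`; it uses a shortened copy of the table, and the helper column `net` is an illustrative name.

```python
import pandas as pd

# Shortened copy of the table (first three periods of two groups).
df = pd.DataFrame({
    "Site": ["California"] * 3 + ["Montreal"] * 3,
    "Product": ["Apples"] * 3 + ["Maple Syrup"] * 3,
    "Period": [1, 2, 3, 1, 2, 3],
    "Inflow": [0] * 6,
    "Outflow": [3226, 3279, 4390, 445, 82, 745],
    "Production": [4300, 3876, 4530, 465, 398, 346],
    "Opening Inventory": [1213, 0, 0, 293, 0, 0],
})

group_cols = ["Site", "Product"]
df = df.sort_values(group_cols + ["Period"]).reset_index(drop=True)

# Net change in stock contributed by each period (hypothetical helper column).
df["net"] = df["Production"] + df["Inflow"] - df["Outflow"]

grouped = df.groupby(group_cols)
# Period-1 opening inventory, broadcast to every row of its group.
first_opening = grouped["Opening Inventory"].transform("first")
# Closing inventory = starting stock + running total of net changes.
df["Closing Inventory"] = first_opening + grouped["net"].cumsum()

# New opening inventory = previous period's closing inventory within the group;
# the first period of each group keeps its original opening inventory.
df["New Opening Inventory"] = (
    df.groupby(group_cols)["Closing Inventory"].shift()
    .fillna(df["Opening Inventory"])
)

print(df[group_cols + ["Period", "New Opening Inventory", "Closing Inventory"]])
```

This replaces both loops with grouped column operations. It only holds while the formula stays a plain running sum; a rule such as clipping inventory at zero would break the cumulative-sum shortcut.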
[Comments]:
Tags: python pandas dataframe pandas-groupby