Time series conditional rolling mean in 1 pandas DataFrame: answers

[Question title]: Time series conditional rolling mean in 1 pandas dataframe
[Posted]: 2020-01-15 16:00:52
[Question]:

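I am currently working on a conditional rolling mean. I created a simplified dataset to demonstrate: it contains the quantities sold of 2 products in 3 stores over 4 days.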

Picture of the dataset, Link to download the dataset

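Since the real dataset includes thousands of stores and hundreds of products, I am trying to compute the rolling mean for every store/product combination within the same DataFrame.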

With the code below I can compute a rolling mean for each row, in the same way other data scientists calculate a 10-day or 20-day moving average for a share price:

import pandas as pd
df = pd.read_csv(r'path\ConditionalRollingMean.csv')
df['Rolling_Mean'] = df.Quantity.rolling(2).mean()

or equivalently

df['Rolling_Mean'] = df.Quantity.rolling(window=2).mean()

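The problem with this approach is that the calculation is done row by row, regardless of the store/product combination. What I am looking for is a conditional rolling mean that keeps track of the store/product combination while going through the DataFrame, and fills the df['Rolling_Mean'] column row by row (similar to this).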
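A minimal sketch of the difference, on a small hypothetical frame (the columns Store, Product, Day and Quantity are assumptions, not read from the real CSV). A plain rolling(2) lets a window span rows from different store/product pairs, while grouping first and applying rolling inside transform restarts the window for each pair and returns the values in the original row order. It assumes rows are already in time order within each pair.

```python
import pandas as pd

# Hypothetical toy data: 2 stores x 2 products over 3 days (column names assumed).
df = pd.DataFrame({
    "Store":    [1, 1, 2, 2] * 3,
    "Product":  ["A", "B", "A", "B"] * 3,
    "Day":      [1] * 4 + [2] * 4 + [3] * 4,
    "Quantity": [10, 20, 30, 40, 12, 22, 32, 42, 14, 24, 34, 44],
})

# Plain rolling mean: each window takes the two physically adjacent rows,
# so row 4 averages store 2/product B (day 1) with store 1/product A (day 2).
df["Naive"] = df["Quantity"].rolling(2).mean()

# Conditional rolling mean: one window per Store/Product pair; transform
# returns a result aligned to the original index, so row order is kept.
df["Rolling_Mean"] = (
    df.groupby(["Store", "Product"])["Quantity"]
      .transform(lambda s: s.rolling(2).mean())
)

print(df)
```

The first row of each pair gets NaN because a 2-row window is not yet full; passing min_periods=1 to rolling would fill it with the single available value instead.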

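This rolling mean will then be used in a rolling standard deviation calculation, which I only know how to compute over the whole DataFrame, without the rolling aspect: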

df['mean'] = df.groupby(['Store', 'Product'])['Quantity'].transform('mean')
df['std'] = df.groupby(['Store', 'Product'])['Quantity'].transform('std')
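The same grouping pattern extends to the rolling standard deviation. A sketch on hypothetical data (column names assumed): transform('mean') gives one whole-group value per row, which is the non-rolling version, while rolling(window).std() inside transform gives the per-pair rolling value. pandas' rolling std uses the sample definition (ddof=1) by default.

```python
import pandas as pd

# Hypothetical toy data (column names assumed): one row per Store/Product/Day.
df = pd.DataFrame({
    "Store":    [1, 1, 1, 2, 2, 2],
    "Product":  ["A"] * 6,
    "Day":      [1, 2, 3, 1, 2, 3],
    "Quantity": [10, 14, 12, 5, 5, 9],
})

# Rolling only makes sense if each pair's rows are in time order.
df = df.sort_values(["Store", "Product", "Day"]).reset_index(drop=True)

grouped = df.groupby(["Store", "Product"])["Quantity"]
window = 2  # same window as rolling(2) in the question

# Whole-group statistic (no rolling): the same value on every row of a pair.
df["Group_Mean"] = grouped.transform("mean")

# Rolling statistics, computed separately inside each Store/Product pair.
df["Rolling_Mean"] = grouped.transform(lambda s: s.rolling(window).mean())
df["Rolling_Std"] = grouped.transform(lambda s: s.rolling(window).std())

print(df)
```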

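It would be simpler to put each store/product into its own DataFrame and run df.Quantity.rolling(2).mean() on each one, but in my case that would mean creating more than 150 000 DataFrames. That is why I want to solve this within 1 DataFrame.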
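Grouping gives the same numbers as splitting, without building the per-pair DataFrames by hand. As a check on hypothetical data (column names assumed), this sketch computes the rolling mean both ways and confirms that they agree.

```python
import pandas as pd

# Hypothetical toy data (column names assumed); rows interleave two stores.
df = pd.DataFrame({
    "Store":    [1, 2, 1, 2, 1, 2],
    "Product":  ["A"] * 6,
    "Quantity": [3, 8, 5, 6, 7, 4],
})

# Approach 1: one sub-frame per Store/Product pair, rolled separately,
# then stitched back together in the original row order.
parts = [part["Quantity"].rolling(2).mean()
         for _, part in df.groupby(["Store", "Product"])]
split_result = pd.concat(parts).sort_index()

# Approach 2: the same computation expressed directly on the single DataFrame.
grouped_result = (
    df.groupby(["Store", "Product"])["Quantity"]
      .transform(lambda s: s.rolling(2).mean())
)

# Both give NaN, NaN, 4.0, 7.0, 6.0, 5.0 in the original row order.
pd.testing.assert_series_equal(split_result, grouped_result)
```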

Thanks in advance for your help.

[Question discussion]:

Tags: python pandas dataframe average rolling-computation

[Solution 1]:

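I'm not 100% sure this is what you want, but I simply iterated over the rows of the DataFrame and used an if condition to decide which rows feed into the rolling mean.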

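import pandas as pd

data = pd.read_csv('ConditionalRollingMean.csv')
data['rolling_mean'] = 0.0  # float column, since the means are fractional

# Running count and total for the tracked combination (store 1, product A).
nstore = 0
nquant = 0

for i in range(len(data)):
    q = data['Quantity'][i]
    p = data['Product'][i]
    s = data['StoreNb'][i]

    if s == 1.0 and p == 'A':
        nstore += 1
        nquant += q
    # Cumulative mean so far, carried forward on non-matching rows;
    # 0 until the first matching row appears (avoids division by zero).
    data.loc[i, 'rolling_mean'] = nquant / nstore if nstore else 0.0

print(data)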
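The loop above computes a cumulative (expanding) mean of store 1 / product A rather than a fixed 2-row window, and carries the latest value forward on the other rows. A vectorized sketch of the same numbers, on hypothetical rows that reuse the answer's column names StoreNb, Product and Quantity, replaces the Python loop with cumulative sums:

```python
import pandas as pd

# Hypothetical rows using the column names from the answer.
data = pd.DataFrame({
    "StoreNb":  [2, 1, 1, 2, 1],
    "Product":  ["A", "A", "B", "A", "A"],
    "Quantity": [9, 4, 7, 3, 8],
})

# Rows belonging to the tracked combination (store 1, product A).
match = (data["StoreNb"] == 1) & (data["Product"] == "A")

# Running count and running total over matching rows only; non-matching
# rows add nothing but still carry the current totals forward.
count = match.cumsum()
total = data["Quantity"].where(match, 0).cumsum()

# Cumulative mean, set to 0.0 before the first matching row, as in the loop.
data["rolling_mean"] = (total / count).where(count > 0, 0.0)

print(data)
```

Vectorized column operations like these avoid the per-row .loc writes, which tend to be slow on large frames.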

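Edit: I wrote a version that finds every store/product combination in the DataFrame and creates a dedicated rolling mean column for each one. I hope this is what you actually want, because the Cartesian product of thousands of stores and hundreds of products is quite large: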

import pandas as pd
import itertools as it

data = pd.read_csv('ConditionalRollingMean.csv')

# Obtain all unique stores and products and find their cartesian product.
stores = set(data['StoreNb'].dropna())
products = set(data['Product'].dropna())
combs = it.product(stores,products)

# iterate over every combination of store/product and calculate rolling mean.
for comb in combs:

    store, product = comb

    # Set new, empty column for combination
    name = 'rm'+str(store)+product
    data[name] = 0.0  # float column, since the means are fractional

    # set starting values for rolling mean.
    nstore = 0
    nquant = 0

    # iterate over lines and do conditional checks to funnel results into
    # appropriate rolling mean column
    for i in range(len(data)):
        q = data['Quantity'][i]
        p = data['Product'][i]
        s = data['StoreNb'][i]

        if s == store and p == product:
            nstore += 1
            nquant += q
            data.loc[i,name] = nquant/nstore
        else:
            if nstore == 0:
                data.loc[i,name] = 0
            else:
                data.loc[i,name] = nquant/nstore


# write dataframe to new file.
data.to_csv('res.csv')
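Because the Cartesian product of stores and products grows quickly, one option is to loop only over the store/product pairs that actually occur in the data and compute each column with cumulative sums instead of a row-by-row loop. A sketch on hypothetical rows (column names taken from the answer; the rm&lt;store&gt;&lt;product&gt; naming mirrors the loop above):

```python
import pandas as pd

# Hypothetical rows using the answer's column names.
data = pd.DataFrame({
    "StoreNb":  [1, 2, 1, 2],
    "Product":  ["A", "A", "B", "A"],
    "Quantity": [4, 6, 2, 10],
})

# Only combinations that actually occur, instead of the full Cartesian product.
combos = data[["StoreNb", "Product"]].drop_duplicates().itertuples(index=False)

for store, product in combos:
    match = (data["StoreNb"] == store) & (data["Product"] == product)
    count = match.cumsum()                             # matching rows seen so far
    total = data["Quantity"].where(match, 0).cumsum()  # their running total
    # Cumulative mean carried forward, 0.0 before the pair's first row.
    data["rm" + str(store) + product] = (total / count).where(count > 0, 0.0)

print(data)
```

This still creates one column per pair, so with thousands of pairs a single long-format column (one value per row, computed within its own group) is usually easier to handle.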

Hope this helps.

[Discussion]:

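Thanks, J.Doe, for the contribution. I need some time to check the output and how it applies to my actual task, but this is already very helpful. Thanks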

[Solution 2]:

The solution I will use is the following:

df["Mean"] = df.groupby(['Store','Product'])['Quantity'].rolling(2).mean().reset_index(level=[0, 1], drop=True)

It gives me the output I wanted. Thanks for your input.
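One detail worth checking with this one-liner: groupby(...).rolling(...) returns a Series indexed by (Store, Product, original row label), not by the DataFrame's own index, so assigning it directly to a column can fail with an incompatible-index error instead of aligning. A sketch on hypothetical data (column names follow the solution) shows the extra index levels, drops them so values align by row label, and compares with the transform formulation, which keeps the original index from the start. Both assume rows are in time order within each pair.

```python
import pandas as pd

# Hypothetical toy data; column names follow the solution above.
df = pd.DataFrame({
    "Store":    [1, 2, 1, 2, 1],
    "Product":  ["A"] * 5,
    "Quantity": [2, 10, 4, 20, 6],
})

rolled = df.groupby(["Store", "Product"])["Quantity"].rolling(2).mean()
# The index has the two group keys plus the original row label.
print(rolled.index.nlevels)  # 3

# Drop the two group levels so values line up with df's index by label.
df["Mean"] = rolled.reset_index(level=[0, 1], drop=True)

# Alternative that returns a result already aligned to df's index.
df["Mean_transform"] = (
    df.groupby(["Store", "Product"])["Quantity"]
      .transform(lambda s: s.rolling(2).mean())
)

print(df)
```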

[Discussion]: