Pandas Resample-Sum without Zero Filling

【Question title】: Pandas Resample-Sum without Zero filling
【Posted】: 2020-11-21 01:28:16
【Question description】:

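When resampling a series with a mean aggregation (daily to monthly), missing datetimes are filled with NaN. That is fine, because we can simply drop them with .dropna(). With a sum/total aggregation, however, missing datetimes are filled with 0 (zero). This is technically correct, but it is a bit of a hassle because a mask is needed to remove them. Is there a more efficient way to resample with a sum aggregation without filling in zeros or using a mask? Ideally something similar to dropna(), but for dropping the zeros.

For example: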

import pandas as pd

ser = pd.Series([1]*6)
ser.index = pd.to_datetime(['2000-01-01', '2000-01-02', '2000-03-01', '2000-03-02', '2000-05-01', '2000-05-02'])
# wanted output
# 2000-01-31    2.0
# 2000-03-31    2.0
# 2000-05-31    2.0

# ideal output but for aggregate sum.
ser.resample('M').mean().dropna()
# 2000-01-31    1.0
# 2000-03-31    1.0
# 2000-05-31    1.0

# not ideal
ser.resample('M').sum()
# 2000-01-31    2
# 2000-02-29    0
# 2000-03-31    2
# 2000-04-30    0
# 2000-05-31    2

Using .groupby() with pd.Grouper appears to behave exactly like resampling.

# not ideal
ser.groupby(pd.Grouper(freq='M')).sum()
# 2000-01-31    2
# 2000-02-29    0
# 2000-03-31    2
# 2000-04-30    0
# 2000-05-31    2

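Grouping with .groupby() on index.year also works, but there does not seem to be an equivalent "identity" for a calendar month, that is, a key made of year and month together. Note that .index.month is not what we are after, since it would merge the same month from different years.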

ser = pd.Series([1]*6)
ser.index = pd.to_datetime(['2000-01-01', '2000-01-02', '2002-03-01', '2002-03-02', '2005-05-01', '2005-05-02'])
ser.groupby(ser.index.year).sum()
# 2000    2
# 2002    2
# 2005    2
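
One possible way to get such a calendar-month key is DatetimeIndex.to_period('M'), which maps each timestamp to a year-and-month period. This is a sketch added for illustration, not something proposed in the original post. It only creates groups for months that actually occur in the data, so no empty months appear in the result.

```python
import pandas as pd

ser = pd.Series([1] * 6)
ser.index = pd.to_datetime(['2000-01-01', '2000-01-02', '2002-03-01',
                            '2002-03-02', '2005-05-01', '2005-05-02'])

# Each timestamp is mapped to its calendar month (year and month together),
# so the same month in two different years would stay in separate groups.
by_month = ser.groupby(ser.index.to_period('M')).sum()
print(by_month)
```

Note that the result is indexed by monthly periods (a PeriodIndex) rather than by month-end timestamps.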

【Question comments】:

Tags: python pandas time-series

【Solution 1】:

Use pd.offsets.MonthEnd and add it to the DatetimeIndex of ser to create a month-end grouper, then pass this grouper to Series.groupby and aggregate with sum or mean:

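# MonthEnd(0) leaves dates that already fall on a month end in place;
# the default MonthEnd() (n=1) would push them to the next month end.
grp = ser.groupby(ser.index + pd.offsets.MonthEnd(0))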
s1, s2 = grp.sum(), grp.mean()
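
One detail worth checking when grouping this way is the n argument of the offset. The sketch below uses made-up dates (not the data from the question) to compare MonthEnd(), which defaults to n=1, with MonthEnd(0) on an index that contains a date already sitting on a month end.

```python
import pandas as pd

# Hypothetical data: two January dates (one of them a month end) and one in March.
idx = pd.to_datetime(['2000-01-15', '2000-01-31', '2000-03-02'])
ser = pd.Series([1, 1, 1], index=idx)

# MonthEnd() means n=1: a date already on a month end rolls to the NEXT month end.
rolled_n1 = idx + pd.offsets.MonthEnd()
# MonthEnd(0) keeps month-end dates in place and rolls the other dates forward.
rolled_n0 = idx + pd.offsets.MonthEnd(0)

by_n1 = ser.groupby(rolled_n1).sum()  # 2000-01-31 is counted under 2000-02-29
by_n0 = ser.groupby(rolled_n0).sum()  # both January dates land on 2000-01-31
print(by_n1)
print(by_n0)
```

With the data in the question no date falls on a month end, so both variants give the same groups there.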

Result:

print(s1)
2000-01-31    2
2002-03-31    2
2005-05-31    2
dtype: int64

print(s2)
2000-01-31    1.0
2002-03-31    1.0
2005-05-31    1.0
dtype: float64
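
As a possible alternative that stays with resample, sum() accepts a min_count argument. This is a sketch added for illustration, not part of the answer above. With min_count=1, a bin with no values sums to NaN instead of 0, so .dropna() can be used exactly as in the mean case. The offset object pd.offsets.MonthEnd() is passed instead of a frequency string only because the month-end alias differs between pandas versions ('M' in older releases, 'ME' in newer ones).

```python
import pandas as pd

ser = pd.Series([1] * 6)
ser.index = pd.to_datetime(['2000-01-01', '2000-01-02', '2000-03-01',
                            '2000-03-02', '2000-05-01', '2000-05-02'])

# Monthly bins labelled by month end, as in the question.
monthly = ser.resample(pd.offsets.MonthEnd())

# min_count=1: a bin needs at least one value, otherwise its sum is NaN,
# and the NaN rows can then be dropped.
totals = monthly.sum(min_count=1).dropna()
print(totals)
```

Unlike a mask on zeros, this keeps months whose values genuinely sum to 0. Because NaN is introduced before the rows are dropped, the result will typically come back as floats rather than integers.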

【Comments】: