Python Timeseries Conditional Calculations Summarized by Month: Answers

【Question title】: Python Timeseries Conditional Calculations Summarized by Month
【Posted】: 2023-03-09 19:18:02
【Question】:

My time series dataframe looks like this:

timestamp	signal_value
2017-08-28 00:00:00	10
2017-08-28 00:05:00	3
2017-08-28 00:10:00	5
2017-08-28 00:15:00	5

I am trying to get, for each month, the percentage of rows where "signal_value" is greater than 5. Something like:

Month	metric
January	16%
February	2%
March	8%
April	10%

I tried the following code, which gives the result for the whole dataset, but how can I summarize it per month?

total,count = 0, 0

for index, row in df.iterrows():
    total += 1
    if row["signal_value"] > 5:  # strictly greater than 5, as stated above
        count += 1
print((count/total)*100)

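Thank you in advance.

【Comments】:

Please provide a minimal reproducible example so that we can help answer your question without having to recreate your code from scratch. This will help you get an answer faster. Thanks.

Tags: python pandas time-series

【Solution 1】: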

Let's first generate some random data (the random-date generator is taken from elsewhere):

import pandas as pd
import numpy as np
import datetime

def randomtimes(start, end, n):
    frmt = '%d-%m-%Y %H:%M:%S'
    stime = datetime.datetime.strptime(start, frmt)
    etime = datetime.datetime.strptime(end, frmt)
    td = etime - stime
    dtimes = [np.random.random() * td + stime for _ in range(n)] 
    return [d.strftime(frmt) for d in dtimes]

# Recreate some fake data
timestamp = randomtimes("01-01-2021 00:00:00", "01-01-2023 00:00:00", 10000)
signal_value = np.random.random(len(timestamp)) * 10
df = pd.DataFrame({"timestamp": timestamp, "signal_value": signal_value})

Now we can convert the timestamp column to pandas timestamps, so that we can extract the month and year of each timestamp:

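# The generated strings are day-first, so state the format explicitly
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%d-%m-%Y %H:%M:%S")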
df["month"] = df.timestamp.dt.month
df["year"] = df.timestamp.dt.year
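
Strings such as "05-03-2021" can be read either day-first or month-first, so it helps to check the parsing step on its own. The following is a minimal sketch, assuming the same '%d-%m-%Y %H:%M:%S' layout as the generated data; the two sample strings and the names `raw` and `parsed` are made up for illustration:

```python
import pandas as pd

# Two day-first strings in the same layout as the generated data
raw = pd.Series(["05-03-2021 12:00:00", "25-03-2021 08:30:00"])

# An explicit format removes the day/month ambiguity
parsed = pd.to_datetime(raw, format="%d-%m-%Y %H:%M:%S")

print(parsed.dt.month.tolist())  # [3, 3]
print(parsed.dt.day.tolist())    # [5, 25]
```

Both values land in March, which is what the day-first layout implies.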

We create a boolean column that indicates whether signal_value is greater than some threshold (here 5):

df["is_larger5"] = df.signal_value > 5
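
The boolean column is useful because its mean is the fraction of True values. That makes it a vectorized replacement for the counting loop in the question. A small sketch using the four sample rows from the question (the names `sample` and `share` are only illustrative):

```python
import pandas as pd

# The four signal values shown in the question
sample = pd.DataFrame({"signal_value": [10, 3, 5, 5]})

# True where the value is strictly greater than 5
is_larger5 = sample["signal_value"] > 5

# Mean of a boolean Series = share of True values
share = is_larger5.mean()
print(share * 100)  # 25.0
```

Only the value 10 exceeds 5, so one row out of four qualifies.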

Finally, we can use pandas.groupby to get the mean for each month, which is the fraction of rows above the threshold:

>>> df.groupby(["year", "month"])['is_larger5'].mean()
year  month
2021  1        0.509615
      2        0.488189
      3        0.506024
      4        0.519362
      5        0.498778
      6        0.483709
      7        0.498824
      8        0.460396
      9        0.542918
      10       0.463043
      11       0.492500
      12       0.519789
2022  1        0.481663
      2        0.527778
      3        0.501139
      4        0.527322
      5        0.486936
      6        0.510638
      7        0.483370
      8        0.521253
      9        0.493639
      10       0.495349
      11       0.474886
      12       0.488372
Name: is_larger5, dtype: float64
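
To get closer to the Month/metric table asked for in the question, the same idea can be written without helper columns and formatted as percentages. This is only a sketch under assumptions: the small hand-made frame stands in for the real data, and the names `monthly` and `report` are illustrative.

```python
import pandas as pd

# Small hand-made frame standing in for the real data (values are illustrative)
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2021-01-05 00:00:00", "2021-01-20 00:05:00",
        "2021-02-03 00:00:00", "2021-02-10 00:05:00",
        "2021-02-17 00:10:00", "2021-02-24 00:15:00",
    ]),
    "signal_value": [10, 3, 7, 5, 2, 1],
})

# Group the boolean mask by calendar month (year and month together)
monthly = (
    (df["signal_value"] > 5)
    .groupby(df["timestamp"].dt.to_period("M"))
    .mean()
    .mul(100)
    .rename("metric")
    .rename_axis("Month")
)

# Format as percentage strings for display
report = monthly.round(1).astype(str) + "%"
print(report)
```

Grouping by `dt.to_period("M")` keeps the same month of different years apart. If the goal is instead to pool, say, all Januaries together, grouping by `df["timestamp"].dt.month_name()` is one option.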

【Comments】: