【Question title】: Compute the average of values for every five seconds in Python
【Posted】: 2021-12-16 02:32:43
【Question】:

I have a dataset like the one below, whose time column has sub-second (millisecond-level) resolution.

pid_col ,timestamp_col ,value_col
31,2019-03-29 07:14:56.999999756,0.0
31,2019-03-29 07:14:57.250000,0.614595
31,2019-03-29 07:14:57.500000,0.678615
31,2019-03-29 07:14:57.750000,0.687578
31,2019-03-29 07:14:58.000000244,0.559804
31,2019-03-29 07:14:58.250000,0.522672
31,2019-03-29 07:14:58.499999512,0.51627
31,2019-03-29 07:14:58.750000,0.51627
31,2019-03-29 07:14:59.000000244,0.517551
31,2019-03-29 07:14:59.250000,0.51627
31,2019-03-29 07:14:59.500000244,0.509868
31,2019-03-29 07:14:59.750000488,0.513709
31,2019-03-29 07:15:00,0.513709
31,2019-03-29 07:15:00.249999512,0.518831
31,2019-03-29 07:15:00.500000,0.531635

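How can I compute the average of the values for every 5 seconds? For this dataset the average has to be taken over exact 5-second windows: the first window should run from 7:14:56 to 7:15:01, and so on for every following 5 seconds. Here is my code: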

from pandas import read_csv, to_datetime, Grouper

col_list = ["timestamp", "pid","value"]
df = read_csv("data.csv", usecols=col_list)
df['timestamp'] = to_datetime(df['timestamp'], unit='ms')
df = df.groupby(['pid', Grouper(freq='5S', key='timestamp')], as_index=False) \
      .agg({'timestamp': 'first', 'value': 'mean'})

Thanks for your help.

【Comments】:

    Tags: python pandas numpy datetime time-series


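    【Solution 1】:

    There is a nice library called datetime (part of the Python standard library) that supports arithmetic on dates and times. For example: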

    from datetime import datetime, timedelta
    
    # datetime(year, month, day, hour, minute, second, microsecond)
    time0 = datetime(2019, 3, 29, 7, 14, 57, 500000)
    print(time0)
    
    fiveseconds = timedelta(seconds=5)
    print(fiveseconds)
    
    time1 = time0 + fiveseconds
    print(time1)
    

    which prints

    2019-03-29 07:14:57.500000
    0:00:05
    2019-03-29 07:15:02.500000
    

    You can then compare datetimes:

    from datetime import datetime, timedelta
    
    time0 = datetime(2019, 3, 29, 7, 14, 57, 500000)
    
    fourseconds = timedelta(seconds=4)
    fiveseconds = timedelta(seconds=5)
    sixseconds = timedelta(seconds=6)
    
    time1 = time0 + fiveseconds
    print(time1 < (time0 + fourseconds))  # False
    print(time1 < (time0 + sixseconds))  # True
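
The same arithmetic can also assign a timestamp to a 5-second window directly. The following is a minimal sketch, not part of the original answer: the helper names `window_index` and `window_start` are hypothetical, and it assumes the windows should be aligned to the whole second of the first sample (07:14:56), as the question asks.

```python
from datetime import datetime, timedelta

window = timedelta(seconds=5)

# Assumption: windows are aligned to the whole second of the first sample
first = datetime(2019, 3, 29, 7, 14, 56, 999999)
origin = first.replace(microsecond=0)  # 2019-03-29 07:14:56


def window_index(t, origin=origin, window=window):
    """Return how many whole windows separate t from origin."""
    # Floor-dividing one timedelta by another yields an int
    return (t - origin) // window


def window_start(t, origin=origin, window=window):
    """Return the start of the window that contains t."""
    return origin + window_index(t, origin, window) * window


print(window_index(datetime(2019, 3, 29, 7, 15, 0, 500000)))  # 0
print(window_index(datetime(2019, 3, 29, 7, 15, 1)))          # 1
print(window_start(datetime(2019, 3, 29, 7, 15, 3, 250000)))  # 2019-03-29 07:15:01
```

Grouping values by this index (for example in a dict) keeps the windows aligned even when some windows contain no samples.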
    

    So, for your problem:

    from datetime import datetime, timedelta
    
    
    def convert(timestr):
        """
        Receive a string such as "2019-03-29 07:14:57.250000"
        and return a datetime instance.
        """
        date = timestr.split(" ")
        year, month, day = date[0].split("-")
        year = int(year)
        month = int(month)
        day = int(day)
        hour, minute, second = date[1].split(":")
        hour = int(hour)
        minute = int(minute)
        intsecond = int(second.split(".")[0])
        # datetime only stores microseconds: keep the first six fractional
        # digits (extra nanosecond digits are truncated, short ones padded)
        if "." in second:
            microsecond = int(second.split(".")[1][:6].ljust(6, "0"))
        else:
            microsecond = 0
        return datetime(year, month, day, hour, minute, intsecond, microsecond)
    
    
    listtimes = ["2019-03-29 07:14:56.999999756",
                 "2019-03-29 07:14:57.250000",
                 "2019-03-29 07:14:57.500000",
                 "2019-03-29 07:14:57.750000",
                 "2019-03-29 07:14:58.000000244",
                 "2019-03-29 07:14:58.250000",
                 "2019-03-29 07:14:58.499999512",
                 "2019-03-29 07:14:58.750000",
                 "2019-03-29 07:14:59.000000244",
                 "2019-03-29 07:14:59.250000",
                 "2019-03-29 07:14:59.500000244",
                 "2019-03-29 07:14:59.750000488",
                 "2019-03-29 07:15:00",
                 "2019-03-29 07:15:00.249999512",
                 "2019-03-29 07:15:00.500000"]
    
    listvalues = [0.0,
                  0.614595,
                  0.678615,
                  0.687578,
                  0.559804,
                  0.522672,
                  0.51627,
                  0.51627,
                  0.517551,
                  0.51627,
                  0.509868,
                  0.513709,
                  0.513709,
                  0.518831,
                  0.531635]
    
    dt = timedelta(seconds=5)
    
    averagevalues = []
    time0 = convert(listtimes[0])
    time1 = time0 + dt
    counter = 0
    mysum = 0
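    for t, v in zip(listtimes, listvalues):
        # Close every window that ends at or before this sample.
        # The while loop also steps over windows that contain no samples.
        while convert(t) >= time1:
            if counter:
                averagevalues.append(mysum / counter)
            counter = 0
            mysum = 0
            time1 += dt
    
        counter += 1
        mysum += v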
    
    if counter != 0:
        averagevalues.append(mysum / counter)
    print(averagevalues)
    

    which gives the result

    [0.5144918]
    

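    With a longer list of values covering more time, the list averagevalues will hold the mean of each 5-second window (the windows start at the first timestamp). In this example every timestamp falls between 2019-03-29 07:14:56 and 2019-03-29 07:15:01, so averagevalues contains only one value.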
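
Since the question already uses pandas, the same windows can probably be obtained there without an explicit loop. The sketch below is not from the original answer and rests on a few assumptions: the column names follow the question's code rather than the CSV header, the windows start at the whole second of the earliest sample, and pandas 1.1 or later is installed (the `origin` argument of `Grouper` was added in that release). Note that the `unit` argument of `to_datetime` is meant for numeric epoch values, so it is not needed for string timestamps like these.

```python
import pandas as pd

# Rows copied from the question
times = [
    "2019-03-29 07:14:56.999999756", "2019-03-29 07:14:57.250000",
    "2019-03-29 07:14:57.500000", "2019-03-29 07:14:57.750000",
    "2019-03-29 07:14:58.000000244", "2019-03-29 07:14:58.250000",
    "2019-03-29 07:14:58.499999512", "2019-03-29 07:14:58.750000",
    "2019-03-29 07:14:59.000000244", "2019-03-29 07:14:59.250000",
    "2019-03-29 07:14:59.500000244", "2019-03-29 07:14:59.750000488",
    "2019-03-29 07:15:00", "2019-03-29 07:15:00.249999512",
    "2019-03-29 07:15:00.500000",
]
values = [0.0, 0.614595, 0.678615, 0.687578, 0.559804, 0.522672, 0.51627,
          0.51627, 0.517551, 0.51627, 0.509868, 0.513709, 0.513709,
          0.518831, 0.531635]

df = pd.DataFrame({
    "pid": 31,
    # Parse each string on its own so that mixed precision
    # (no fraction, microseconds, nanoseconds) is handled uniformly
    "timestamp": pd.DatetimeIndex([pd.Timestamp(s) for s in times]),
    "value": values,
})

# Assumption: windows start at the whole second of the earliest sample.
# Without origin, the bins are aligned to midnight and would start at 07:14:55.
origin = df["timestamp"].min().floor("s")

grouper = pd.Grouper(key="timestamp", freq=pd.Timedelta(seconds=5), origin=origin)
averages = df.groupby(["pid", grouper])["value"].mean().reset_index()
print(averages)
```

For the rows in the question this yields a single window starting at 07:14:56 with a mean of about 0.5144918, matching the loop above. With several pids in the data, each pid gets its own set of windows.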

    【Comments】:
