【Question title】: Pandas converting timestamp and monthly summary
【Posted】: 2018-10-23 10:06:52
【Question】:

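I have several .csv files that I import with Pandas and then compute summaries of the data (min, max, mean), ideally as weekly and monthly reports. I have the following code, but I cannot get the monthly summary to work, and I am sure the problem is in the timestamp conversion.

What am I doing wrong?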

import pandas as pd
import numpy as np

# Format of the data being imported
#2017-05-11 18:29:14+00:00,264.0,987.99,26.5,23.70,512.0,11.763,52.31

df = pd.read_csv('data.csv')
df['timestamp'] = pd.to_datetime(df['time'], format='%Y-%m-%d %H:%M:%S')

print 'month info'
print [g for n, g in df.groupby(pd.Grouper(key='timestamp',freq='M'))]
print(data.groupby('timestamp')['light'].mean())

【Comments】:

    Tags: python pandas numpy time


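    【Solution 1】:

    IIUC, you almost have it, and your datetime conversion is fine. Here is an example:

    Start with a dataframe like this (it is your sample row, repeated with slight modifications):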

    >>> df
                            time      x       y     z     a      b       c      d
    0  2017-05-11 18:29:14+00:00  264.0  947.99  24.5  53.7  511.0  11.463  12.31
    1  2017-05-15 18:29:14+00:00  265.0  957.99  25.5  43.7  512.0  11.563  22.31
    2  2017-05-21 18:29:14+00:00  266.0  967.99  26.5  33.7  513.0  11.663  32.31
    3  2017-06-11 18:29:14+00:00  267.0  977.99  26.5  23.7  514.0  11.763  42.31
    4  2017-06-22 18:29:14+00:00  268.0  997.99  27.5  13.7  515.0  11.800  52.31
    

    You can convert the datetime column the same way you did before:

    df['timestamp'] = pd.to_datetime(df['time'], format='%Y-%m-%d %H:%M:%S')
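As a side note on the conversion step: the sample rows carry a `+00:00` offset, and depending on the pandas version an explicit format string without `%z` may not accept it. A minimal sketch that sidesteps this by letting pandas infer the format with `utc=True` (the in-memory CSV and the column name `x` are assumptions made for illustration, not part of the original file):

```python
import io
import pandas as pd

# Hypothetical stand-in for data.csv; header names are assumed.
csv_text = """time,x
2017-05-11 18:29:14+00:00,264.0
2017-06-11 18:29:14+00:00,267.0
"""

df = pd.read_csv(io.StringIO(csv_text))

# utc=True parses the "+00:00" offset and yields a timezone-aware column.
df['timestamp'] = pd.to_datetime(df['time'], utc=True)

print(df['timestamp'].dtype)
print(df['timestamp'].dt.month.tolist())
```

The resulting column is timezone-aware, which `pd.Grouper` accepts as a key in the same way as a naive datetime column.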
    

    Then get each of your summaries separately:

    monthly_mean = df.groupby(pd.Grouper(key='timestamp',freq='M')).mean()
    monthly_max = df.groupby(pd.Grouper(key='timestamp',freq='M')).max()
    monthly_min = df.groupby(pd.Grouper(key='timestamp',freq='M')).min()
    
    weekly_mean = df.groupby(pd.Grouper(key='timestamp',freq='W')).mean()
    weekly_min = df.groupby(pd.Grouper(key='timestamp',freq='W')).min()
    weekly_max = df.groupby(pd.Grouper(key='timestamp',freq='W')).max()
    
    # Examples:
    >>> monthly_mean
                    x       y     z     a      b        c      d
    timestamp                                                   
    2017-05-31  265.0  957.99  25.5  43.7  512.0  11.5630  22.31
    2017-06-30  267.5  987.99  27.0  18.7  514.5  11.7815  47.31
    
    >>> weekly_mean
                    x       y     z     a      b       c      d
    timestamp                                                  
    2017-05-14  264.0  947.99  24.5  53.7  511.0  11.463  12.31
    2017-05-21  265.5  962.99  26.0  38.7  512.5  11.613  27.31
    2017-05-28    NaN     NaN   NaN   NaN    NaN     NaN    NaN
    2017-06-04    NaN     NaN   NaN   NaN    NaN     NaN    NaN
    2017-06-11  267.0  977.99  26.5  23.7  514.0  11.763  42.31
    2017-06-18    NaN     NaN   NaN   NaN    NaN     NaN    NaN
    2017-06-25  268.0  997.99  27.5  13.7  515.0  11.800  52.31
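The all-NaN rows in the weekly output are weeks that contain no samples. If those rows are not wanted in a report, one option is to drop them afterwards. A small sketch, assuming a frame reduced to the `timestamp` and `x` columns from the example above:

```python
import pandas as pd

# Reduced copy of the example data (only column 'x' is kept for brevity).
df = pd.DataFrame({
    'timestamp': pd.to_datetime(
        ['2017-05-11 18:29:14', '2017-05-15 18:29:14', '2017-05-21 18:29:14',
         '2017-06-11 18:29:14', '2017-06-22 18:29:14'],
        utc=True,
    ),
    'x': [264.0, 265.0, 266.0, 267.0, 268.0],
})

# Weekly bins (ending on Sunday); weeks without data show up as NaN rows.
weekly_mean = df.groupby(pd.Grouper(key='timestamp', freq='W')).mean()

# Keep only the weeks that actually contain at least one value.
weekly_nonempty = weekly_mean.dropna(how='all')

print(weekly_nonempty)
```

Using `how='all'` removes only the rows where every column is missing, so a week with partial data would still be kept.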
    

    Or aggregate them all at once to get a dataframe with multi-index columns holding the summaries:

    monthly_summary = df.groupby(pd.Grouper(key='timestamp',freq='M')).agg(['mean', 'min', 'max'])
    weekly_summary = df.groupby(pd.Grouper(key='timestamp',freq='W')).agg(['mean', 'min', 'max'])
    
    # Example: summary for column 'x':
    >>> monthly_summary['x']
                 mean    min    max
    timestamp                      
    2017-05-31  265.0  264.0  266.0
    2017-06-30  267.5  267.0  268.0
    
    >>> weekly_summary['x']
                 mean    min    max
    timestamp                      
    2017-05-14  264.0  264.0  264.0
    2017-05-21  265.5  265.0  266.0
    2017-05-28    NaN    NaN    NaN
    2017-06-04    NaN    NaN    NaN
    2017-06-11  267.0  267.0  267.0
    2017-06-18    NaN    NaN    NaN
    2017-06-25  268.0  268.0  268.0
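For a report it can be convenient to have flat column names and to aggregate only the numeric columns. The sketch below is a variation rather than the method shown above: it groups by a monthly `Period` key instead of `pd.Grouper` (recent pandas releases prefer the alias `'ME'` over `'M'` for month-end frequency, whereas the period alias `'M'` is unaffected), and it uses only the `x` and `y` columns of the example data. The names `numeric_cols` and `month_key` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'time': ['2017-05-11 18:29:14+00:00', '2017-05-15 18:29:14+00:00',
             '2017-05-21 18:29:14+00:00', '2017-06-11 18:29:14+00:00',
             '2017-06-22 18:29:14+00:00'],
    'x': [264.0, 265.0, 266.0, 267.0, 268.0],
    'y': [947.99, 957.99, 967.99, 977.99, 997.99],
})
df['timestamp'] = pd.to_datetime(df['time'], utc=True)

# Aggregate numeric columns only, so the text column 'time' is left out.
numeric_cols = df.select_dtypes('number').columns.tolist()

# Periods are timezone-naive, so the UTC timezone is dropped before converting.
month_key = df['timestamp'].dt.tz_localize(None).dt.to_period('M')

monthly_summary = df.groupby(month_key)[numeric_cols].agg(['mean', 'min', 'max'])

# Flatten the two-level columns into names such as 'x_mean'.
monthly_summary.columns = ['_'.join(col) for col in monthly_summary.columns]

print(monthly_summary)
```

The index then holds one period per month (for example `2017-05`) rather than a month-end timestamp, which may read more naturally in a monthly report.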
    

    【Comments】:

    • Great, thank you very much for the quick response, and it works perfectly... thanks again