【Question title】: Plot percentile of time series during business hours on weekdays
【Posted】: 2019-07-23 00:35:39
【Question】:

I have a dataframe of time series values at 15-minute intervals. I want to plot percentiles of the data for business hours only (8 AM to 5 PM) on weekdays.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'date': pd.date_range(start='2014-09-13', periods=70264, freq='15min'), 
                   'value': np.random.normal(100, 10, 70264)})
df.set_index('date', inplace = True)
df.head()
                        value
date    
2014-09-13 00:00:00 106.263264
2014-09-13 00:15:00 99.030542
2014-09-13 00:30:00 85.116465
2014-09-13 00:45:00 98.717306
2014-09-13 01:00:00 97.627103

I added a weekday column and then filtered for business hours:

df['weekday'] = df.index.weekday
df = df[(df.index.hour >= 8) & (df.index.hour <= 17) & (df.index.weekday < 5)]

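But I'm not sure how to compute and plot hourly summary statistics for each weekday (for example, the mean and the 5%, 50%, and 95% percentiles). The desired result is a plot of these summary statistics, restricted to business hours.

【Comments】:

    Tags: python pandas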


    【Solution 1】:

    First, take the setup from your question and add one line that creates an 'hour' column from the index, just as you did for 'weekday'.

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    
    df = pd.DataFrame({'date': pd.date_range(start='2014-09-13', periods=70264, freq='15min'),
                       'value': np.random.normal(100, 10, 70264)})
    df.set_index('date', inplace = True)
    df['weekday'] = df.index.weekday
    df['hour'] = df.index.hour
    df = df[(df.index.hour >= 8) & (df.index.hour <= 17) & (df.index.weekday < 5)]
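
One detail of the filter above is worth checking: `df.index.hour <= 17` keeps the samples from 17:00 through 17:45, so the result has ten hourly buckets (8 to 17), not nine. The sketch below compares that inclusive filter with a stricter variant that stops at 16:45. The one-week sample frame and the names `inclusive` and `strict` are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# One full week of 15-minute samples, starting on Saturday 2014-09-13 (illustrative data)
idx = pd.date_range(start='2014-09-13', periods=7 * 96, freq='15min')
df = pd.DataFrame({'value': np.random.normal(100, 10, len(idx))}, index=idx)

# Filter as written in the answer: keeps 08:00 through 17:45 on Monday to Friday
inclusive = df[(df.index.hour >= 8) & (df.index.hour <= 17) & (df.index.weekday < 5)]

# Stricter variant (an assumption about the intent): keeps 08:00 through 16:45
strict = df[(df.index.hour >= 8) & (df.index.hour < 17) & (df.index.weekday < 5)]

# Count the distinct hourly buckets that survive each filter
n_inclusive_hours = inclusive.index.hour.nunique()
n_strict_hours = strict.index.hour.nunique()
print(n_inclusive_hours, n_strict_hours)
```

Which variant is right depends on whether the 17:00 to 17:45 samples count as business hours; the rest of this answer uses the inclusive filter.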
    

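    Now, the following code computes the mean, the variance, and the 0.05/0.50/0.95 percentiles for each of the remaining 50 (weekday, hour) pairs (5 weekdays × 10 hours).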

    df_agg = df.groupby(['weekday', 'hour']).agg(['mean', 'var'])
    df_agg.rename({'value': 'aggregate'}, axis=1, inplace=True)
    df_pct = df.groupby(['weekday', 'hour']).quantile(q=[0.05, 0.50, 0.95])
    df_pct.rename({'value': 'percentile'}, axis=1, inplace=True)
    df_pct = df_pct.unstack(level=2)
    df_all = df_pct.join(df_agg)
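
As a possible alternative to the join above, the same statistics can be collected in one `agg` call using pandas named aggregation, which gives flat column names. This is a sketch on a four-week sample frame; the helper functions `q05`, `q50`, `q95` and the output column names are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Four full weeks of 15-minute samples (illustrative data, not the original frame)
idx = pd.date_range(start='2014-09-13', periods=28 * 96, freq='15min')
df = pd.DataFrame({'value': rng.normal(100, 10, len(idx))}, index=idx)
df['weekday'] = df.index.weekday
df['hour'] = df.index.hour
df = df[(df['hour'] >= 8) & (df['hour'] <= 17) & (df['weekday'] < 5)]


# Hypothetical helpers: one named function per percentile
def q05(s):
    return s.quantile(0.05)


def q50(s):
    return s.quantile(0.50)


def q95(s):
    return s.quantile(0.95)


# Named aggregation: each keyword becomes an output column
summary = df.groupby(['weekday', 'hour'])['value'].agg(
    mean='mean', var='var', p05=q05, p50=q50, p95=q95)
print(summary.head())
```

The result has one row per (weekday, hour) pair and single-level columns, which can be easier to plot selectively than the two-level columns of `df_all`.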
    

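    Finally, set up a 5×1 array of subplots, one per day of the week. Each subplot shows that day's hourly aggregate measures. The DataFrame is also printed so the results can be checked.

    (See the bottom of this answer for an alternative legend style.)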

    print(df_all)
    fig, axes = plt.subplots(5, 1)
    for idx, (weekday, df_day) in enumerate(df_all.groupby(by='weekday')):
        df_day.plot(ax=axes[idx])
        axes[idx].get_legend().set_title(None)
    fig.suptitle('Hourly Aggregate Measures for each Weekday')
    plt.show()
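
Another way to present the same numbers, sketched here as an option and not as the answer's own method, is to shade the band between the 5th and 95th percentiles and draw the median and mean as lines. Variance is left out because it is on a different scale. The four-week sample data, the day labels, and the styling choices are all assumptions made for illustration.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs anywhere; remove to open a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Four full weeks of 15-minute samples (illustrative data)
idx = pd.date_range(start='2014-09-13', periods=28 * 96, freq='15min')
df = pd.DataFrame({'value': rng.normal(100, 10, len(idx))}, index=idx)
df['weekday'] = df.index.weekday
df['hour'] = df.index.hour
df = df[(df['hour'] >= 8) & (df['hour'] <= 17) & (df['weekday'] < 5)]

grouped = df.groupby(['weekday', 'hour'])['value']
pct = grouped.quantile([0.05, 0.50, 0.95]).unstack()  # columns: 0.05, 0.5, 0.95
mean = grouped.mean()

day_names = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri']
fig, axes = plt.subplots(5, 1, sharex=True, sharey=True, figsize=(6, 10))
for day, ax in enumerate(axes):
    day_pct = pct.loc[day]  # rows are now indexed by hour
    hours = day_pct.index
    # Shaded band between the 5th and 95th percentiles
    ax.fill_between(hours, day_pct[0.05], day_pct[0.95], alpha=0.3, label='5%-95%')
    ax.plot(hours, day_pct[0.50], label='median')
    ax.plot(hours, mean.loc[day], linestyle='--', label='mean')
    ax.set_ylabel(day_names[day])
axes[-1].set_xlabel('hour of day')
axes[0].legend(loc='upper right', ncol=3)
fig.suptitle('Hourly profile per weekday')
# plt.show()  # uncomment when running interactively
```

Sharing both axes across the five subplots makes the weekdays directly comparable.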
    

    Text output:

                 percentile                           aggregate
                       0.05         0.5        0.95        mean         var
    weekday hour
    0       8     84.804628  100.518400  117.046405  100.526630   93.518537
            9     83.986630   99.151391  116.475380   99.727505  106.065423
            10    84.210861   99.816146  118.514653  100.642808  108.544615
            11    81.917007   99.047425  114.454161   99.225104   95.743352
            12    83.530758   99.655185  117.473294  100.029289  107.700859
            13    83.508926  100.198648  117.106325  100.126363  105.463587
            14    85.588650   99.900185  115.606768   99.873019   90.036859
            15    82.824524   99.365516  116.187813   99.416086  100.461026
            16    84.710711  100.175760  115.968933  100.474313   96.427756
            17    84.809259   99.406430  116.599022  100.171827   90.978999
    1       8     83.363705  100.346545  118.444590  100.163926  115.177386
            9     83.588517  100.477539  114.809687   99.655191   94.024155
            10    84.021888  100.327049  119.945789  100.615747  112.523785
            11    84.747342  100.148536  118.155315  100.938358  106.069580
            12    84.163070   99.686375  116.169788   99.817252  104.941025
            13    84.386216  100.535683  118.458226  100.694017  113.824358
            14    84.813543  100.076916  116.243032  100.164123   97.287727
            15    83.382711   99.904947  115.649935  100.037705   98.935296
            16    83.036918  100.309381  116.316698   99.958069   97.126112
            17    84.297125  101.294478  118.256736  101.217911  106.943089
    2       8     81.633494   99.433678  115.717984   99.266008  102.165153
            9     84.267210  101.169719  116.944396  100.919547   95.728475
            10    84.885450   99.875980  116.368479   99.956622   92.128995
            11    83.327970   98.636495  116.336673   99.109689   99.083896
            12    83.596938   99.590576  115.015071   99.442666   90.258090
            13    83.488958   98.715791  114.588427   99.067826   98.037157
            14    83.472710  100.715736  115.561818  100.347098   98.337901
            15    83.501371  100.162951  116.190391  100.102750  103.833767
            16    82.689447   99.621548  114.704916   99.061170   92.477417
            17    83.491864  100.503890  115.089975   99.313486  100.221236
    3       8     83.918757   99.862253  115.608802  100.065792   97.780389
            9     83.528116   99.699197  116.056878  100.078317   93.125548
            10    84.137936  100.300088  116.781452  100.499863   95.724861
            11    81.812646   99.848557  116.012410   99.605767  105.795234
            12    83.774116  100.925231  115.396749  100.326548   93.231116
            13    85.322574  100.243043  117.375949  100.634801   93.869458
            14    85.185780  100.486165  117.021391  100.343172  100.840142
            15    84.032386  100.166646  117.248322  100.164207  106.714328
            16    81.910123  100.004419  115.865071  100.006264  106.098148
            17    83.839222  100.208931  115.931519  100.246440   91.956736
    4       8     84.403681  101.088262  117.734961  100.496362  105.757660
            9     84.602218  100.317946  116.859810  100.310827   93.845486
            10    84.224072  100.750667  117.313116  100.874683  100.350910
            11    79.256784   99.046019  114.153569   98.173933  107.630724
            12    85.650756  100.567063  117.374603  101.069566   91.156081
            13    84.159938   99.788830  116.811645   99.943816  100.655303
            14    85.053258  100.056065  116.872187  100.418592   97.690391
            15    82.826035   99.739967  116.562845   99.590234  108.127479
            16    83.702962   99.458986  117.341467  100.080913  104.140598
            17    83.012213  100.143797  115.448508   99.854219   98.196456
    

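    Plot output: (figure omitted)

    Since the legend is identical for all subplots, you may prefer a single legend for the whole figure. To do that, use this plotting code instead.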

    fig, axes = plt.subplots(5, 1)
    for idx, (weekday, df_day) in enumerate(df_all.groupby(by='weekday')):
        df_day.plot(ax=axes[idx], legend=False)
    fig.suptitle('Hourly Aggregate Measures for each Weekday')
    # DataFrame.plot returns an Axes, not line handles, so take the handles from the last subplot
    handles, labels = axes[-1].get_legend_handles_labels()
    fig.legend(handles, labels, loc='right')
    plt.show()
    

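    Technically, the legend is tied to the lines of the last subplot, but that does not matter as long as every subplot has the same lines. You can manipulate fig, axes, and the legend further with standard matplotlib methods.

    【Comments】:

    • Sorry, I need a plot of the business hours for each weekday, i.e. the average profile for each weekday
    • Thanks! Is there a way to change the (None, None) in the legend title and add a global figure title?
    • @IshanSaraswat Glad to help. I've updated my answer again to address the legend and the figure title.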