【问题标题】:Convert monthly data table to seasonal time series using pandas使用 pandas 将月度数据表转换为季节性时间序列
【发布时间】:2019-11-20 17:11:55
【问题描述】:

我有一些表格格式的数据,其中行是年,列是月。我想将其转换为 pandas 中的时间序列格式,然后按季节平均值对数据进行分组(我特别想将冬季定义为 11 月至 3 月,并丢弃其他所有内容)。这是我将数据读入表格格式的代码,因此您可以看到发生了什么:

import pandas as pd
headers = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
df = pd.read_csv('https://www.esrl.noaa.gov/psd/gcos_wgsp/Timeseries/Data/nino34.long.anom.data',
                   delimiter='\s+', header=0, names=headers, skiprows=1, index_col=0)
df = df.drop(df.tail(7).index)
df
         Jan     Feb     Mar     Apr     May     Jun     Jul     Aug     Sep     Oct     Nov     Dec
1871    -0.25   -0.58   -0.43   -0.50   -0.70   -0.53   -0.60   -0.33   -0.24   -0.33   -0.31   -0.58
1872    -0.72   -0.62   -0.50   -0.77   -0.62   -0.52   -0.32   -0.85   -1.02   -0.94   -0.79   -0.88
1873    -0.78   -1.01   -1.31   -0.67   -0.53   -0.48   -0.58   -0.39   -0.34   -0.78   -0.77   -0.70
1874    -0.93   -1.06   -1.40   -0.94   -0.86   -0.72   -1.00   -1.05   -1.13   -1.25   -1.33   -1.14
1875    -0.71   -0.37   -0.59   -0.87   -1.09   -0.76   -0.85   -0.81   -0.91   -0.83   -0.64   -0.75
1876    -0.95   -1.20   -1.13   -1.18   -1.08   -0.43   -0.34   -0.16   -0.02   0.11    0.15    0.23
1877    0.35    0.46    0.52    0.50    0.76    0.98    1.42    1.54    1.75    1.95    2.08    2.49
1878    2.41    2.43    1.31    0.92    0.82    0.92    0.25    -0.11   -0.32   -0.53   -0.70   -0.75
1879    -0.55   -0.18   -0.24   -0.37   -0.83   -0.67   -0.77   -0.69   -0.83   -0.93   -1.14   -1.02
1880    -1.00   -0.73   -0.62   -0.57   -0.71   -0.61   -0.53   -0.24   -0.03   0.17    0.24    0.18
1881    0.29    0.23    0.32    0.41    0.16    0.23    -0.26   -0.17   -0.33   -0.43   -0.59   -0.37
1882    -0.45   -0.55   -0.53   0.10    0.01    -0.40   -0.57   -0.33   -0.51   -0.65   -0.92   -0.75

我想要的结果是每个冬季的 11 月至 3 月平均值的时间序列(例如,2018 年 11 月、2018 年 12 月、2019 年 1 月、2019 年 2 月和 2019 年 3 月的平均值)。我猜这个过程包括首先将这些数据分类成一个时间序列(我不确定如何做),然后按季节(11 月至 3 月)分组,然后放弃其他月份。

【问题讨论】:

    标签: python pandas


    【解决方案1】:

    给定数据如图所示

    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    
    # reset index
    df.reset_index(inplace=True)
    
    # rename the column to year
    df.rename(columns={'index': 'year'}, inplace=True)
    
    # melt df into a long format
    df2 = df.melt(id_vars='year', var_name='month')
    
    # create a datatime column and set it as index
    df2['date'] = pd.to_datetime(df2['year'].astype(str) + '-' + df2['month'])
    df2.set_index('date', inplace=True)
    
    # define a winter dataframe
    winter = df2[df2['month'].isin(['Nov', 'Dec', 'Jan', 'Feb', 'Mar'])].copy()
    
    # sort the index
    winter.sort_index(inplace=True)
    

    冬季数据帧

                year month  value
    date                         
    1871-01-01  1871   Jan  -0.25
    1871-02-01  1871   Feb  -0.58
    1871-03-01  1871   Mar  -0.43
    1871-11-01  1871   Nov  -0.31
    1871-12-01  1871   Dec  -0.58
    1872-01-01  1872   Jan  -0.72
    1872-02-01  1872   Feb  -0.62
    1872-03-01  1872   Mar  -0.50
    1872-11-01  1872   Nov  -0.79
    1872-12-01  1872   Dec  -0.88
    1873-01-01  1873   Jan  -0.78
    1873-02-01  1873   Feb  -1.01
    1873-03-01  1873   Mar  -1.31
    1873-11-01  1873   Nov  -0.77
    1873-12-01  1873   Dec  -0.70
    1874-01-01  1874   Jan  -0.93
    1874-02-01  1874   Feb  -1.06
    1874-03-01  1874   Mar  -1.40
    1874-11-01  1874   Nov  -1.33
    1874-12-01  1874   Dec  -1.14
    1875-01-01  1875   Jan  -0.71
    1875-02-01  1875   Feb  -0.37
    1875-03-01  1875   Mar  -0.59
    1875-11-01  1875   Nov  -0.64
    1875-12-01  1875   Dec  -0.75
    1876-01-01  1876   Jan  -0.95
    1876-02-01  1876   Feb  -1.20
    1876-03-01  1876   Mar  -1.13
    1876-11-01  1876   Nov   0.15
    1876-12-01  1876   Dec   0.23
    1877-01-01  1877   Jan   0.35
    1877-02-01  1877   Feb   0.46
    1877-03-01  1877   Mar   0.52
    1877-11-01  1877   Nov   2.08
    1877-12-01  1877   Dec   2.49
    1878-01-01  1878   Jan   2.41
    1878-02-01  1878   Feb   2.43
    1878-03-01  1878   Mar   1.31
    1878-11-01  1878   Nov  -0.70
    1878-12-01  1878   Dec  -0.75
    1879-01-01  1879   Jan  -0.55
    1879-02-01  1879   Feb  -0.18
    1879-03-01  1879   Mar  -0.24
    1879-11-01  1879   Nov  -1.14
    1879-12-01  1879   Dec  -1.02
    1880-01-01  1880   Jan  -1.00
    1880-02-01  1880   Feb  -0.73
    1880-03-01  1880   Mar  -0.62
    1880-11-01  1880   Nov   0.24
    1880-12-01  1880   Dec   0.18
    1881-01-01  1881   Jan   0.29
    1881-02-01  1881   Feb   0.23
    1881-03-01  1881   Mar   0.32
    1881-11-01  1881   Nov  -0.59
    1881-12-01  1881   Dec  -0.37
    1882-01-01  1882   Jan  -0.45
    1882-02-01  1882   Feb  -0.55
    1882-03-01  1882   Mar  -0.53
    1882-11-01  1882   Nov  -0.92
    1882-12-01  1882   Dec  -0.75
    
    • 在此示例中,1870 年和 1882 年的冬天是不完整的,分别由 3 个月和 2 个月组成。
      • 这 5 个日期将被删除
    • 计算每个季节的平均值
    # drop incomplete season at the beginning and end of the winter dataframe
    winter.drop([pd.Timestamp('1871-01-01'),
                 pd.Timestamp('1871-02-01'),
                 pd.Timestamp('1871-03-01'),
                 pd.Timestamp('1882-11-01'),
                 pd.Timestamp('1882-12-01')], inplace=True)
    
    # compute mean of season, every 5 rows
    yearly_avg = winter.groupby(winter.reset_index().index // 5)['value'].mean()
    
    # create label for season start data
    winter_year = pd.date_range(start='1871-11-01', periods=11, freq='12MS')
    
    # create winter_avg dataframe
    winter_avg = pd.DataFrame({'winter_start': winter_year, 'season_avg': yearly_avg})
    winter_avg.set_index('winter_start', inplace=True)
    
                  season_avg
    winter_start            
    1871-11-01        -0.546
    1872-11-01        -0.954
    1873-11-01        -0.972
    1874-11-01        -0.828
    1875-11-01        -0.934
    1876-11-01         0.342
    1877-11-01         2.144
    1878-11-01        -0.484
    1879-11-01        -0.902
    1880-11-01         0.252
    1881-11-01        -0.498
    

    绘制数据:

    # prevent future warning
    from pandas.plotting import register_matplotlib_converters
    register_matplotlib_converters()
    
    sns.lineplot(winter.index, winter['value'], label='all values')
    sns.scatterplot(winter_avg.index, winter_avg['season_avg'], label='season avg', color='orange')
    plt.ylabel('value')
    plt.xlabel('year')
    plt.show()
    

    【讨论】:

      【解决方案2】:

      你也可以通过以下方法实现。

      months = ['Jan' , 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
      seasons = 'Nov-Mar'.split('-')
      start = months.index(seasons[0])
      end = months.index(seasons[1])
      season_months1 = []
      season_months2 = []
      if end >= start:
          season_months1 = months[start:end+1]
          df['result'] = df[season_months1].mean(axis=1)
      if end < start:
          season_months1 = months[start:12]
          season_months2 = months[0:end+1]
          df2 = df[season_months2 + ['Year']]
          df2['Year'] = df2['Year'] + 1
          df3 = df2.merge(df[season_months1 + ['Year']], how='left')
          df3['result'] = df3.mean(axis=1)
          df = df.merge(df3[['Year', 'result']], how='left', on='Year')
      df
      

      它会给出如下结果

          Year    Jan     Feb     Mar     Apr     May     Jun     Jul     Aug     Sep     Oct     Nov     Dec     result
      0   1871    -0.25   -0.58   -0.43   -0.50   -0.70   -0.53   -0.60   -0.33   -0.24   -0.33   -0.31   -0.58   NaN
      1   1872    -0.72   -0.62   -0.50   -0.77   -0.62   -0.52   -0.32   -0.85   -1.02   -0.94   -0.79   -0.88   311.511667
      2   1873    -0.78   -1.01   -1.31   -0.67   -0.53   -0.48   -0.58   -0.39   -0.34   -0.78   -0.77   -0.70   311.615000
      3   1874    -0.93   -1.06   -1.40   -0.94   -0.86   -0.72   -1.00   -1.05   -1.13   -1.25   -1.33   -1.14   311.405000
      4   1875    -0.71   -0.37   -0.59   -0.87   -1.09   -0.76   -0.85   -0.81   -0.91   -0.83   -0.64   -0.75   311.703333
      5   1876    -0.95   -1.20   -1.13   -1.18   -1.08   -0.43   -0.34   -0.16   -0.02   0.11    0.15    0.23    312.451667
      6   1877    0.35    0.46    0.52    0.50    0.76    0.98    1.42    1.54    1.75    1.95    2.08    2.49    313.048333
      7   1878    2.41    2.43    1.31    0.92    0.82    0.92    0.25    -0.11   -0.32   -0.53   -0.70   -0.75   312.980000
      8   1879    -0.55   -0.18   -0.24   -0.37   -0.83   -0.67   -0.77   -0.69   -0.83   -0.93   -1.14   -1.02   313.831667
      9   1880    -1.00   -0.73   -0.62   -0.57   -0.71   -0.61   -0.53   -0.24   -0.03   0.17    0.24    0.18    313.241667
      10  1881    0.29    0.23    0.32    0.41    0.16    0.23    -0.26   -0.17   -0.33   -0.43   -0.59   -0.37   312.948333
      11  1882    -0.45   -0.55   -0.53   0.10    0.01    -0.40   -0.57   -0.33   -0.51   -0.65   -0.92   -0.75   313.528333
      

      【讨论】:

        猜你喜欢
        • 2020-08-28
        • 2018-08-30
        • 1970-01-01
        • 1970-01-01
        • 2021-01-24
        • 2014-07-03
        • 2016-11-16
        • 1970-01-01
        • 2021-12-08
        相关资源
        最近更新 更多