【Question title】: Statsmodels ARIMA date index frequency
【Posted】: 2021-05-07 09:57:45
【Question description】:

I have a pandas DataFrame with a DatetimeIndex whose frequency is set to 'C' (custom business day):

ipdb>  data.index
DatetimeIndex(['2021-03-05', '2021-03-08', '2021-03-09', '2021-03-10',
               '2021-03-11', '2021-03-12', '2021-03-15', '2021-03-16',
               '2021-03-17', '2021-03-18',
               ...
               '2021-11-08', '2021-11-09', '2021-11-10', '2021-11-11',
               '2021-11-12', '2021-11-15', '2021-11-16', '2021-11-17',
               '2021-11-18', '2021-11-19'],
              dtype='datetime64[ns]', name='mktDates', length=180, freq='C')

The index was created with the pandas bdate_range function:

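# imports assumed from the aliases used below (pd, dat)
import datetime as dat
import pandas as pd

# load the market holidays and parse them as dates
holidays = pd.read_csv('../data/raw/market_holidays.csv', parse_dates=True, infer_datetime_format=True)
holidays = pd.to_datetime(holidays['date_YYYY_MM_DD'], format='%Y-%m-%d')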

sttDate = dat.datetime(2013, 1, 1)
stpDate = dat.datetime(2021, 12, 31)

# build the calendar
mktCalendar = pd.bdate_range(start=sttDate, end=stpDate, holidays=holidays.values, freq='C').rename('mktDates')

I am trying to fit an ARIMA model with statsmodels using this code:

import statsmodels.api as sm
thisOrder = (1, 1, 1)
arima = sm.tsa.arima.ARIMA(endog=data, order=thisOrder, freq='C')

The last line raises an exception:

<ipython-input-392-acbc7f25591c> in ARIMASimulate(data, simParams, randSeed, verbose)
     27         # fit and get the score
     28         ipdb.set_trace()
---> 29         arima = sm.tsa.arima.ARIMA(endog=data, order=thisOrder, freq='C')

~\Anaconda3\envs\pybakken\lib\site-packages\statsmodels\tsa\arima\model.py in __init__(self, endog, exog, order, seasonal_order, trend, enforce_stationarity, enforce_invertibility, concentrate_scale, trend_offset, dates, freq, missing, validate_specification)
    107     >>> print(res.summary())
    108     """
--> 109     def __init__(self, endog, exog=None, order=(0, 0, 0),
    110                  seasonal_order=(0, 0, 0, 0), trend=None,
    111                  enforce_stationarity=True, enforce_invertibility=True,

~\Anaconda3\envs\pybakken\lib\site-packages\statsmodels\tsa\arima\specification.py in __init__(self, endog, exog, order, seasonal_order, ar_order, diff, ma_order, seasonal_ar_order, seasonal_diff, seasonal_ma_order, seasonal_periods, trend, enforce_stationarity, enforce_invertibility, concentrate_scale, trend_offset, dates, freq, missing, validate_specification)
    444         # especially validating shapes, retrieving names, and potentially
    445         # providing us with a time series index
--> 446         self._model = TimeSeriesModel(endog, exog=exog, dates=dates, freq=freq,
    447                                       missing=missing)
    448         self.endog = None if faux_endog else self._model.endog

~\Anaconda3\envs\pybakken\lib\site-packages\statsmodels\tsa\base\tsa_model.py in __init__(self, endog, exog, dates, freq, missing, **kwargs)
    413 
    414         # Date handling in indexes
--> 415         self._init_dates(dates, freq)
    416 
    417     def _init_dates(self, dates=None, freq=None):

~\Anaconda3\envs\pybakken\lib\site-packages\statsmodels\tsa\base\tsa_model.py in _init_dates(self, dates, freq)
    555                 elif (freq is not None and not inferred_freq and
    556                         not (index.freq == freq)):
--> 557                     raise ValueError('The given frequency argument is'
    558                                      ' incompatible with the given index.')
    559             # Finally, raise an exception if we could not coerce to date-based

ValueError: The given frequency argument is incompatible with the given index

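I don't understand this, because the frequency argument is the same as the frequency of the data's index. I also know that the index is not missing any dates under that frequency. I am using statsmodels 0.12.1. Any idea what is going on here?

【Question discussion】:

    Tags: python python-3.x statsmodels arima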


    【Solution 1】:

    Try generating a DatetimeIndex with freq='C' from 2021-03-05 to 2021-11-19: its length is 186. Your index has 180 entries, so 6 dates are missing.

    import pandas as pd
    
    date_range = pd.date_range(
        start='2021-03-05',
        end='2021-11-19',
        freq='C'
    )
    
    print(date_range)
    
    DatetimeIndex(['2021-03-05', '2021-03-08', '2021-03-09', '2021-03-10',
                   '2021-03-11', '2021-03-12', '2021-03-15', '2021-03-16',
                   '2021-03-17', '2021-03-18',
                   ...
                   '2021-11-08', '2021-11-09', '2021-11-10', '2021-11-11',
                   '2021-11-12', '2021-11-15', '2021-11-16', '2021-11-17',
                   '2021-11-18', '2021-11-19'],
                  dtype='datetime64[ns]', length=186, freq='C')
    

    Using this date_range with ARIMA raises no error:

    import numpy as np
    import statsmodels.api as sm
    
    x = np.linspace(0, 2*np.pi, date_range.size)
    y = np.sin(4*np.pi*x)
    
    data = pd.DataFrame({
        'Y': y,
    }, index=date_range)
    
    thisOrder = (1, 1, 1)
    arima = sm.tsa.arima.ARIMA(
        endog=data, order=thisOrder, 
        freq='C'
    )
    

    So you may need to check your DataFrame index.

    【Discussion】:
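One way to carry out that check is to compare the index against a plain `freq='C'` range over the same span and list the dates that appear in only one of them. The sketch below is an illustration only: the holiday list is hypothetical (the asker's real list comes from `market_holidays.csv`, which is not shown), so the counts differ from the 180 in the question.

```python
import pandas as pd

# Hypothetical holidays, chosen for illustration; not the asker's real list.
holidays = ["2021-04-02", "2021-05-31", "2021-07-05", "2021-09-06"]

# Index built the way the question builds it: custom business days minus holidays.
with_holidays = pd.bdate_range(
    start="2021-03-05", end="2021-11-19", freq="C", holidays=holidays
)

# Range built the way this answer builds it: freq='C' with no holidays,
# which is simply every Monday to Friday.
plain = pd.date_range(start="2021-03-05", end="2021-11-19", freq="C")

# Dates present in the plain range but absent from the holiday-aware index.
missing = plain.difference(with_holidays)

print(len(plain), len(with_holidays))
print(missing)
```

If the dates reported as missing turn out to be exactly the holidays, the index is complete with respect to its own calendar, and the gap comes from comparing against a calendar that has no holidays.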

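    • The C frequency stands for "custom business day". I created the index with pandas.bdate_range(), passing a set of holidays to the holidays argument. The index is correct and consistent with the frequency.
    • @Dr.Andrew Please post the code you used to populate the index.
    • I have edited the question as you requested.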
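The traceback in the question shows statsmodels raising when `index.freq == freq` is false. A plausible explanation, which this thread does not confirm, is that the string `'C'` converts to a custom business day offset with no holidays, while the index carries an offset that includes them, so the two compare unequal even though both print as `'C'`. The pandas-only sketch below illustrates that comparison; the holiday list is again hypothetical.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

# Hypothetical holidays, for illustration only.
holidays = ["2021-04-02", "2021-05-31"]

# Holiday-aware index, built as in the question.
idx = pd.bdate_range("2021-03-05", "2021-11-19", freq="C", holidays=holidays)

# What the bare string 'C' converts to: a custom business day with no holidays.
plain_c = to_offset("C")

# Both offsets share the alias 'C', but equality also takes the holidays into account.
same = idx.freq == plain_c
print(idx.freqstr, plain_c.freqstr, same)

# Converting the index's own offset leaves it unchanged, holidays included.
round_trip = to_offset(idx.freq) == idx.freq
print(round_trip)
```

If this is indeed the cause, omitting the `freq` argument, or passing `data.index.freq` instead of the string, might avoid the error. That suggestion has not been tested against statsmodels 0.12.1 here.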