[Posted]: 2021-05-07 09:57:45
[Question]:
I have a pandas DataFrame with a DatetimeIndex whose frequency is set to 'C' (custom business day):
ipdb> data.index
DatetimeIndex(['2021-03-05', '2021-03-08', '2021-03-09', '2021-03-10',
'2021-03-11', '2021-03-12', '2021-03-15', '2021-03-16',
'2021-03-17', '2021-03-18',
...
'2021-11-08', '2021-11-09', '2021-11-10', '2021-11-11',
'2021-11-12', '2021-11-15', '2021-11-16', '2021-11-17',
'2021-11-18', '2021-11-19'],
dtype='datetime64[ns]', name='mktDates', length=180, freq='C')
The index was created with the pandas bdate_range function:
import datetime as dat  # assumed alias, inferred from the dat.datetime(...) calls below
import pandas as pd
# load the market holidays and parse the date column
holidays = pd.read_csv('../data/raw/market_holidays.csv', parse_dates=True, infer_datetime_format=True)
holidays = pd.to_datetime(holidays['date_YYYY_MM_DD'], format='%Y-%m-%d')
sttDate = dat.datetime(2013, 1, 1)
stpDate = dat.datetime(2021, 12, 31)
# build the calendar
mktCalendar = pd.bdate_range(start=sttDate, end=stpDate, holidays=holidays.values, freq='C').rename('mktDates')
I am trying to fit an ARIMA model with statsmodels using this code:
import statsmodels.api as sm
thisOrder = (1, 1, 1)
arima = sm.tsa.arima.ARIMA(endog=data, order=thisOrder, freq='C')
The last line raises an exception:
<ipython-input-392-acbc7f25591c> in ARIMASimulate(data, simParams, randSeed, verbose)
27 # fit and get the score
28 ipdb.set_trace()
---> 29 arima = sm.tsa.arima.ARIMA(endog=data, order=thisOrder, freq='C')
~\Anaconda3\envs\pybakken\lib\site-packages\statsmodels\tsa\arima\model.py in __init__(self, endog, exog, order, seasonal_order, trend, enforce_stationarity, enforce_invertibility, concentrate_scale, trend_offset, dates, freq, missing, validate_specification)
107 >>> print(res.summary())
108 """
--> 109 def __init__(self, endog, exog=None, order=(0, 0, 0),
110 seasonal_order=(0, 0, 0, 0), trend=None,
111 enforce_stationarity=True, enforce_invertibility=True,
~\Anaconda3\envs\pybakken\lib\site-packages\statsmodels\tsa\arima\specification.py in __init__(self, endog, exog, order, seasonal_order, ar_order, diff, ma_order, seasonal_ar_order, seasonal_diff, seasonal_ma_order, seasonal_periods, trend, enforce_stationarity, enforce_invertibility, concentrate_scale, trend_offset, dates, freq, missing, validate_specification)
444 # especially validating shapes, retrieving names, and potentially
445 # providing us with a time series index
--> 446 self._model = TimeSeriesModel(endog, exog=exog, dates=dates, freq=freq,
447 missing=missing)
448 self.endog = None if faux_endog else self._model.endog
~\Anaconda3\envs\pybakken\lib\site-packages\statsmodels\tsa\base\tsa_model.py in __init__(self, endog, exog, dates, freq, missing, **kwargs)
413
414 # Date handling in indexes
--> 415 self._init_dates(dates, freq)
416
417 def _init_dates(self, dates=None, freq=None):
~\Anaconda3\envs\pybakken\lib\site-packages\statsmodels\tsa\base\tsa_model.py in _init_dates(self, dates, freq)
555 elif (freq is not None and not inferred_freq and
556 not (index.freq == freq)):
--> 557 raise ValueError('The given frequency argument is'
558 ' incompatible with the given index.')
559 # Finally, raise an exception if we could not coerce to date-based
ValueError: The given frequency argument is incompatible with the given index
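I don't understand this, because the frequency argument is the same as the frequency of the data's index. I also know that the index is not missing any dates under that frequency. I am using statsmodels 0.12.1. Any idea what is going on here?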
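The traceback shows the check failing on `index.freq == freq`, so one diagnostic worth trying is to compare the index's offset object with what the bare string 'C' resolves to. The sketch below is only an illustration under assumptions: the two holiday dates are hypothetical stand-ins for the contents of market_holidays.csv, and the date range is shortened. It relies on the fact that an index built with `holidays=` carries those holidays inside its offset, while an offset built from the string alone carries none.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

# Hypothetical holidays, standing in for the rows of market_holidays.csv
holidays = ["2021-04-02", "2021-05-31"]

# Same construction as in the question, on a shorter date range
idx = pd.bdate_range(start="2021-03-05", end="2021-06-30",
                     freq="C", holidays=holidays).rename("mktDates")

# What the bare string 'C' resolves to: a custom business day with no holidays
plain_c = to_offset("C")

print(idx.freqstr)          # string alias of the index's offset
print(idx.freq == plain_c)  # whether the index's offset equals a plain 'C' offset
print(idx.freq.holidays)    # holidays stored on the index's offset
print(plain_c.holidays)     # holidays stored on the plain offset
```

If the two offsets compare unequal even though both print as 'C', that would be consistent with the ValueError above. Two things that may be worth trying in that case (untested against statsmodels 0.12.1) are passing the index's own offset object, `freq=data.index.freq`, or omitting the `freq` argument so that the frequency is taken from the index.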
[Discussion]:
Tags: python python-3.x statsmodels arima