Since you mentioned:
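I want to know how to use statistics-based regression (by defining any function) to find the best fit for my data, and check the sum of squares to compare various models and pick the one that fits my data best. I should mention that I am not looking for learning-based regression that relies on training/test data.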
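For the approach described in the question (fit candidate functions, then compare their sums of squares), a minimal sketch could look like the following. It assumes SciPy's `scipy.optimize.curve_fit` as the least-squares fitter; the three candidate functions, the `compare_by_sse` helper and the toy data are hypothetical illustrations, not part of the original answer.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical candidate models: any callable f(x, *params) works with curve_fit
def linear(x, a, b):
    return a + b * x

def quadratic(x, a, b, c):
    return a + b * x + c * x ** 2

def logarithmic(x, a, b):
    return a + b * np.log(x + 1.0)

MODELS = {"linear": linear, "quadratic": quadratic, "logarithmic": logarithmic}

def compare_by_sse(x, y, models=MODELS):
    """Fit each candidate by least squares and return its sum of squared errors (SSE)."""
    sse = {}
    for name, f in models.items():
        params, _ = curve_fit(f, x, y)   # parameter count is inferred from f's signature
        residuals = y - f(x, *params)
        sse[name] = float(np.sum(residuals ** 2))
    return sse

# Illustrative data only: a quadratic trend plus a small deterministic wiggle
x = np.arange(10, dtype=float)
y = 2.0 + 0.5 * x ** 2 + 0.1 * np.sin(x)

sse = compare_by_sse(x, y)
best = min(sse, key=sse.get)   # model with the smallest SSE
print(sse, "best:", best)
```

Because SSE can only decrease when terms are added to a nested model (a cubic always fits at least as well as a quadratic), comparing models with different numbers of parameters on SSE alone favours the most flexible one; a penalised criterion such as AIC is a common remedy.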
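Perhaps an ARIMA (AutoRegressive Integrated Moving Average) model with a given order (p, d, q) suits you: it learns from the series' own history and then produces predictions with predict()/forecast(). Note that the data is split into training and test sets here only to evaluate the model with walk-forward validation: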
```python
from datetime import datetime  # `from pandas import datetime` is deprecated/removed in newer pandas
from math import sqrt

from matplotlib import pyplot
from pandas import read_csv
from sklearn.metrics import mean_squared_error
# statsmodels 0.10.x API; statsmodels 0.13+ no longer supports it, use statsmodels.tsa.arima.model.ARIMA there
from statsmodels.tsa.arima_model import ARIMA

# load dataset: dates are stored as e.g. "1-01", so prefix "190" to get "1901-01"
def parser(x):
    return datetime.strptime('190' + x, '%Y-%m')

series = read_csv('/content/shampoo.txt', header=0, index_col=0, parse_dates=True,
                  date_parser=parser).squeeze('columns')  # single column -> Series
series.index = series.index.to_period('M')

# split into train and test sets (about 66% / 34%)
X = series.values
size = int(len(X) * 0.66)
train, test = X[0:size], X[size:len(X)]
history = [x for x in train]
predictions = list()

# walk-forward validation: refit on everything seen so far, forecast one step,
# then append the true observation to the history
for t in range(len(test)):
    model = ARIMA(history, order=(5, 1, 0))  # p=5 AR lags, d=1 differencing, q=0 MA terms
    model_fit = model.fit()
    output = model_fit.forecast()  # in 0.10.x this returns (forecast, stderr, conf_int)
    yhat = output[0]
    predictions.append(yhat)
    obs = test[t]
    history.append(obs)
    print('predicted=%f, expected=%f' % (yhat, obs))

# evaluate forecasts
rmse = sqrt(mean_squared_error(test, predictions))
rmse_ = 'Test RMSE: %.3f' % rmse

# plot forecasts against actual outcomes
pyplot.plot(test, label='test')
pyplot.plot(predictions, color='red', label='predict')
pyplot.xlabel('Months')
pyplot.ylabel('Sale')
pyplot.title(f'ARIMA model performance with {rmse_}')
pyplot.legend()
pyplot.show()
```
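The walk-forward loop itself does not depend on ARIMA. As a sketch, it can be factored into a generic function that takes any one-step forecaster; `walk_forward` and the `naive_forecast` baseline below are hypothetical names used only for illustration, and the RMSE is computed by hand instead of via scikit-learn.

```python
from math import sqrt
from typing import Callable, List, Sequence, Tuple

def walk_forward(series: Sequence[float], train_frac: float,
                 forecast_one: Callable[[List[float]], float]
                 ) -> Tuple[List[float], List[float], float]:
    """Forecast one step ahead, then reveal the true value and move on."""
    size = int(len(series) * train_frac)
    history = list(series[:size])
    test = list(series[size:])
    predictions = []
    for obs in test:
        yhat = forecast_one(history)   # the model only sees past observations
        predictions.append(yhat)
        history.append(obs)            # the true value joins the history
    rmse = sqrt(sum((o - p) ** 2 for o, p in zip(test, predictions)) / len(test))
    return test, predictions, rmse

def naive_forecast(history: List[float]) -> float:
    """Hypothetical baseline: the next value equals the last observed one."""
    return history[-1]

test, predictions, rmse = walk_forward([1.0, 2.0, 3.0, 4.0], 0.5, naive_forecast)
print(predictions, rmse)
```

To use it with ARIMA, pass a function that refits the model on `history` and returns its one-step forecast. Keeping the naive baseline around is also handy: an ARIMA order whose RMSE does not beat it is probably not adding much.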
I used the same packages you mentioned and got the following output, including the root mean squared error (RMSE) evaluation:
```python
import statsmodels as sm
sm.__version__  # '0.10.2'
```
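The RMSE in the plot title is closely tied to the sum of squares mentioned in the question. For $n$ test points with observations $y_t$ and forecasts $\hat{y}_t$:

```latex
\mathrm{SSE} = \sum_{t=1}^{n} \left(y_t - \hat{y}_t\right)^2,
\qquad
\mathrm{RMSE} = \sqrt{\frac{\mathrm{SSE}}{n}}
             = \sqrt{\frac{1}{n}\sum_{t=1}^{n} \left(y_t - \hat{y}_t\right)^2}
```

Since $n$ is fixed for a given test set and the square root is monotonic, ranking models by RMSE on the same test set gives the same order as ranking them by SSE.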
See the other posts post1 and post2 for more information. You could also add a trend line.
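One simple reading of "trend line" is an ordinary least-squares straight line over the time index. A minimal sketch with `numpy.polyfit`, where `linear_trend` is a hypothetical helper and not part of the original answer:

```python
import numpy as np

def linear_trend(values):
    """Least-squares straight line through `values`, indexed 0..n-1."""
    y = np.asarray(values, dtype=float)
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, y, deg=1)  # coefficients, highest power first
    fitted = np.polyval([slope, intercept], t)  # trend value at each time step
    return slope, intercept, fitted

slope, intercept, fitted = linear_trend([1.0, 3.0, 5.0, 7.0])
print(slope, intercept)
```

To overlay it on the forecast plot, something like `pyplot.plot(linear_trend(test)[2], linestyle='--', label='trend')` before `pyplot.legend()` would work; the same helper can be applied to `predictions` to compare the two slopes.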