【Question title】:10-fold cross validation for polynomial regressions
【Posted】:2015-01-11 14:48:47
【Question description】:

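I want to use 10-fold cross-validation to test which polynomial form (first, second or third order) gives the better fit. I want to split my data set into 10 subsets and remove 1 of the 10 subsets, derive the regression model without that subset, use the derived model to predict the output values for that subset, and compute the residuals. Finally, I want to repeat this routine for every subset and sum the squares of the resulting residuals. I have written the following code in MATLAB R2013b, which samples the data and tests the regression on the training data. I am stuck on how to repeat this for each subset and how to compare which polynomial form fits better.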
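In symbols, the criterion described above can be written as follows. The notation is introduced here only for illustration: $y_i$ is the AT value of point $i$, $s_i$ and $t_i$ its salinity and temperature, $T_p$ the set of indices in subset $p$, and $\hat f_d^{(-p)}$ the order-$d$ polynomial surface fitted with subset $p$ removed.

$$
\mathrm{CV}_d = \sum_{p=1}^{10} \sum_{i \in T_p} \left( y_i - \hat f_d^{(-p)}(s_i, t_i) \right)^2, \qquad d \in \{1, 2, 3\}
$$

Under this criterion, the preferred order is the $d$ with the smallest $\mathrm{CV}_d$.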

% Sample the data
parm = [AT];
n = length(parm);
k = 10;                 % how many parts to use
allix = randperm(n);    % all data indices, randomly ordered
numineach = ceil(n/k);  % at least one part must have this many data points
allix = reshape([allix NaN(1,k*numineach-n)],k,numineach);
for p=1:k
    testix = allix(p,:);            % indices to use for testing
    testix(isnan(testix)) = [];     % remove NaNs if necessary
    trainix = setdiff(1:n,testix);  % indices to use for training
    %train = parm(trainix); %gives the training data
    %test = parm(testix);  %gives the testing data
end

% Derive regression on the training data 
Sal = Salinity(trainix);
Temp = Temperature(trainix);
At = parm(trainix);

xyz =[Sal Temp At];
% Fit a Polynomial Surface
surffit = fit([xyz(:,1), xyz(:,2)],xyz(:,3), 'poly11');
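% fit returns [fitobject, gof, output]: b is the fitted surface (shows the equation), bint holds rsquare and rmse, r holds extra output such as residuals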
[b,bint,r] = fit([xyz(:,1), xyz(:,2)],xyz(:,3), 'poly11');
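As a quick check of the sampling step, here is a small sketch. It reuses the question's partitioning lines with an arbitrary, assumed `n = 23` (in the question, `n = length(parm)`) and confirms that every index is held out for testing exactly once and that the training and test indices never overlap. The counter `seen` is a hypothetical helper used only for this check. The fitting and prediction for each subset would go inside this same loop.

```matlab
n = 23;                  % assumed sample size, only for this check
k = 10;                  % how many parts to use
allix = randperm(n);     % all data indices, randomly ordered
numineach = ceil(n/k);   % at least one part must have this many data points
% pad with NaN so the indices fill a k-by-numineach matrix; row p is fold p
allix = reshape([allix NaN(1,k*numineach-n)], k, numineach);

seen = zeros(1, n);      % how many times each index has been used for testing
for p = 1:k
    testix = allix(p,:);            % indices to use for testing
    testix(isnan(testix)) = [];     % drop the NaN padding
    trainix = setdiff(1:n, testix); % indices to use for training
    seen(testix) = seen(testix) + 1;
    % training and test sets are disjoint and together cover all n points
    assert(isempty(intersect(trainix, testix)));
    assert(numel(trainix) + numel(testix) == n);
end
assert(all(seen == 1));  % every point is held out exactly once
```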

【Question discussion】:

    Tags: matlab polynomial-math cross-validation


    【Answer 1】:

    To run the code for each subset, you can move the fit inside the loop and store the results, for example:

    % Sample the data
    parm = [AT];
    n = length(parm);
    k = 10;                 % how many parts to use
    allix = randperm(n);    % all data indices, randomly ordered
    numineach = ceil(n/k);  % at least one part must have this many data points
    allix = reshape([allix NaN(1,k*numineach-n)],k,numineach);
    
    bAll = []; bintAll = []; rAll = [];
    
    for p=1:k
        testix = allix(p,:);            % indices to use for testing
        testix(isnan(testix)) = [];     % remove NaNs if necessary
        trainix = setdiff(1:n,testix);  % indices to use for training
        %train = parm(trainix); %gives the training data
        %test = parm(testix);  %gives the testing data
    
        % Derive regression on the training data 
        Sal = Salinity(trainix);
        Temp = Temperature(trainix);
        At = parm(trainix);
    
        xyz =[Sal Temp At];
        % Fit a Polynomial Surface
        surffit = fit([xyz(:,1), xyz(:,2)],xyz(:,3), 'poly11');
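        % fit returns [fitobject, gof, output]: b is the fitted surface, bint holds rsquare and rmse, r holds extra output such as residuals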
        [b,bint,r] = fit([xyz(:,1), xyz(:,2)],xyz(:,3), 'poly11');
    
        bAll = [bAll, coeffvalues(b)]; bintAll = [bintAll,bint]; rAll = [rAll,r]; 
    end 
    

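    As for the best fit, you could pick the polynomial order with the lowest RMSE. Compute that RMSE on the held-out subsets rather than on the training data: adding terms can only keep the training sum of squared residuals the same or lower it, so training error tends to favour the highest order.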
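Below is a minimal sketch of the full comparison, including the step the loop above leaves out: predicting each held-out fold and summing the squared test residuals for every order. It is written in base MATLAB (`\`, `bsxfun`, `meshgrid`), so it needs neither the Curve Fitting Toolbox nor implicit expansion, which R2013b lacks. Assumptions: the synthetic `Sal`, `Temp`, `At` data stand in for `Salinity`, `Temperature`, `AT`. The names `foldId`, `cvSSE`, `cvRMSE` are hypothetical. Each `polydd` surface is fitted as an ordinary least-squares design matrix instead of with `fit`.

```matlab
% Synthetic stand-ins for Salinity, Temperature and AT (assumption: replace
% with your own column vectors of equal length)
n    = 200;
Sal  = 30 + 5*rand(n,1);
Temp = 10 + 15*rand(n,1);
At   = 2200 + 40*(Sal - 32) - 2*(Temp - 17).^2 + 5*randn(n,1);

% Centre and scale the predictors so the cubic design matrix stays well
% conditioned; an affine change of variables spans the same degree-d
% polynomials, so the predictions are unchanged
zS = (Sal  - mean(Sal))  / std(Sal);
zT = (Temp - mean(Temp)) / std(Temp);

k      = 10;                          % number of folds
orders = 1:3;                         % compare poly11, poly22, poly33
foldId = mod(randperm(n), k) + 1;     % random fold label (1..k) for each point
cvSSE  = zeros(size(orders));         % summed squared held-out residuals

for d = orders
    % Exponent pairs (i,j) with i + j <= d: the terms of the 'polydd' surface
    [I, J] = meshgrid(0:d, 0:d);
    keep   = (I + J) <= d;
    ex     = [I(keep), J(keep)];      % m-by-2, m = (d+1)(d+2)/2 terms
    % Design matrix with one column per term zS.^i .* zT.^j
    X = bsxfun(@power, zS, ex(:,1)') .* bsxfun(@power, zT, ex(:,2)');
    for p = 1:k
        testix  = (foldId == p);
        trainix = ~testix;
        beta = X(trainix,:) \ At(trainix);        % least squares on the other k-1 folds
        res  = At(testix) - X(testix,:) * beta;   % residuals on the held-out fold
        cvSSE(d) = cvSSE(d) + sum(res.^2);
    end
end

cvRMSE = sqrt(cvSSE / n);             % cross-validated RMSE per order
[~, best] = min(cvSSE);
fprintf('order %d: CV RMSE = %.3f\n', [orders; cvRMSE]);
fprintf('lowest cross-validated error: order %d\n', orders(best));
```

If you keep using `fit`, the equivalent step is to fit on `trainix` only and evaluate the returned surface at the test points, for example `surffit(Salinity(testix), Temperature(testix))`, before summing the squared differences from `parm(testix)`.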

    【Comments】:

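    • Error using fittype/horzcat (line 7): Concatenation of double objects not allowed.
    • Oh, I see. From b you probably only need to save the coefficients. If so, try bAll = [bAll, coeffvalues(b)] instead of bAll = [bAll, b]. Sorry, I can't test this myself because I don't have the Curve Fitting Toolbox installed.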