【问题标题】:Gaussian Process Regression高斯过程回归
【发布时间】:2015-09-10 23:39:53
【问题描述】:

我正在编写一个高斯过程回归算法。代码如下:

% Data generating function

fh = @(x)(2*cos(2*pi*x/10).*x);

% range

x = -5:0.01:5;
N = length(x);

% Sampled data points from the generating function

M = 50;
selection = boolean(zeros(N,1));
j = randsample(N, M);

% mark them

selection(j) = 1;
Xa = x(j);

% compute the function and extract mean

f = fh(Xa) - mean(fh(Xa));
sigma2 = 1;

% computing the interpolation using all x's
% It is expected that for points used to build the GP cov. matrix, the
% uncertainty is reduced...

K = squareform(pdist(x'));
K = exp(-(0.5*K.^2)/sigma2);

% upper left corner of K

Kaa = K(selection,selection);

% lower right corner of K

Kbb = K(~selection,~selection);

% upper right corner of K

Kab = K(selection,~selection);

% mean of posterior

m = Kab'*inv(Kaa+0.001*eye(M))*f';

% cov. matrix of posterior

D = Kbb - Kab'*inv(Kaa + 0.001*eye(M))*Kab;

% sampling M functions from from GP

[A,B,C] = svd(Kaa);
F0 = A*sqrt(B)*randn(M,M);
% mean from GP using sampled points

F0m = mean(F0,2);
F0d = std(F0,0,2);

%%
% put together data and estimation

F = zeros(N,1);
S = zeros(N,1);
F(selection) = f' + F0m;
S(selection) = F0d;

% sampling M function from posterior

[A,B,C] = svd(D);
a = A*sqrt(B)*randn(N-M,M);
% mean from posterior GPs

Fm = m + mean(a,2);
Fmd = std(a,0,2);
F(~selection) = Fm;
S(~selection) = Fmd;

%%

figure;
% show what we got...

plot(x, F, ':r', x, F-2*S, ':b', x, F+2*S, ':b'), grid on;
hold on;
% show points we got

plot(Xa, f, 'Ok');
% show the whole curve

plot(x, fh(x)-mean(fh(x)), 'k');
grid on;

我希望得到一些不错的数字,其中未知数据点的不确定性很大,而采样数据点周围的不确定性很小。我得到了一个奇怪的数字,更奇怪的是采样数据点周围的不确定性比其他数据点大。有人可以向我解释我做错了什么吗?谢谢!!

【问题讨论】:

    标签: matlab regression gaussian


    【解决方案1】:

    您的代码有一些问题。以下是最重要的几点:

    • 导致一切出错的主要错误是对f 的索引。您正在定义Xa = x(j),但实际上应该执行Xa = x(selection),以便索引与您在内核矩阵K 上使用的索引一致。

    • 减去样本均值 f = fh(Xa) - mean(fh(Xa)) 没有任何作用,并且会使图中的圆圈偏离实际函数。 (如果选择减法,应该是固定的数或函数,不依赖于随机抽样的观测值。)

    • 您应该直接从mD 计算后验均值和方差;无需从后验样本中获取样本估计值。

    这是修正了以上几点的脚本的修改版本。

    %% Init
    % Data generating function
    fh = @(x)(2*cos(2*pi*x/10).*x);
    % range
    x = -5:0.01:5;
    N = length(x);
    % Sampled data points from the generating function
    M = 5;
    selection = boolean(zeros(N,1));
    j = randsample(N, M);
    % mark them
    selection(j) = 1;
    Xa = x(selection);
    
    %% GP computations
    % compute the function and extract mean
    f = fh(Xa);
    sigma2 = 2;
    sigma_noise = 0.01;
    var_kernel = 10;
    % computing the interpolation using all x's
    % It is expected that for points used to build the GP cov. matrix, the
    % uncertainty is reduced...
    K = squareform(pdist(x'));
    K = var_kernel*exp(-(0.5*K.^2)/sigma2);
    % upper left corner of K
    Kaa = K(selection,selection);
    % lower right corner of K
    Kbb = K(~selection,~selection);
    % upper right corner of K
    Kab = K(selection,~selection);
    % mean of posterior
    m = Kab'/(Kaa + sigma_noise*eye(M))*f';
    % cov. matrix of posterior
    D = Kbb - Kab'/(Kaa + sigma_noise*eye(M))*Kab;
    
    %% Plot
    figure;
    grid on;
    hold on;
    % GP estimates
    plot(x(~selection), m);
    plot(x(~selection), m + 2*sqrt(diag(D)), 'g-');
    plot(x(~selection), m - 2*sqrt(diag(D)), 'g-');
    % Observations
    plot(Xa, f, 'Ok');
    % True function
    plot(x, fh(x), 'k');
    

    由此得出的图,其中包含 5 个随机选择的观察值,其中真实函数以黑色显示,后验均值以蓝色显示,置信区间以绿色显示。

    【讨论】:

    • 3lectrologos,谢谢你的回答!!我从文献中了解到,在构建 GP 模型之前数据应该居中,因此我减去了 mean(f(Xa)),因为 Xa 代表点,而 f(Xa) 代表你可以从一些测量中得到。其他 cmets 表明我需要了解更多。再次感谢!!
    • 经过再三考虑,减去样本均值是可以的,但最后你应该将它再次添加到数据点和 GP 估计中。 (如果您想迭代地添加训练观察并多次估计后验,这只会是一个问题。)
    猜你喜欢
    • 2018-12-06
    • 2016-09-29
    • 1970-01-01
    • 2020-07-16
    • 2021-03-08
    • 1970-01-01
    • 2021-10-07
    • 2021-05-01
    相关资源
    最近更新 更多