What is the better way to change the percentages of the training and the testing during the splitting process? (Answers)

【Question title】: What is the better way to change the percentages of the training and the testing during the splitting process?
【Posted】: 2020-04-09 23:53:05
【Question】:

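Using PCA and the Yale database, I am trying to do face recognition in MATLAB by randomly splitting the data into 20% for training and 80% for testing. It gives an

Index in position 2 exceeds array bounds (must not exceed 29)

error. Here is the code; I hope someone can help: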

dataset = load('yale_FaceDataset.mat');

trainSz = round(dataset.samples*0.2);
testSz = round(dataset.samples*0.8);

trainSetCell = cell(1,trainSz*dataset.classes);
testSetCell = cell(1,testSz*dataset.classes);

j = 1;
k = 1;
m = 1;
for i = 1:dataset.classes
    % training set
    trainSetCell(k:k+trainSz-1) = dataset.images(j:j+trainSz-1);
    trainLabels(k:k+trainSz-1) = dataset.labels(j:j+trainSz-1);
    k = k+trainSz;
    % test set
    testSetCell(m:m+testSz-1) = dataset.images(j+trainSz:j+dataset.samples-1);
    testLabels(m:m+testSz-1) = dataset.labels(j+trainSz:j+dataset.samples-1);
    m = m+testSz;
    j = j+dataset.samples;
end
% convert the data from a cell into a matrix format
numImgs = length(trainSetCell);
trainSet = zeros(numImgs,numel(trainSetCell{1}));
for i = 1:numImgs
    trainSet(i,:) = reshape(trainSetCell{i},[],1);
end
numImgs = length(testSetCell);

testSet = zeros(numImgs,numel(testSetCell{1}));
for i = 1:numImgs
    testSet(i,:) = reshape(testSetCell{i},[],1);
end


%% applying PCA
% compute the mean face
mu = mean(trainSet)';

% centre the training data
trainSet = trainSet - (repmat(mu,1,size(trainSet,1)))';

% generate the eigenfaces(features of the training set)
eigenfaces = pca(trainSet);

% set the number of principal components
Ncomponents = 100;

% Out of the generated components, we keep "Ncomponents"
eigenfaces = eigenfaces(:,1:Ncomponents);

% generate training features
trainFeatures = eigenfaces' * trainSet';

% Subspace projection
% centre features
testSet = testSet - (repmat(mu,1,size(testSet,1)))';

% subspace projection
testFeatures = inv(eigenfaces'*eigenfaces) * eigenfaces' * testSet';

mdl = fitcdiscr(trainFeatures',trainLabels);
labels = predict(mdl,testFeatures');


% find the images that were recognised and their respect. labels
correctRec = find(testLabels == labels');
correctLabels = labels(correctRec);

% find the images that were NOT recognised and their respect. labels
falseRec = find(testLabels ~= labels');
falseLabels = labels(falseRec);


% compute and display the recognition rate
result = length(correctRec)/length(testLabels)*100;
fprintf('The recognition rate is: %0.3f \n',result);

% divide the images into : recognised and unrecognised
correctTest = testSetCell(correctRec);
falseTest = testSetCell(falseRec);

% display some recognised samples and their respective labels
imgshow(correctTest(1:8),correctLabels(1:8));

% display all unrecognised samples and their respective labels
imgshow(falseTest(1:length(falseTest)), falseLabels(1:length(falseTest)));

【Question comments】:

Tags: matlab pca

【Solution 1】:

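It would help if you also provided the line number and the full text of the error, and if you stripped the code down to its essential parts. I guess the PCA part is not needed here, because the error is probably raised in your loop. That is because you increase j with j = j+dataset.samples; and then use it in the next loop iteration to index j:j+trainSz-1, which must now exceed dataset.samples...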
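Without the line number, one way to narrow the error down is to check the sizes involved in each indexing expression. The sketch below is only an illustration under assumptions that are not stated in the question: it assumes the usual Yale layout of 15 subjects with 11 images each and uses random data in place of the real images. Under those assumptions the training set has 30 rows, and since pca centres the data by default it can return at most n-1 components for n observations, so the column index 1:Ncomponents with Ncomponents = 100 is also worth checking against the reported limit of 29.

```matlab
% Hypothetical sizes, assuming the Yale layout of 15 subjects x 11 images
classes = 15;
samples = 11;
trainSz = round(samples*0.2);        % images per class used for training
testSz  = samples - trainSz;         % remainder, so both sizes always add up

% Loop indexing: the largest index used in the last iteration
% must not exceed the total number of images
lastIdx = (classes-1)*samples + samples;
assert(lastIdx <= classes*samples);

% PCA: with n centred observations and more variables than observations,
% pca returns at most n-1 components
nTrain = trainSz*classes;            % number of training images
X = randn(nTrain, 500);              % stand-in for the vectorised face images
coeff = pca(X);

% Clamp the requested number of components instead of hard-coding 100
Ncomponents = min(100, size(coeff,2));
coeff = coeff(:,1:Ncomponents);
fprintf('pca returned %d usable components\n', size(coeff,2));
```

Clamping Ncomponents to size(coeff,2) keeps the rest of the pipeline unchanged when the training fraction, and with it the number of training images, is varied.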

In any case, there is no randomness in the indexing. The easiest option is to use the built-in cvpartition function:

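% split data ('HoldOut' is the fraction held out for testing:
% .8 gives 80% test and 20% training, as asked in the question)
cvp = cvpartition(Lbl,'HoldOut',.8);
lgTrn = cvp.training;   % logical index of the training samples
lgTst = cvp.test;       % logical index of the test samples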

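As the first input you can pass either the number of observations or the actual class-label vector (Lbl in this case). With the label vector, cvpartition picks random subsets that reflect the original distribution of the classes.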
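As a minimal sketch of how this could replace the manual loop in the question: the label vector below is a stand-in built for illustration (15 classes with 11 samples each is an assumption), and the commented lines show how the logical indices might be applied to the asker's dataset.images and dataset.labels fields. The 'HoldOut' value is the fraction reserved for testing, so 0.8 corresponds to the 20% training / 80% testing split described in the question.

```matlab
% Stand-in labels: 15 classes with 11 samples each (assumed Yale layout)
Lbl = repelem(1:15, 11)';

rng(0);                                  % fix the seed so the split is repeatable
cvp = cvpartition(Lbl,'HoldOut',0.8);    % 0.8 = fraction held out for testing
lgTrn = training(cvp);                   % logical index of the training samples
lgTst = test(cvp);                       % logical index of the test samples

% With the struct from the question, the sets could then be selected as:
% trainSetCell = dataset.images(lgTrn);  trainLabels = dataset.labels(lgTrn);
% testSetCell  = dataset.images(lgTst);  testLabels  = dataset.labels(lgTst);

fprintf('train: %d, test: %d\n', nnz(lgTrn), nnz(lgTst));
```

Changing the split then only means changing the single 'HoldOut' value, and no per-class index bookkeeping is needed.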

【Comments】: