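【Posted】: 2013-03-04 20:37:56
【Question】:
I have a large data set in an array . The data is really large; these are some of the cell dimensions: 5 of them are , 11 are
I am trying to perform a resampling operation for which I have a function (the function code is shown below). I want to take an entire cell out of the array, run the resampling operation on it, and store the result back in the same array position or in a different one.
However, I get the following error at line 19 of the Resample function:
"Error using zeros. Maximum variable size allowed by the program is exceeded. Error in Resample (line 19) obj = zeros(t,1);"
When I comment out line 19, I get an out-of-memory error instead.
Is there a more efficient way to process such a huge data set?
Thanks.
Actual code: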
%% To load each ".dat" file for the 51 attributes to an array.
a = dir('*.dat');
for i = 1:length(a)
    eval(['load ' a(i).name ' -ascii']);
end
attributes = length(a);
% Scan folder for number of ".dat" files
datfiles = dir('*.dat');
% Count Number of ".dat" files
numfiles = length(datfiles);
% Read files in to MATLAB
for i = 1:1:numfiles
    A{i} = csvread(datfiles(i).name);
end
% Remove discarded variables
ind = [1 22 23 24 25 26 27 32]; % Variables to be removed.
A(ind) = [];
% Reshape all the data into columns - (n x 1)
for i = 1:1:length(A)
    temp = A{1,i};
    [x,y] = size(temp);
    if x == 1 && y ~= 1
        temp = temp';
        A{1,i} = temp;
    end
end
% Retrieves the frequency data for the attributes from Excel spreadsheet
frequency = xlsread('C:\Users\aajwgc\Documents\MATLAB\Research Work\Data\testBig\frequency');
% Removing recorded frequency for discarded variables
frequency(ind) = [];
% Upsampling all the attributes to desired frequency
prompt = {'Frequency (Hz):'};
dlg_title = 'Enter desired output frequency for all attributes';
num_lines = 1;
def = {'50'};
answer= inputdlg(prompt,dlg_title,num_lines,def);
OutFreq = str2num(answer{1});
m = 1;
n = length(frequency);
A_resampled = cell(m,n);
A_resampled(:) = {''};
for i = length(frequency);
    raw = cell2mat(A(1,i));
    temp= Resample(raw, frequency(i,:), OutFreq);
    A_resampled{i} = temp(i);
end
Resample function:
function obj = Resample(InputData, InFreq, OutFreq, varargin)
%% Preliminary setup
% Allow for selective down-sizing by specifying type
type = 'mean'; %default to the mean/average
if size(varargin,2) > 0
    type = varargin{1};
end
% Determine the necessary resampling factor
factor = OutFreq / InFreq;
%% No refactoring required
if (factor == 1)
    obj = InputData;
%% Up-Sampling required
elseif (factor > 1)
    t = factor * numel(InputData(1:end));
    obj = zeros(t,1); % <-- line 19, where I get the error message
    for i = 1:factor:t
        y = ((i-1) / factor) + 1;
        z = InputData(y);
        obj(i:i+factor) = z;
    end
%% Down-Sampling required
elseif (factor < 1)
    t = numel(InputData(1:end));
    t = floor(t * factor);
    obj = zeros(t,1);
    factor = int32(1/factor);
    if strcmp(type,'mean') %default is mean (process first)
        for i = 1:t
            y = (factor * (i-1)) + 1;
            obj(i) = mean(InputData(y:y+factor-1));
        end
    elseif strcmp(type,'min')
        for i = 1:t
            y = (factor * (i-1)) + 1;
            obj(i) = min(InputData(y:y+factor-1));
        end
    elseif strcmp(type,'max')
        for i = 1:t
            y = (factor * (i-1)) + 1;
            obj(i) = max(InputData(y:y+factor-1));
        end
    elseif strcmp(type,'mode')
        for i = 1:t
            y = (factor * (i-1)) + 1;
            obj(i) = mode(InputData(y:y+factor-1));
        end
    elseif strcmp(type,'sum')
        for i = 1:t
            y = (factor * (i-1)) + 1;
            obj(i) = sum(InputData(y:y+factor-1));
        end
    elseif strcmp(type,'single')
        for i = 1:t
            y = (factor * (i-1)) + 1;
            obj(i) = InputData(y);
        end
    else
        obj = NaN;
    end
else
    obj = NaN;
end
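For comparison, the up-sampling branch and the 'mean' down-sampling branch of the function above can be written without loops. This is only a minimal sketch, not the poster's method: it assumes the resampling ratio (or its inverse) is a positive integer, and the variable names `x`, `k`, `up`, `blocks` and `down` are made up for illustration.

```matlab
% Sketch only: assumes the resampling ratio (or its inverse) is a positive integer.
x = (1:6).';                              % hypothetical sample data, column vector

% Up-sampling by an integer factor k: repeat every sample k times (sample-and-hold).
k = 3;
up = reshape(repmat(x.', k, 1), [], 1);   % exactly k*numel(x) elements

% Down-sampling by an integer factor k: mean of each block of k samples.
k = 2;
t = floor(numel(x) / k);                  % number of complete blocks
blocks = reshape(x(1:t*k), k, t);         % one block of k samples per column
down = mean(blocks, 1).';                 % column vector of block means
```

One thing this makes visible: in the loop version, `obj(i:i+factor) = z` assigns `factor+1` elements per step, so with an integer factor the last iteration writes one element past `t` and the array grows beyond its preallocated size. Whether that is related to the reported error cannot be told from the post alone.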
【Comments】:
-
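It looks like you are passing one cell at a time to Resample. So even if the value of t equals the maximum cell size = , that means 10.5 MB of memory. You should not be getting such an error. What is the value of t in your case when the error occurs? Also, check your memory usage by typing memory at the MATLAB command line.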
-
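Parag, t evaluates to the number of elements that the resampling operation will produce. It does not get past the first iteration, and t is 45875200.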
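For the value of t reported here, the size of the array that `zeros(t,1)` asks for can be worked out directly. The sketch below does that arithmetic for three element classes and cross-checks the bytes-per-element figures on small arrays with `whos`; the variable names are illustrative, and whether a narrower class than double is acceptable depends on the data, which the post does not say.

```matlab
% Footprint of zeros(t,1) for the reported t, by element class.
t = 45875200;                        % value of t reported in the comment
bytesPerElement = [8 4 1];           % double, single, uint8
mib = t .* bytesPerElement / 2^20;   % sizes in MiB
fprintf('double: %.0f MiB, single: %.0f MiB, uint8: %.2f MiB\n', mib);

% Cross-check the per-element sizes on small arrays.
d = zeros(1000, 1);                  % default class is double
s = zeros(1000, 1, 'single');
u = zeros(1000, 1, 'uint8');
info = whos('d', 's', 'u');          % info(k).bytes holds each array's size in bytes
```

A numeric array of this kind is stored as one contiguous block, so what matters is the largest free contiguous block rather than the total free memory.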
-
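Then zeros(t,1) will take ~350 MB of memory. Type memory in the MATLAB command window and see whether you have that much memory available. Another option: if you are going to store unsigned 8-bit integers in obj, you can write zeros(t,1,'uint8') or zeros(t,1,'single'), whichever suits you.
-
I have about 2910 MB, so I wonder why it does not work. Also, I tried to avoid the eval function by reading the data with csvread. But csvread reads file a01 and then moves on to file a20, instead of a01, a02, a03... How can I fix this? The data is provided as ".dat" files.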
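If the read order is the issue, one option is to sort the file names by the number embedded in them before looping over them. This is a minimal sketch under the assumption that the files are named like a01.dat, a02.dat and so on, as the comment suggests; the list in `names` is a hypothetical stand-in for the output of `dir('*.dat')`.

```matlab
% Sketch: order file names by the number embedded in them before reading.
% The names below are hypothetical stand-ins for the output of dir('*.dat').
names = {'a20.dat', 'a01.dat', 'a03.dat', 'a02.dat'};

% Pull the first run of digits out of each name and convert it to a number.
tokens = regexp(names, '\d+', 'match', 'once');
nums = str2double(tokens);

% Sort by that number and reorder the names accordingly.
[~, order] = sort(nums);
sortedNames = names(order);

% With real files: datfiles = dir('*.dat'); names = {datfiles.name};
% then read in order with, e.g., A{k} = csvread(sortedNames{k});
```

Sorting on the parsed number rather than on the raw string also keeps the order right when the names are not zero-padded (a2 before a10).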
Tags: arrays matlab large-data