【Question Title】: Reading a large, oddly formatted text file of data into MATLAB
【Posted】: 2014-11-10 07:23:45
【Question】:

I need to read a huge text file that contains oddly formatted data. The format is as follows:

//Header with Title Info

//Header with Test1 Info
//More Test1 Info
0,-156.875956035285
1.953125,-4.82866496038806
3.90625,-8.93502887648155
5.859375,-9.76964479822559
7.8125,-14.9767168331976
9.765625,-16.9949034672061
11.71875,-19.2709033739316
13.671875,-18.9948581866681

//Header with Test2 Info
//More Test2 Info
0,-156.875956035285
1.953125,-4.82866496038806
3.90625,-8.93502887648155
5.859375,-9.76964479822559
7.8125,-14.9767168331976
9.765625,-16.9949034672061
11.71875,-19.2709033739316
13.671875,-18.9948581866681

//Header with Test3 Info
//More Test3 Info
0,-156.875956035285
1.953125,-4.82866496038806
3.90625,-8.93502887648155
5.859375,-9.76964479822559
7.8125,-14.9767168331976
9.765625,-16.9949034672061
11.71875,-19.2709033739316
13.671875,-18.9948581866681

// End of Data

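That's the gist of it, except that there are about 25,000 entries under each header instead of 8. I'm running 25 tests, and their data need to be averaged into a single data set.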

Essentially, I want to parse the data in this sequence:

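1. Skip the first line
2. Identify a blank line, then go to the next step
3. Check for "End of Data"
4. If this is not the end, skip the current line and the next one
5. Create a new array for the current test's data set
6. Read data until a blank line is reached, then go back to step 2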
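The step sequence above can be sketched almost literally with fgetl. This is a hypothetical helper (the name read_sets_by_line is made up), and it assumes that the first line is the title, that every test block starts with exactly two '//' header lines, that data lines look like "x,y", and that the data ends at a line containing "End of Data":

```matlab
function sets = read_sets_by_line(file_name)
% READ_SETS_BY_LINE  Hypothetical sketch of the six-step parse.
% Assumptions: line 1 is the title, each block has exactly two '//'
% header lines, data lines are "x,y", blocks are separated by blank
% lines, and a line containing "End of Data" ends the data.
    fid = fopen(file_name, 'r');
    if fid < 0
        error('Unable to open file "%s"', file_name);
    end
    cleanup = onCleanup(@() fclose(fid)); %#ok<NASGU>

    sets = {};
    fgetl(fid);                                     % step 1: skip the title line
    while true
        tline = fgetl(fid);
        if ~ischar(tline)                           % EOF reached without an end marker
            break;
        end
        if isempty(strtrim(tline))                  % step 2: skip blank lines
            continue;
        end
        if ~isempty(strfind(tline, 'End of Data'))  % step 3: end marker found
            break;
        end
        fgetl(fid);                                 % step 4: tline was header 1; skip header 2
        block = zeros(0, 2);                        % step 5: new array for this test
        tline = fgetl(fid);
        while ischar(tline) && ~isempty(strtrim(tline))   % step 6: read until a blank line
            block(end+1, :) = sscanf(tline, '%f,%f').'; %#ok<AGROW>
            tline = fgetl(fid);
        end
        sets{end+1} = block; %#ok<AGROW>
    end
end
```

Growing block one row at a time is simple but slow for 25,000 rows per test; a block-wise reader such as textscan avoids that cost.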

Then I want to average all of these sets in the most efficient way possible.
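One vectorized way to do the averaging is to stack the sets along a third dimension and call mean once. This minimal sketch assumes each test has already been read into an N-by-2 [x, y] matrix held in a cell array, and that all tests share the same x values; the function name average_sets is hypothetical:

```matlab
function avg = average_sets(sets)
% AVERAGE_SETS  Hypothetical sketch: element-wise mean of a cell array of
% N-by-2 [x, y] matrices. Assumes every test has the same number of rows
% (cat errors out otherwise) and the same x values (checked below).
    stacked = cat(3, sets{:});                                 % N-by-2-by-K, K = numel(sets)
    xs = reshape(stacked(:, 1, :), size(stacked, 1), []);      % N-by-K matrix of x columns
    if any(any(bsxfun(@ne, xs, xs(:, 1))))
        error('average_sets:xMismatch', 'Tests do not share the same x values.');
    end
    avg = mean(stacked, 3);                                    % N-by-2: x column and mean y
end
```

Any reader that returns a cell array of N-by-2 matrices (for example the data output of a textscan-based loop with 'CollectOutput') can be passed straight in.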

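I'm having trouble reading the data in. I know I can use csvread, or a more general function for reading delimited values, but I'm stuck on coming up with an elegant, concise way to do everything.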

I started with this:

function [ data ] = graph( input_args )
%Plot data

myData = fopen('mRoom_fSweep_25points_center.txt');
data = textscan(myData,'%s');
fclose(myData);
length(data)
end

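I figured I could find the length of this array of strings and write a for loop over the whole list of operations, but I can't get past this point: the output keeps giving me this: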

ans = 
    {772321x1 cell}

which I can't use. When I try to store it in a variable, its value is 1. Is there something odd about cell arrays that I'm missing here?

【Question Comments】:

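Not sure, but I think each cell element holds one line. Have you tried accessing the entries with data{line,1}? You have to use curly braces {} to get at the data inside a cell; with "normal" parentheses you only get the whole cell.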
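A small sketch of what textscan actually returns, using a made-up input string (textscan also accepts a character vector instead of a file identifier). It returns one cell per conversion specifier, so a lone '%s' gives a 1-by-1 cell whose only element is an N-by-1 cell of character vectors; '%s' also splits on whitespace rather than on line breaks. Applied to the question's variable, that means the k-th token is data{1}{k}, not data{k,1}:

```matlab
% Hypothetical input: a header line and a data line joined by a space.
C = textscan('//Header with Test1 Info 0,-156.875956035285', '%s');
disp(size(C))          % 1 1  (this is why length(data) returned 1)
tokens = C{1};         % curly braces unwrap the outer 1x1 cell
disp(numel(tokens))    % 5: '%s' stops at whitespace, so the header becomes 4 tokens
disp(tokens{5})        % 0,-156.875956035285 (a comma is not a default delimiter)
```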

Tags: matlab text-files cell-array

【Solution 1】:

I assume you need the information in the "Test Info" lines?

If so, you need to run textscan with two different patterns: one to pick out the info lines, and another to read the data:

 info(1, end+1) = textscan(fid, '//%s','Delimiter', '');
 data(1, end+1) = textscan(fid, '%f, %f', 'CollectOutput', true);
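The alternation works because textscan stops at the first text it cannot convert and, when given a file identifier, the next call resumes from that point. A sketch of the stopping behaviour of the data pattern, on a made-up in-memory string:

```matlab
% Hypothetical sample: two data rows, then the next test's header.
str = sprintf('0,1\n1,2\n//Header with Test2 Info\n5,6\n');
C = textscan(str, '%f, %f', 'CollectOutput', true);
xy = C{1};   % 'CollectOutput' merges the two numeric columns into one N-by-2 double
disp(xy)     % only the rows before the '//' header; conversion stops there
```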

Here is how I would wrap that in a loop with error handling:

% [info, data] = read_data(file_name): Read a file in funky format
% 
% info and data are cells of same size
function [info, data] = read_data(file_name)
    [fid, msg] = fopen(file_name);
    if fid<0
        error('Unable to open file "%s": %s', file_name, msg);
    end
    % close the file no matter how we exit this function (error,
    % ctrl-c,...)
    finalize = onCleanup(@() fclose(fid));

    info = cell(1,0);
    data = cell(1,0);
    while true
        info(1, end+1) = textscan(fid, '//%s','Delimiter', '');
        data(1, end+1) = textscan(fid, '%f, %f', 'CollectOutput', true);

        % guard against an empty info cell at EOF before indexing into it
        if ~isempty(info{1,end}) && strcmpi(info{1,end}{end}, 'End of Data')
            % End of data reached, exit here
            info = info(1:end-1);
            data = data(1:end-1);
            break;
        end
        if isempty(data{1,end})
            % Empty data, but not 'End of data' marker.
            % Replace this error with break to accept files with missing
            % "end of data" tags
            error('Empty data before "End of Data" line')
        end
    end
end

You can then read the file and compute the averages as follows:

>> [info, data] = read_data('foo.txt')
info = 
    {3x1 cell}    {2x1 cell}    {2x1 cell}
data = 
    [8x2 double]    [8x2 double]    [8x2 double]


>> info{3}
ans = 
    'Header with Test3 Info'
    'More Test3 Info'

>> all_data = cellfun(@(d) d(:,2), data, 'UniformOutput', false); all_data = [all_data{:}]
all_data =
 -156.8760 -156.8760 -156.8760
   -4.8287   -4.8287   -4.8287
   -8.9350   -8.9350   -8.9350
   -9.7696   -9.7696   -9.7696
  -14.9767  -14.9767  -14.9767
  -16.9949  -16.9949  -16.9949
  -19.2709  -19.2709  -19.2709
  -18.9949  -18.9949  -18.9949

>> mean(all_data, 2)
ans =
 -156.8760
   -4.8287
   -8.9350
   -9.7696
  -14.9767
  -16.9949
  -19.2709
  -18.9949

【Discussion】:

This is incredible; I can't imagine a more thorough response. Thanks!
+1 for a great answer, and especially for using onCleanup to close the file no matter what. That's good practice when importing loosely formatted files.
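A minimal, hypothetical demonstration of the onCleanup pattern used in read_data: the cleanup object's function runs when the function's workspace is destroyed, including when it exits through an error. The function name demo_oncleanup and the simulated failure are made up for illustration:

```matlab
function demo_oncleanup(file_name)
% DEMO_ONCLEANUP  Hypothetical demo: fclose still runs even though this
% function exits through an error.
    fid = fopen(file_name, 'r');
    if fid < 0
        error('demo:open', 'Unable to open file "%s"', file_name);
    end
    finalize = onCleanup(@() fclose(fid)); %#ok<NASGU>
    error('demo:parse', 'Simulated parse failure');   % finalize fires during unwinding
end
```

Calling it inside try/catch and comparing numel(fopen('all')) before and after should show that no file handle is left open.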