Reading large text files into MATLAB

【Question title】: Reading large text files into MATLAB
【Posted】: 2015-07-07 17:50:50
【Question description】:

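I'm trying to write a function that reads many (1000+) text files ('.txt') into MATLAB. A snippet of one file is shown below. The actual files have the same columns but about 150,000 rows each.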

Start, Serial, DeviceId, RunNumber, Date, Real, Elapsed, X, EcgVal, EcgStatus, CapnoVal, CapnoStatus, P1Val, P1Status, P2Val, P2Status, P3Val, P3Status, Spo2Val, Spo2Status, CprDepth, CprFrequency, CprStatus, CprWaveVal, FiltEcgVal, FiltEcgStatus, Ecg2Val, Ecg2Status, Ecg3Val, Ecg3Status, Ecg4Val, Ecg4Status
2013-01-01 23:51:12, 00017711, TEMS ACP272, , 01-01-2013, 23:51:12.000, 00:00:00.000, 41275.993889, 0.000000, -1, 0.000000, 0, 0.000000, 0, 0.000000, 0, 0.000000, 0, 0.000000, 0, 0.000000, 0, 0, 0.000000, 0.000000, 1, 0.000000, 1, 0.000000, 1, 0.000000, 1
2013-01-01 23:51:12, 00017711, TEMS ACP272, , 01-01-2013, 23:51:12.008, 00:00:00.008, 41275.993889, 0.000000, -1, 0.000000, 0, 0.000000, 0, 0.000000, 0, 0.000000, 0, 0.000000, 0, 0.000000, 0, 0, 0.000000, 0.000000, 1, 0.000000, 1, 0.000000, 1, 0.000000, 1
2013-01-01 23:51:12, 00017711, TEMS ACP272, , 01-01-2013, 23:51:12.016, 00:00:00.016, 41275.993889, 0.000000, -1, 0.000000, 0, 0.000000, 0, 0.000000, 0, 0.000000, 0, 0.000000, 0, 0.000000, 0, 0, 0.000000, 0.000000, 1, 0.000000, 1, 0.000000, 1, 0.000000, 1
2013-01-01 23:51:12, 00017711, TEMS ACP272, , 01-01-2013, 23:51:12.024, 00:00:00.024, 41275.993889, 0.000000, -1, 0.000000, 0, 0.000000, 0, 0.000000, 0, 0.000000, 0, 0.000000, 0, 0.000000, 0, 0, 0.000000, 0.000000, 1, 0.000000, 1, 0.000000, 1, 0.000000, 1

I have tried the obvious approaches (csvread, dlmread, importdata) without success. When I open this file using "ImportData", I get:

þS

followed by 5 blank lines. Using

fid = fopen('TEST.txt','r');
fgetl(fid)

I found that there is a blank line between every two data lines, and a space between every two characters.
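These symptoms (a stray `þ` at the start, a space between every character, extra line breaks) are consistent with a UTF-16LE file that begins with the byte-order mark FF FE: every ASCII character is followed by a zero byte, and 8-bit readers show those zeros as gaps. That diagnosis is an assumption, not something confirmed in this thread. One way to check is to look at the first raw bytes. The sketch below first writes a tiny UTF-16LE sample to a temporary file so it runs as-is; `fname` is a placeholder to replace with a real export such as `TEST.txt`.

```matlab
% Demo only: write a tiny UTF-16LE sample (BOM FF FE + text) to a temp file.
% typecast uses the host byte order, so this assumes a little-endian machine.
% Replace fname with a real export (e.g. 'TEST.txt') and drop these lines.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fwrite(fid, [uint8([255 254]), typecast(uint16('Start, Serial'), 'uint8')], 'uint8');
fclose(fid);

% Read the first four raw bytes without any text decoding
fid = fopen(fname, 'r');
firstBytes = fread(fid, 4, '*uint8')';
fclose(fid);

% Compare against the common byte-order marks
if numel(firstBytes) >= 2 && isequal(firstBytes(1:2), uint8([255 254]))
    encoding = 'UTF-16LE';
elseif numel(firstBytes) >= 2 && isequal(firstBytes(1:2), uint8([254 255]))
    encoding = 'UTF-16BE';
elseif numel(firstBytes) >= 3 && isequal(firstBytes(1:3), uint8([239 187 191]))
    encoding = 'UTF-8 with BOM';
else
    encoding = 'no BOM (probably plain 8-bit text)';
end
disp(encoding)
```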

I also tried the textscan function as follows:

fid = fopen('TEST.txt','r');
c = textscan(fid, '%s', 'Delimiter', ',')

but this returns an empty cell.

One approach that does work is to open the file in Excel and save it as a CSV file. However, since I need to do this for 1000+ files, that is not feasible.

Any comments, suggestions or advice would be greatly appreciated. Thanks!

Update:

The following seems to work:

data = textscanu('TEST.txt');
str=textscan(data{1},'%s','Delimiter',',')

I'll try to extend this to read the whole file, skip the blank lines, and organize all the columns.
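If `textscanu` (a File Exchange submission, not a built-in) is unavailable or turns out to be slow on 150,000-row files, one alternative, assuming the files really are UTF-16LE with a BOM, is to decode the raw bytes yourself and then split on line breaks and commas. The sketch writes its own two-line sample so it runs standalone; `fname` is a placeholder. Every field stays a string, so numeric columns would still need something like `str2double` afterwards.

```matlab
% Demo input: header + one data row, CRLF line endings, UTF-16LE with BOM.
% Replace fname with a real export and drop these lines to use it for real.
fname = [tempname '.txt'];
sample = sprintf('Start, Serial, DeviceId\r\n2013-01-01 23:51:12, 00017711, TEMS ACP272\r\n');
fid = fopen(fname, 'w');
fwrite(fid, [uint8([255 254]), typecast(uint16(sample), 'uint8')], 'uint8');
fclose(fid);

% Read every raw byte of the file
fid = fopen(fname, 'r');
bytes = fread(fid, Inf, '*uint8')';
fclose(fid);

% Drop the UTF-16LE byte-order mark if present
if numel(bytes) >= 2 && bytes(1) == 255 && bytes(2) == 254
    bytes = bytes(3:end);
end

% Pair bytes into 16-bit code units. typecast follows the host byte order,
% so this assumes a little-endian machine (x86 Windows/Linux/macOS).
txt = char(typecast(bytes, 'uint16'));

% Split on CRLF or LF and drop empty or whitespace-only lines
txtLines = regexp(txt, '\r?\n', 'split');
txtLines = txtLines(~cellfun(@isempty, strtrim(txtLines)));

% Split each line on commas and trim the space that follows each comma
fields = cellfun(@(s) strtrim(regexp(s, ',', 'split')), txtLines, 'UniformOutput', false);
header = fields{1};
rows = vertcat(fields{2:end});   % N x numel(header) cell array of strings
```

This has not been benchmarked on full-size files; for very large inputs, parsing the decoded `txt` with `textscan` on the string may be faster than the per-line `cellfun`.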

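【Discussion】:

And store all 32 columns, starting from the second line, in an N x 32 cell array for each text file?
Yes, that's fine. I'm not particularly picky about the format; once the data is imported I can reformat and organize it.
How are your txt files saved? Is it possible that they are encoded with 16 bits per character instead of 8? Maybe some Unicode encoding?
The text files are downloaded from a machine (actually a defibrillator), so I have no control over how they are saved. It's entirely possible that they are Unicode or use some odd encoding.
So each line seems to have 33 entries; please double-check that.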

Tags: matlab import text-files

【Solution 1】:

Approach #1: Using importdata -

%// Import text data as string cells, assuming file1 is the path to text file
data = importdata(file1,'')

%// Split columns based on the delimiter: ' '
split_data = cellfun(@(x) strsplit(x,' ') , data(2:end),'Uni',0)

%// Gather data into a N x number_of_entries cell array
out_data = vertcat(split_data{:})

%// Remove the commas after each entry (if so desired)
out_data = cellfun(@(x) strrep(x,',','') , out_data,'Uni',0)

%// Remove the sixth column, which holds only the lone comma left by the empty RunNumber field
out_data(:,6) = []
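Splitting on spaces needs that column-6 fix-up because `Start` (`2013-01-01 23:51:12`) and `DeviceId` (`TEMS ACP272`) contain spaces, so each becomes two entries, and the empty `RunNumber` field turns into a lone `,` entry. That appears to be where the 33 entries mentioned in the comments come from. A small comparison on the first six fields of a sample row (hard-coded below) shows the difference; splitting on commas and trimming keeps one entry per header column. `regexp(..., 'split')` is used here because it also works in older releases such as R2012b.

```matlab
% First six fields of a data row from the question (hard-coded for the demo)
row = '2013-01-01 23:51:12, 00017711, TEMS ACP272, , 01-01-2013, 23:51:12.000';

% Splitting on spaces: Start and DeviceId break in two, RunNumber becomes ','
bySpace = regexp(row, ' ', 'split');

% Splitting on commas, then trimming the space after each comma
byComma = strtrim(regexp(row, ',', 'split'));

fprintf('%d entries by space, %d by comma\n', numel(bySpace), numel(byComma));
disp(bySpace{6})   % the lone comma that Approach #1 deletes
disp(byComma{3})   % DeviceId kept whole
```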

Approach #2: Using textscan -

%// Read entire text data into a cell of a cell array, 
%// assuming file1 is the path to text file
fileID = fopen(file1,'r');
onecell_data = textscan(fileID,'%s','Delimiter','\n','HeaderLines',1);
fclose(fileID);

%// Unpack one level of data to have N x 1 sized cell array
data = [onecell_data{:}]

%// Split columns based on the delimiter: ' '
%// (use all of data: 'HeaderLines',1 has already skipped the header line)
split_data = cellfun(@(x) strsplit(x,' ') , data,'Uni',0)

%// Gather data into a N x number_of_entries cell array
out_data = vertcat(split_data{:})

%// Remove the commas after each entry (if so desired)
out_data = cellfun(@(x) strrep(x,',','') , out_data,'Uni',0)

%// Remove the sixth columns that had extra commas
out_data(:,6) = []
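Whichever per-file reader ends up working, the 1000+ files can be processed in one loop instead of converting each one in Excel. A minimal sketch: `reader` is a placeholder function handle (set to `fileread` here purely so the demo runs), and the temporary folder with two tiny files stands in for the real export directory. Errors become warnings so one bad file does not stop the whole batch.

```matlab
% Demo folder with two small files; point `folder` at the real exports instead
folder = tempname;
mkdir(folder);
for k = 1:2
    fid = fopen(fullfile(folder, sprintf('log%d.txt', k)), 'w');
    fprintf(fid, 'Start, Serial\n2013-01-01 23:51:12, %08d\n', k);
    fclose(fid);
end

% Placeholder reader: swap in the parser that works for your files
reader = @fileread;

files = dir(fullfile(folder, '*.txt'));
results = cell(numel(files), 1);   % one cell per file
for k = 1:numel(files)
    fpath = fullfile(folder, files(k).name);
    try
        results{k} = reader(fpath);
    catch err
        warning('Skipping %s: %s', files(k).name, err.message);
    end
end
```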

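【Discussion】:

@DrDunkenstein What are your MATLAB version and operating system? Could you try this: data = importdata(file1,'%s')?
I have MATLAB 2012b and Windows 7 Ultimate. I'm just heading to a meeting, but I'll try it afterwards. Also, see the update above!
@DrDunkenstein Check out the new approach I just added.
Sorry for the delay. For some reason this doesn't work either; "onecell_data" is an empty cell array.
@DrDunkenstein It may be a file-specific issue. If you can share the file, upload it somewhere and link it here. Otherwise, sorry.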