Error reading a fixed-width string with textscan in MATLAB: answers

【Question title】: Error reading a fixed-width string with textscan in MATLAB
【Posted】: 2013-06-19 09:09:28
【Question】:

I am using textscan to read fixed-width (9-character) data from a text file. textscan fails on a line that contains this string:

'   9574865.0E+10  '

I want to read two numbers from it:

957486 5.0E+10

The problem can be reproduced like this:

dat = textscan('   9574865.0E+10  ','%9f %9f','Delimiter','','CollectOutput',true,'ReturnOnError',false);

which returns the following error:

Error using textscan
Mismatch between file and format string.
Trouble reading floating point number from file (row 1u, field 2u) ==> E+10

Surprisingly, if we add a minus sign, we get no error but a wrong result instead:

dat = textscan('  -9574865.0E+10  ','%9f %9f','Delimiter','','CollectOutput',true,'ReturnOnError',false);

Now dat{1} is:

    -9574865           0

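Obviously, I need both cases to work. My current workaround is to insert commas between the fields and use the comma as the delimiter in textscan, but that is slow and not a good solution. Is there a way to read this string correctly using textscan or another built-in (for performance reasons) MATLAB function?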

【Comments】:

Tags: string matlab textscan

【Answer 1】:

I suspect that textscan first trims leading whitespace and only then parses the format string. I think so because if you change your format string from

'%9f%9f'

to

'%6f%9f'

your one-liner suddenly works. Also, if you try

'%9s%9s'

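you will see that the leading whitespace of the first string has been removed (so it contains 3 characters "too many"), while for some reason the last string keeps its trailing whitespace.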

Obviously, this means you would have to know exactly how many digits each of the two numbers has. I guess that is not desirable.
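To make the intended field boundaries concrete, here is a minimal sketch that slices the line by position instead of relying on textscan. It is not part of the original answer; the variable names txt, width, fields and values are chosen for illustration, and it assumes the line length is an exact multiple of the field width.

```matlab
% Slice one line into fixed-width fields by character position.
% Assumption: numel(txt) is an exact multiple of width.
txt   = '   9574865.0E+10  ';
width = 9;                                    % field width from the question

% reshape fills column by column, so each column is one 9-character field;
% transposing puts one field per row before converting to a cell array.
fields = cellstr(reshape(txt, width, []).');

% str2double ignores the leading blanks that remain in each field.
values = str2double(fields);
```

Here the first field is '   957486' and the second is '5.0E+10', which is the split the question asks for. As the timings in the other answer suggest, str2double may be slow on large inputs.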

A workaround might look like this:

% Split string on the "dot"
dat = textscan(<your data>,'%9s%9s',...
    'Delimiter'     , '.',...
    'CollectOutput' , true,...
    'ReturnOnError' , false);

% Correct the strings; move the last digit of the first string to the 
% front of the second string, and put the dot back
dat = cellfun(@(x,y) str2double({y(1:end-1),  [y(end) '.' x]}),  dat{1}(:,2), dat{1}(:,1), 'UniformOutput', false);

% Cast to regular array
dat  = cat(1, dat{:})

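【Comments】:

Yes, it trims first, and that is exactly my problem. %6f is not a solution; I need all of the first 9 characters converted to a number. There are other lines that use all 9 characters.
@user1719360: see my latest edit. Could you try it on your data?
This works for my string, but each 9-character field can contain any string that evaluates to a valid number. Does this work for all possible cases?
@user1719360: the split is done with the "dot" as the delimiter, and the correction only moves a single character from one array to the other. So it is very specific; it only works when the second number has the format [0-9].[0-9]E[+-][0-9]*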
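The restriction described in the last comment can be checked up front. The sketch below tests candidate second fields against that pattern; it assumes the dot is meant literally (so it is escaped here) and that the fields have already been stripped of surrounding blanks. The sample values and variable names are made up for illustration.

```matlab
% Check which second fields match the format the workaround relies on.
% Assumption: the '.' in the comment's pattern is a literal dot.
pattern    = '^[0-9]\.[0-9]E[+-][0-9]*$';
candidates = {'5.0E+10', '12.5E+03', '-5.0E+10', '5.0E-3'};

% With a cell array input and 'once', regexp returns one cell per string:
% the match start index, or [] when there is no match.
starts = regexp(candidates, pattern, 'once');
ok     = ~cellfun(@isempty, starts);
```

Fields with two digits before the dot or a leading sign fail the check, so the dot-splitting workaround would not apply to them.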

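【Answer 2】:

I had a similar problem and solved it by calling textscan twice, which turned out to be much faster than cellfun or str2double, and it handles any input that MATLAB's '%f' can interpret.

In your case, I would first call textscan with string-only conversion specifiers and Whitespace = '' so that the field widths are applied correctly.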

data = '   9574865.0E+10  ';
tmp = textscan(data, '%9s %9s', 'Whitespace', '');

Now you need to interleave and append a delimiter that does not interfere with your data, for example ;

tmp = [char(join([tmp{:}],';',2)) ';'];
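To show what the joined string looks like, here is a small sketch that starts from the two fields written out by hand rather than from textscan output. The names fields and joined are illustrative, and it assumes a MATLAB release in which join accepts a cell array of character vectors.

```matlab
% Build the delimiter-separated string from two hand-written fields.
% Assumption: join accepts a cell array of character vectors.
fields = {'   957486', '5.0E+10  '};

% join along dimension 2 merges the row into one element, ';' between fields;
% char converts it to a character vector, then a trailing ';' is appended.
joined = [char(join(fields, ';', 2)) ';'];
```

The result is '   957486;5.0E+10  ;', where every field now ends in an explicit delimiter.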

Now you can call textscan again and apply the correct format to your data, using that delimiter:

result = textscan(tmp, '%f %f', 'Delimiter', ';', 'CollectOutput', true);
format shortE
result{:}

ans =

9.5749e+05   5.0000e+10

Comparing the speed of this approach with str2double:

n = 50000;
data = repmat('   9574865.0E+10  ', n, 1);
% Approach 1 with str2double
tic
tmp = textscan(data', '%9s %9s', 'Whitespace', '');
result1 = str2double([tmp{:}]);
toc

Elapsed time is 2.435376 seconds.

% Approach 2 with double textscan
tic
tmp = textscan(data', '%9s %9s', 'Whitespace', '');
tmp = [char(join([tmp{:}],';',2)) char(59)*ones(n,1)]; % char(59) is just ';'
result2 = cell2mat(textscan(tmp', '%f %f', 'Delimiter', ';', 'CollectOutput', true));
toc

Elapsed time is 0.098833 seconds.

【Comments】: