Need a script for stripping extra Line Feed characters from text files (answers)

【Question title】: Need a script for stripping extra Line Feed characters from text files
【Posted】: 2017-02-19 02:51:20
【Question description】:

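I'm running Perl on Windows, and I have some text files whose lines end in CRLF (0d0a). The problem is that a few stray 0a characters occasionally appear in the files; Windows Perl treats them as line breaks, which messes up my processing. My idea was to preprocess the files, reading lines split on CRLF, but, at least on Windows, Perl insists on splitting on LF as well.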

I've tried setting $/:

local $/ = 0x0d; 
open(my $fh, "<", $file) or die "Unable to open $file";
while (my $line = <$fh>) {
    # do something to get rid of the 0x0a embedded in the line of text; 
}

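...but this reads multiple lines at once; it seems to miss the 0x0d entirely. I've also tried setting it to "\n", "\n\r", "\r" and "\r\n". There must be an easy way to do this!

I need to remove those characters so that the files are processed correctly. So I need a script that opens a file, splits it on CRLF, finds any 0a that is not preceded by a 0d, zaps it, and saves the result line by line to a new file.

Thanks for any help you can provide.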
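If the files fit comfortably in memory, the requirement above can be sketched as a single substitution over the raw bytes: delete every 0a that is not immediately preceded by 0d. The helper names strip_bare_lf and strip_bare_lf_file are hypothetical, and the sketch assumes slurping the whole file is acceptable; the answers below read record by record or in chunks instead.

```perl
use strict;
use warnings;

# Remove every LF (0x0A) that is not immediately preceded by a CR (0x0D);
# CRLF pairs are left untouched.
sub strip_bare_lf {
    my ($text) = @_;
    (my $clean = $text) =~ s/(?<!\x0D)\x0A//g;
    return $clean;
}

# Hypothetical helper: slurp $in as raw bytes (no :crlf translation),
# clean it, and write the result to $out, also as raw bytes.
sub strip_bare_lf_file {
    my ($in, $out) = @_;
    open(my $in_fh, '<:raw', $in) or die("Can't open \"$in\": $!\n");
    my $data = do { local $/; <$in_fh> };    # whole file in one read
    close($in_fh);
    open(my $out_fh, '>:raw', $out) or die("Can't create \"$out\": $!\n");
    print {$out_fh} strip_bare_lf($data);
    close($out_fh) or die("Can't close \"$out\": $!\n");
}

# Usage: perl script.pl input.txt output.txt
strip_bare_lf_file(@ARGV) if @ARGV == 2;
```

Because the lookbehind checks the character before each LF in the original text, a CRLF followed by a stray LF keeps its own LF and loses only the extra one.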

【Question discussion】:

Does this regex, qr/([\n\x{0B}\f\r\x{85}]{1,2})/;, eliminate anything? Maybe File::Edit::Portable would help.
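As a quick check of that suggestion, the sketch below applies the pattern to a made-up CRLF string containing one stray LF. The pattern matches the CRLF pairs and the lone LF alike, so on its own it cannot tell the stray LF apart from a real line ending.

```perl
use strict;
use warnings;

# The pattern from the comment: a run of one or two line-break characters.
my $linebreak = qr/([\n\x{0B}\f\r\x{85}]{1,2})/;

# Made-up sample: CRLF line endings plus one stray LF inside "two".
my $data = "one\x0D\x0Atw\x0Ao\x0D\x0A";

# Count the matches: both CRLF pairs and the lone LF are found.
my $matches = () = $data =~ /$linebreak/g;
print "$matches matches\n";          # prints "3 matches"

# Splitting on the same pattern therefore breaks "two" into "tw" and "o".
my @fields = split /[\n\x{0B}\f\r\x{85}]{1,2}/, $data;
print join('|', @fields), "\n";      # prints "one|tw|o"
```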

Tags: perl file

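【Solution 1】:

For starters, local $/ = 0x0d; should be local $/ = "\x0d";. The literal 0x0d is the number 13, so $/ becomes the two-character string "13" rather than a carriage return.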
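A small sketch of the difference, using made-up data: the numeric literal stringifies to "13", and reading with that separator splits wherever the characters "13" appear instead of at the CR.

```perl
use strict;
use warnings;

my $numeric = 0x0d;     # the number 13
my $string  = "\x0d";   # a one-character string holding CR

# $/ is used as a string, so the number 13 becomes the two-character
# separator "13", not a carriage return.
print length("$numeric"), " ", length($string), "\n";   # prints "2 1"

my @records;
{
    local $/ = 0x0d;    # the mistake from the question
    my $data = "abc13def\x0D\x0Aghi";
    open(my $fh, '<', \$data) or die("Can't open in-memory file: $!\n");
    @records = <$fh>;   # splits after "abc13", never at the CR
    close($fh);
}
print scalar(@records), " records; first is \"$records[0]\"\n";
# prints: 2 records; first is "abc13"
```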

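Beyond that, the problem is that on Windows the :crlf layer is added to file handles by default. It converts CRLF to LF on read (and LF to CRLF on write). As a result, there are no CRs left in what you read, so with $/ set to CR or CRLF you end up reading the whole file as a single record.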
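The effect of the :crlf layer can be reproduced on any platform by requesting the layer explicitly. This sketch writes a made-up CRLF file with one stray LF to a temporary file (via the core File::Temp module) and reads it back with $/ = "\x0D\x0A", once through :crlf and once through :raw. read_records is a hypothetical helper name.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small CRLF file containing one stray LF, byte for byte.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
binmode($tmp_fh);   # no translation while writing
print {$tmp_fh} "one\x0D\x0Atw\x0Ao\x0D\x0Athree\x0D\x0A";
close($tmp_fh) or die("Can't close \"$path\": $!\n");

# Read the file back with CRLF as the record separator through $layer.
sub read_records {
    my ($layer) = @_;
    local $/ = "\x0D\x0A";
    open(my $fh, "<$layer", $path) or die("Can't open \"$path\": $!\n");
    my @records = <$fh>;
    close($fh);
    return @records;
}

# :crlf (the Windows default) turns every CRLF into LF before $/ is checked,
# so "\r\n" is never seen and the whole file comes back as one record.
my @with_crlf = read_records(':crlf');

# :raw passes the bytes through untouched, so the file splits on CRLF.
my @with_raw = read_records(':raw');

printf "%d record(s) with :crlf, %d with :raw\n",
    scalar(@with_crlf), scalar(@with_raw);
# prints: 1 record(s) with :crlf, 3 with :raw
```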

Just remove or disable :crlf by opening the file with the :raw layer:

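use strict;
use warnings;
use feature qw( say );

my $file = $ARGV[0];                 # input file from the command line

local $/ = "\x0D\x0A";               # each record ends in CRLF
open(my $fh, "<:raw", $file)         # :raw: no CRLF-to-LF translation
    or die("Can't open \"$file\": $!\n");

while (<$fh>) {
    chomp;                           # remove the trailing CRLF ($/)
    s/\x0A//g;                       # remove any stray LFs in the record
    say;                             # on Windows, STDOUT's :crlf turns "\n" into CRLF
}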
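Since the question asks for the cleaned lines to go to a new file, here is a sketch of the same record-by-record approach writing its output to a file. clean_by_records is a hypothetical name; the sketch writes "\x0D\x0A" explicitly to a :raw handle instead of relying on say and the :crlf layer on STDOUT, so the output bytes are the same on every platform.

```perl
use strict;
use warnings;

# Hypothetical helper: read $in raw, one CRLF-terminated record at a time,
# drop any stray LFs inside each record, and write it to $out with CRLF.
sub clean_by_records {
    my ($in, $out) = @_;
    local $/ = "\x0D\x0A";    # records end in CRLF
    open(my $in_fh,  '<:raw', $in)  or die("Can't open \"$in\": $!\n");
    open(my $out_fh, '>:raw', $out) or die("Can't create \"$out\": $!\n");
    while (my $record = <$in_fh>) {
        chomp($record);          # removes the trailing CRLF, if any
        $record =~ s/\x0A//g;    # every LF left is a stray one
        print {$out_fh} $record, "\x0D\x0A";
    }
    close($out_fh) or die("Can't close \"$out\": $!\n");
    close($in_fh);
}

# Usage: perl script.pl input.txt output.txt
clean_by_records(@ARGV) if @ARGV == 2;
```

One behavioral detail: a CRLF is always written after the last record, even if the input did not end with one.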

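【Discussion】:

【Solution 2】:

This solution reads the file in binary (:raw) mode, up to 4 MiB at a time, and deletes every LF that is not preceded by a CR.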

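use strict;
use warnings;

my ($infile, $outfile) = @ARGV;

open(my $INFILE, "<:raw", $infile)
    or die("Can't open \"$infile\": $!\n");
open(my $OUTFILE, ">:raw", $outfile)
    or die("Can't create \"$outfile\": $!\n");

my $buffer = '';
# Append each chunk (OFFSET argument) after whatever was held back last pass.
while (sysread($INFILE, $buffer, 4*1024*1024, length($buffer))) {
    # Delete every LF that is not preceded by a CR.
    $buffer =~ s/(?<!\x0D)\x0A//g;

    # Hold back a trailing CR in case we cut between a CR and its LF.
    my $keep = $buffer =~ /\x0D\z/ ? 1 : 0;
    print $OUTFILE substr($buffer, 0, length($buffer) - $keep, '');
}

print $OUTFILE $buffer;
close($OUTFILE)
    or die("Can't close \"$outfile\": $!\n");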
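The chunk boundary is the subtle part of this approach, so here is a sketch that checks it. strip_bare_lf_chunked is a hypothetical helper that applies this answer's substitution to an in-memory string chunk by chunk (substr stands in for sysread), holding back only a trailing CR at each boundary, and then compares every chunk size against one substitution over the whole string. Holding back a trailing LF instead would not work: the next pass would see that LF without its CR and delete it.

```perl
use strict;
use warnings;

# Hypothetical helper: run the chunked substitution over an in-memory string.
# Each substr() slice plays the role of one sysread() appended to $buffer.
sub strip_bare_lf_chunked {
    my ($data, $chunk_size) = @_;
    my $out    = '';
    my $buffer = '';
    for (my $pos = 0; $pos < length($data); $pos += $chunk_size) {
        $buffer .= substr($data, $pos, $chunk_size);
        $buffer =~ s/(?<!\x0D)\x0A//g;    # drop LFs not preceded by CR

        # Hold back a trailing CR: its LF may arrive in the next chunk.
        my $keep = $buffer =~ /\x0D\z/ ? 1 : 0;
        $out .= substr($buffer, 0, length($buffer) - $keep, '');
    }
    return $out . $buffer;    # flush a CR held back at end of input
}

# Made-up sample: a stray LF inside "two", a doubled LF, and a lone CR.
my $sample = "one\x0D\x0Atw\x0Ao\x0D\x0A\x0A\x0Dthree\x0D\x0A";
(my $expected = $sample) =~ s/(?<!\x0D)\x0A//g;

# Every chunk size, down to one byte, must match the one-shot result.
for my $size (1 .. length($sample)) {
    my $got = strip_bare_lf_chunked($sample, $size);
    die("chunk size $size gives a different result\n") if $got ne $expected;
}
print "all chunk sizes agree\n";
```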

【Discussion】:

(Feel free to revert. I just thought you'd appreciate the cleanup.)