Read one csv file and write to another csv file after doing some formatting in perl: answers

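【Question title】: Read one csv file and write to another csv file after doing some formatting in perl
【Posted】: 2020-04-09 02:39:56
【Question description】:

I am trying to manipulate a csv in perl.

The input csv has some newline characters in the column data, which cause another external program to fail. I wrote the Perl script below to preprocess the csv and remove these characters.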

use strict; 
use warnings 'all';

# Using Text::CSV file to allow 
# full CSV Reader and Writer 
use Text::CSV; 
use open ":std", ":encoding(UTF-8)";
my $file = $ARGV[0] or die; 

my $csv = Text::CSV->new ( 
{ 
    binary => 1, 
    auto_diag => 1, 
    sep_char => ','
}); 

my $sum = 0; 

# Reading the file 
open(my $data, '<:encoding(utf8)', $file) or die; 

while (my $words = $csv->getline($data))  
{ 
    tr/\r\n//d for @$words; #removing new lines
    tr/,/;/ for @$words;    #replacing comma with semicolon
    $csv->combine(@$words);
    print $csv->string, "\n";
} 

# Checking for End-of-file 
if (not $csv->eof)  
{ 
    $csv->error_diag(); 
} 
close $data;

I use a shell script as a wrapper to store the modified file in another csv. The shell wrapper is below.

perl xyz.pl ${source_csv_file_name} > ${destination_processed_csv_file_name}

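I would like to write the output to another file from within the perl script itself, using an output csv handle. I tried several approaches but keep getting one error or another. Below is what I tried.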

my $outcsv = Text::CSV->new ( { binary => 1, quote_char => "", escape_char => "\\" } );
open(my $data, '<:encoding(utf8)', $file) or die; 
open(my $fh, ">:encoding(utf8)", "new.csv") or die " new.csv: $!";
while (my $words = $csv->getline($data))  
{ 
    tr/\r\n//d for @$words;
    tr/,/;/ for @$words;
    $csv->combine(@$words);
    # Open a handle to the file "new.csv"
    $outcsv->print ($fh, $_) for @words;

    #print $csv->string, "\n";
} 

# Checking for End-of-file 
if (not $csv->eof)  
{ 
    $csv->error_diag(); 
} 
close $data;
close $fh or die "new.csv: $!";

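The situation is this: the first code I posted above works, but to write the file I used a shell wrapper. The second perl script (I only posted the code that differs from the first) fails with an error when I run it. I understand the error but am not sure how to fix it: "Global symbol "@words" requires explicit package name at xyz.pl line 29. Execution of xyz.pl aborted due to compilation errors." I would appreciate it if someone could help here.

Thanks

【Comments】:

It would be appreciated if you posted sample data here.
Also, tell us what fails (and how).
OK, the first code I posted works, but to write the file I used a shell wrapper. Now the second perl script (I only posted the code that differs from the first) fails with the following error when I run it. I understand the error but don't know how to fix it: "Global symbol "@words" requires explicit package name at d2l_preprocess_csv_files.pl line 29. Execution of d2l_preprocess_csv_files.pl aborted due to compilation errors."
OK, thanks for the reply. That kind of information needs to be in the question from the start. (For example: I noticed right away that your second program has an @words, and I assumed it was a typo made while posting, that you had accidentally dropped the $. But once you show the error, we know it is actually the problem.)

Tags: csv perl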

【Solution 1】:

I'm not sure where your first program fails, but here it is, slightly trimmed and cleaned up

use strict; 
use warnings 'all';

use Text::CSV; 
use open ":std", ":encoding(UTF-8)";

my $file = $ARGV[0] or die "Usage: $0 filename\n";

my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 }); 

open my $data,   '<',        $file or die $!; 
open my $fh_out, '>', 'new_'.$file or die $!; 

while (my $words = $csv->getline($data))  
{ 
    tr/\r\n//d for @$words;
    tr/,/;/    for @$words;

    $csv->say($fh_out, $words);
}

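This runs correctly and tests well with the input file borrowed from Shawn's answer.

The combine+string+print in your program also works for me, but there is no reason to do it that way, since print nicely combines them (I used say, which also appends a newline).

A few comments on the program in the question

Once you use the open pragma in the program, don't set the encoding again when opening files. (And it should be :encoding(UTF-8), not utf8. See the Encode docs and the Effective Perler article on this.)
Print the actual error when you die, most commonly via the $! variable.
The two loops above are clearly not as efficient as a single one: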
```
for (@$words) { tr/\r\n//d; tr/,/;/ }
```
I left them as two loops to indicate separate processing steps.
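
The first two points can be illustrated with a minimal sketch. It is not part of the original answer: the temporary file and the name `no_such_file.csv` are assumptions made up for the demo. It suggests that, with the `open` pragma in effect, a plain `open` with no layer already carries the encoding layer, and that `$!` reports why an `open` failed.

```perl
use strict;
use warnings;
use open ':std', ':encoding(UTF-8)';   # default layers for every open below
use File::Temp qw(tempfile);

# Assumption: a throwaway temp file stands in for the real CSV.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
close $tmp_fh;

# No layer is given here; the pragma supplies it.
open my $fh, '<', $tmp_name or die "Can't open $tmp_name: $!";
my @layers = PerlIO::get_layers($fh);
print "layers: @layers\n";
close $fh;

# A failing open, trapped with eval so the sketch keeps running.
# 'no_such_file.csv' is a hypothetical name.
my $ok = eval {
    open my $missing, '<', 'no_such_file.csv'
        or die "Can't open no_such_file.csv: $!\n";
    1;
};
my $error = $ok ? '' : $@;
print "error: $error";
```

The printed layer list should include an encoding layer even though the `open` call names none, and the error line carries the operating system's reason for the failure.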

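The say method of Text::CSV used above was added to the module at some point, so versions older than that won't have it. In that case one can

use the print method and set eol in the constructor so that it prints a newline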
```
my $csv = Text::CSV->new ( { binary => 1, auto_diag => 1, eol => $/ });
...
$csv->print($fh_out, $words);
```
(There are other ways to get a newline; see the documentation for eol.)
Or, leave the constructor alone and add the newline manually
```
$csv->print($fh_out, $words);
print $fh_out "\n";
```

Or, take the roundabout way

$csv->combine(@$words);
print $fh_out $csv->string, "\n";

See the documentation for print.
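
If the same script has to run against both older and newer installations, one possible approach is to test for `say` at run time and fall back to `print` plus a manual newline. This is only a sketch under assumptions: `write_row` is a hypothetical helper, not part of Text::CSV, and an in-memory handle stands in for the real output file.

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });

# Hypothetical helper: write one row followed by a newline,
# using say when this Text::CSV provides it and print otherwise.
sub write_row {
    my ($csv, $fh, $row) = @_;
    if ($csv->can('say')) {
        $csv->say($fh, $row);
    }
    else {
        $csv->print($fh, $row);
        print {$fh} "\n";
    }
}

# Assumption: an in-memory handle replaces the real output file.
my $out = '';
open my $fh, '>', \$out or die "Can't open in-memory handle: $!";
write_row($csv, $fh, ['id', 'message']);
write_row($csv, $fh, ['1', 'a;b']);
close $fh or die "close: $!";
print $out;
```

Either branch should leave one line per row in the output, so the rest of the script does not need to know which version is installed.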

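【Discussion】:

Thanks @zdim! My first program works fine, but as I said it doesn't write to a file; it writes to standard output. My second program fails. I gave the error message in a comment above. I'll test your approach and let you know how I get on.
Hi @zdim. When I ran the program above, it gave me an error: "Can't locate method say at xyz.pl line 18." My perl version is "This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux-thread-multi".
@giri Ah, that means your Text::CSV is older than the release that introduced the say method (it was added at some point). Use their print instead.
@giri (But I have the same Perl, and my Text::CSV (version 1.33) has say...?)
@giri Added the options to the end of the answer.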

【Solution 2】:

The Text::AutoCSV module (install it through your OS package manager or favorite CPAN client) makes it easy to transform CSV files:

#!/usr/bin/env perl
use strict;
use warnings;
use Text::AutoCSV;

Text::AutoCSV->new(in_file => $ARGV[0],
                   out_file => $ARGV[1],
                   encoding => "UTF-8",
                   has_headers => 1, # Set to 0 if no header line
                   read_post_update_hr => \&normalize)->write();

sub normalize {
    my $hr = shift;
    for (values %$hr) {
        s/\r?\n//g;
        tr/,/;/;
    }
}

Example:

$ cat input.csv
id,message
1,"a string, with a comma"
2,"another
with a newline"
3,blah
$ perl demo.pl input.csv new.csv
$ cat new.csv
id,message
1,"a string; with a comma"
2,"another with a newline"
3,blah

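【Discussion】:

Thanks Shawn! I'll check whether this module is already installed on the server. If it isn't, I'm afraid I won't be able to try it.

【Solution 3】:

This is the code that causes the problem: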

while (my $words = $csv->getline($data))  
{ 
    tr/\r\n//d for @$words;
    tr/,/;/ for @$words;
    $csv->combine(@$words);
    # Open a handle to the file "new.csv"
    $outcsv->print ($fh, $_) for @words;

    #print $csv->string, "\n";
}

And, in a comment, you gave us the error:

Global symbol "@words" requires explicit package name at d2l_preprocess_csv_files.pl line 29.

I guess line 29 is:

$outcsv->print ($fh, $_) for @words;

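The call to getline() gives you an array reference, which you store in $words. If you want to treat it as an array, you need to dereference it (@$words, as you already do in a few places). So on the problem line you have simply forgotten the $. You don't have an array called @words; you need to use @$words.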
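
A minimal sketch of how the loop might look once corrected. The sample data and the in-memory handles are assumptions chosen so that the example needs no files, and `eol => "\n"` is added so that print ends each row. Note that the print method of Text::CSV takes the whole row as an array reference, so `$words` is passed as it is; looping over `@$words` and passing one field at a time would hand it plain scalars instead.

```perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical sample input, held in memory instead of a file.
my $input = qq{id,message\n1,"a string, with a comma"\n2,"another\nwith a newline"\n};

my $csv    = Text::CSV->new({ binary => 1, auto_diag => 1 });
my $outcsv = Text::CSV->new({ binary => 1, auto_diag => 1, eol => "\n" });

open my $data, '<', \$input or die "input: $!";
my $output = '';
open my $fh, '>', \$output or die "output: $!";

while (my $words = $csv->getline($data)) {
    tr/\r\n//d for @$words;        # dereference to edit each field in place
    tr/,/;/    for @$words;
    $outcsv->print($fh, $words);   # pass the array reference itself
}

close $data;
close $fh or die "output: $!";
print $output;
```

Each input record should come out as exactly one output line, with embedded newlines removed and commas inside fields turned into semicolons.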

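【Discussion】:

Hi Dave, thanks for the reply. I tried your suggestion but still get an error: "Expected fields to be an array ref at xyz.pl line 29, line 1."
@giri: Well, my fix will stop the initial error you reported. It's hard to help further without seeing what your current code looks like.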