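【Posted】: 2019-07-10 16:44:51
【Question】:
I have some plain-text tables that I need to output in CSV format. If I run tr to replace the border characters, I get problems with fields whose content spans two lines.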
cat file.txt | tr -s '|' ' ' | tr -s '_' ' '
Original table:
____________________________________________________________________________
| Name | AB | DATA | SOME | IF | DATE |
|___________________________|_________|__________|_______|________|__(UTC)__|
| Marra Carolina Odoriz | | | | |2019-07- |
| Dolman |36737202 |098787267 | 45 | - |09T10:35:|
|____________________________|_________|__________|_______|________|_50.289Z_|
| | | | | |2019-07- |
| - |53959997 |098543650 | 30 | - |09T12:02:|
|____________________________|_________|__________|_______|________|_36.746Z_|
| | | | | |2019-07- |
| Vic Velazquez |33577915 |096638025 | - | 6000 |09T12:40:|
|____________________________|_________|__________|_______|________|_17.754Z_|
| Gabriela Letacia Cararallo | | | | |2019-07- |
| Vacchetzi |43132876 |091322398 | 30 | - |09T12:40:|
|____________________________|_________|__________|_______|________|_40.887Z_|
This is the CSV output I need for this simple sample table:
NAME;AB;DATA;SOME;IF;DATE (UTC)
Marra Carolina Odoriz Dolman;36737202;098787267;45;-;2019-07-09T10:35:50.289Z
-;53959997;098543650;30;-;2019-07-09T12:02:36.746Z
Vic Velazquez;33577915;096638025;-;6000;2019-07-09T12:40:17.754Z
Gabriela Letacia Cararallo Vacchetzi;43132876;091322398;30;-;2019-07-09T12:40:40.887Z
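A minimal sketch of one way to get from the table above to this output, written as a shell function around awk. It is not a tested general solution and rests on several assumptions: every table row starts and ends with `|`; a row whose first cell contains only underscores closes the current record; wrapped header cells and the name column are joined with a space, while all other wrapped columns (here the timestamp) are joined directly; and the header is upper-cased only to match the sample output. The function name `table_to_csv` is made up for illustration.

```shell
# table_to_csv: read an ASCII-art table on stdin, write ";"-separated records.
table_to_csv() {
    awk '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }

    {
        line = trim($0)
        # Skip anything that is not a table row, e.g. the top border.
        if (line !~ /^[|].*[|]$/) next

        # Drop the outer pipes, then split the row into cells.
        line = substr(line, 2, length(line) - 2)
        n = split(line, cell, "[|]")

        # Assumption: a row whose first cell is only underscores ends a record.
        is_sep = (cell[1] ~ /^_+$/)

        for (i = 1; i <= n; i++) {
            gsub(/_/, " ", cell[i])
            piece = trim(cell[i])
            if (piece == "") continue
            # Header cells and the name column are joined with a space;
            # other columns (the wrapped timestamp) are joined directly.
            glue = (rec[i] != "" && (records == 0 || i == 1)) ? " " : ""
            rec[i] = rec[i] glue piece
        }
        if (n > width) width = n

        if (is_sep) {
            out = ""
            for (i = 1; i <= width; i++)
                out = out (i > 1 ? ";" : "") rec[i]
            # Upper-case the header line to match the sample output.
            print (records == 0 ? toupper(out) : out)
            records++
            split("", rec)   # reset the accumulator for the next record
            width = 0
        }
    }'
}
```

It would be called as `table_to_csv < file.txt`. Note that underscores inside real values are turned into spaces, because the underscore doubles as the border character in this layout.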
If I have a raw multi-line input file without the ASCII-table layout, can this partial solution be applied to that case? This is what I have tried:
while(<>)
{
    @vals = split /\ /; # split the line into the vals array (now splitting on a single space)
    $size = @vals;
    for( $i = 0 ; $i < $size ; $i++ )
    {
        # clean up the values: remove underscores and extra spaces
        # remove semicolons
        $vals[$i] =~ s/_/ /g;
        $vals[$i] =~ s/;/ /g;
        $vals[$i] =~ s/^ *//;
        $vals[$i] =~ s/ *$//;
        # append the value to the data record for this field
        $data[$i] .= $vals[$i];
        # special handling for first field: use spaces when joining
        $data[$i] .= " " if ($i==0);
    }
    if(/\R/) # originally four underscores marked the end of the record;
             # now the line break is taken as the end of the record
    {
        # clean up the first field; trim spaces
        $data[0] =~ s/^ *//;
        $data[0] =~ s/ *$//;
        $data[3] =~ s/\..*//;
        # join the fields with semicolons
        $line = join (";", @data);
        # collapse multiple spaces
        $line =~ s/ +/ /g;
        # print this line and start over
        print "$line\n" unless ($line eq '');
        @data = ();
    }
}
With this solution, the result is:
Name;Complete;;;;;;;;;AB;;;;;;;DATA;;;SOME;;DATE;(UTC) Marra;Carolina;Odoriz;;;;;36737202;098787267;45;-;2019-07-09T10:35:50.289Z
Dolman ;;;
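【Discussion】:
- This looks like a nightmare for *nix line-based text tools. Maybe a Perl module could solve it, but you would need a consultant. I would spend the time trying to convince the provider of the original table to give you access to their data source, or to deliver the output you need. Good luck.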