Unix shell script to find and remove unwanted strings in a particular column of a pipe-delimited file: answers

【Question title】: unix shell scripting to find and remove unwanted string in a pipe delimited file in a particular column
【Posted】: 2018-10-20 18:19:14
【Question description】:

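{ I have a requirement where the file is pipe ("|") delimited. The first line contains the header, and there are 5 columns.

I need to remove strings in the 3rd column only when the pattern matches.

Please also note that the 3rd column can contain strings with a comma (,), semicolon (;) or colon (:), but it will never contain a pipe (|), which is why we chose the pipe as the delimiter.

Input file: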

COL1|COL2|COL3|COL4|COL5
1|CRIC|IPL|CRIC1:IPL_M1;IPL_M2;TEST_M1,CRIC2:ODI_M1;IPL_M3|C1|D1
2|CRIC|TEST|CRIC1:TEST_M2,CRIC2:ODI_M1;IPL_M1;TEST_M2;IPL_M3;T20_M1|C2|D2

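Only COL3 should change in the output; the other columns must stay untouched. That is, COL3 should keep only the strings that match the pattern "IPL_". Any other strings, such as "TEST_M1" or "ODI_M1", should be set to null, and any semicolons left over should be removed.

For example: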

Question - CRIC1:IPL_M1;IPL_M2;TEST_M1,CRIC2:ODI_M1;IPL_M3
result   - CRIC1:IPL_M1;IPL_M2,CRIC2:IPL_M3

In another case, if only strings that do not match "IPL_" are present, then:

Question -  CRIC1:TEST_M1,CRIC2:ODI_M1
Result   -  CRIC1:,CRIC2:

Output file:

COL1|COL2|COL3|COL4|COL5
1|CRIC|IPL|CRIC1:IPL_M1;IPL_M2,CRIC2:IPL_M3|C1|D1
2|CRIC|TEST|CRIC1:,CRIC2:IPL_M1;IPL_M3|C2|D2

The basic requirement is to find and replace a string.

Input

COL1|COL2|COL3|COL4|COL5
1|A1|A12|A13|A14|A15

Replace A13 in the 3rd column with B13 (A13 can vary; I mean we have to find any pattern like A13).

Output

COL1|COL2|COL3|COL4|COL5
1|A1|A12|B13|A14|A15

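Thanks in advance.

Restating the scenario in simpler terms with just 2 columns: I need to search the second column for "IPL_" and keep only those strings; any other strings, such as "ODI_M3;TEST_M5", should be removed.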

{

I/P:

{

COL1|COL2

CRIC1|IPL_M1;IPL_M2;TEST_M1

CRIC2|ODI_M1;IPL_M3

CRIC3|ODI_M3;TEST_M5

CRIC4|IPL_M5;ODI_M5;IPL_M6

}

O/P:

{

COL1|COL2

CRIC1|IPL_M1;IPL_M2

CRIC2|IPL_M3

CRIC3|

CRIC4|IPL_M5;IPL_M6

}

Awaiting your valuable suggestions. Please help; I am new to this platform.

Thanks, Saquib }
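The simplified two-column scenario above can be sketched in awk. This is a minimal sketch, not a tested production script: it assumes the first line is a header, that the values in the second column are separated by semicolons, and that "IPL_" is the only pattern to keep. The function name `filter_ipl_simple` is hypothetical and chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: keep only the ';'-separated tokens in column 2 that
# contain "IPL_". Reads the files given as arguments, or stdin if none.
filter_ipl_simple() {
    awk 'BEGIN { FS = OFS = "|" }
    NR > 1 {
        # Split column 2 into tokens and rebuild it from the matching ones.
        n = split($2, tok, ";")
        out = ""; sep = ""
        for (i = 1; i <= n; i++)
            if (tok[i] ~ /IPL_/) { out = out sep tok[i]; sep = ";" }
        $2 = out
    }
    # Print every line, including the untouched header.
    { print }' "$@"
}

# Sample rows taken from the question.
printf '%s\n' 'COL1|COL2' \
    'CRIC1|IPL_M1;IPL_M2;TEST_M1' \
    'CRIC2|ODI_M1;IPL_M3' \
    'CRIC3|ODI_M3;TEST_M5' \
    'CRIC4|IPL_M5;ODI_M5;IPL_M6' | filter_ipl_simple
```

Rebuilding the column from the kept tokens, rather than deleting the unwanted ones with a substitution, avoids having to clean up stray semicolons afterwards.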

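【Discussion】:

This is unclear. Please use the {} button to wrap your code in code tags, and state the conditions clearly in your post.
Input COL1|COL2|COL3|COL4|COL5 1|A1|A12|A13|A14|A15 Replace A13 with B13 Output COL1|COL2|COL3|COL4|COL5 1|A1|A12|B13|A14|A15
Hi @RavinderSingh13, I have added a simple scenario. Can you help now? Thanks in advance.
No, they are still not in code tags. Simply select all of your samples and click the {} button in the bar; they will then appear in code tags, which makes them much easier for us to understand.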

Tags: unix awk sed split grep

【Solution 1】:

If I read this correctly (and I'm not entirely sure I do; I'm going mainly by the examples provided), then it can be done relatively sanely with Perl:

#!/usr/bin/perl

while(<>) {
    if($. > 1) {
        my @F = split /\|/;

        $F[3] = join(",", map {
            my @H = split /:/;
            $H[1] = join(";", grep(/IPL_/, split(";", $H[1])));
            join ":", @H;
        } split(/,/, $F[3]));

        $_ = join "|", @F;
    }

    print;
}

Put this code into a file, say foo.pl; then, if your data is in a file data.txt, you can run

perl foo.pl data.txt

It works as follows:

#!/usr/bin/perl

# Read lines from input (in our case: data.txt)
while(<>) {
    # In all except the first line (the header line):
    if($. > 1) {
        # Apply the transformation. To do this, first split the line into fields
        my @F = split /\|/;

        # Then edit the field at index 3, i.e. the fourth pipe-separated field,
        # which holds the CRIC#:... lists in the sample rows. This has to be
        # read right-to-left at the top level, which is to say: first the field
        # is split along commas, then the tokens are mapped according to the
        # code in the inner block, then they are joined with commas again.
        $F[3] = join(",", map {
            # the map block does a similar thing. The inner tokens (e.g.,
            # "CRIC1:IPL_M1;IPL_M2") are split at the colon into the CRIC# part
            # (which is to be unchanged) and the value list we want to edit.
            my @H = split /:/;

            # This value list is again split along semicolons, filtered so that
            # only those elements that match /IPL_/ remain, and then joined with
            # semicolons again.
            $H[1] = join(";", grep(/IPL_/, split(";", $H[1])));

            # The map result is the CRIC# part joined to the edited list with a colon.
            join ":", @H;
        } split(/,/, $F[3]));

        # When all is done, rejoin the outermost fields with pipe characters
        $_ = join "|", @F;
    }

    # and print the result.
    print;
}

【Discussion】:

Thank you very much. But we need to use AWK or SED.
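Since the comment above asks for awk or sed, the same split, filter and join idea can be sketched in awk. This is a minimal sketch under stated assumptions, not the answer author's code: the function name `filter_ipl` is hypothetical, `col` is set to 4 because in the sample rows the CRIC#:... lists sit in the fourth pipe-separated field (index 3 in the Perl script), and groups without a colon are left unchanged.

```shell
#!/bin/sh
# Hypothetical helper: in the chosen column, keep only the values that contain
# the pattern. col=4 and pat="IPL_" are assumptions taken from the sample data.
filter_ipl() {
    awk -v col=4 -v pat="IPL_" 'BEGIN { FS = OFS = "|" }
    NR > 1 {
        # Outer level: groups such as "CRIC1:IPL_M1;TEST_M1" separated by commas.
        ngrp = split($col, grp, ",")
        res = ""; gsep = ""
        for (g = 1; g <= ngrp; g++) {
            pos = index(grp[g], ":")
            if (pos > 0) {
                # Inner level: values after the colon, separated by semicolons.
                nval = split(substr(grp[g], pos + 1), val, ";")
                kept = ""; vsep = ""
                for (v = 1; v <= nval; v++)
                    if (index(val[v], pat)) { kept = kept vsep val[v]; vsep = ";" }
                # Keep the key and its colon even when no value survives.
                grp[g] = substr(grp[g], 1, pos) kept
            }
            res = res gsep grp[g]; gsep = ","
        }
        $col = res
    }
    # Print every line, including the untouched header.
    { print }' "$@"
}

# Sample rows taken from the question.
printf '%s\n' 'COL1|COL2|COL3|COL4|COL5' \
    '1|CRIC|IPL|CRIC1:IPL_M1;IPL_M2;TEST_M1,CRIC2:ODI_M1;IPL_M3|C1|D1' \
    '2|CRIC|TEST|CRIC1:TEST_M2,CRIC2:ODI_M1;IPL_M1;TEST_M2;IPL_M3;T20_M1|C2|D2' | filter_ipl
```

With a file on disk, the function could be called as `filter_ipl data.txt`. The pattern is matched as a literal substring via `index` rather than as a regular expression, which is equivalent for "IPL_" but would need changing for a real regex.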