【Question title】: Ignore spaces after a particular column number
【Posted】: 2017-12-14 09:02:17
【Question】:

Hi all, I have the following data.

61684 376 23 106 38695633 1 0 0 -1 /C/Program Files (x86)/ 16704 root;TrustedInstaller@NT:SERVICE root;TrustedInstaller@NT:SERVICE 0 1407331175 1407331175 1247541608
8634 416 13 86 574126 1 0 0 -1 /E/KYCImages/ 16832 root;kycfinal@CGKYCAPP03 root;None@CGKYCAPP03 0 1406018846 1406018846 1352415392
60971 472 22 86 38613076 1 0 0 -1 /E/KYCwebsvc binaries/ 16832 root;kycfinal@CGKYCAPP03 root;None@CGKYCAPP03 0 1390829495 1390829495 1353370744
1 416 10 86 1 1 0 0 -1 /E/KycApp/ 16832 root;kycfinal@CGKYCAPP03 root;None@CGKYCAPP03 0 1411465772 1411465772 1351291187

I am currently using the following code:

awk 'BEGIN{FPAT = "([^ ]+)|(\"[^\"]+\")"}{print $10}' filename | awk '$1!~/^\/\./' | sort -u | sed -e 's/\,//g' | perl -p00e 's/\n(?!\Z)/;/g'

and I get this output:

/C/Program;/E/KycApp/;/E/KYCImages/;/E/KycServices/;/E/KYCwebsvc

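But I need the output to start at $10 and continue until the closing "/" is reached. In other words, I want to ignore any spaces inside column 10 until the next "/" is encountered. Is that possible?

The desired output is: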

/C/Program Files (x86)/;/E/KycApp/;/E/KYCImages/;/E/KycServices/;/E/KYCwebsvc binaries/
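
The truncation comes from awk's field splitting: each space starts a new field, so a path that contains spaces is spread over several fields. A minimal sketch on the first sample line, assuming any POSIX awk (the variable names `line` and `split_fields` are only for illustration):

```shell
# First sample line from the question.
line='61684 376 23 106 38695633 1 0 0 -1 /C/Program Files (x86)/ 16704 root;TrustedInstaller@NT:SERVICE root;TrustedInstaller@NT:SERVICE 0 1407331175 1407331175 1247541608'

# Default splitting cuts the path at each space, so it lands in $10, $11 and $12.
split_fields=$(printf '%s\n' "$line" | awk '{ print $10 "|" $11 "|" $12 }')
printf '%s\n' "$split_fields"
```

Printing $10 alone therefore yields only `/C/Program`, which is the symptom shown in the output above.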

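【Comments】:

  • If your grep has the -o option, this looks like what you want: grep -o '/[^.].*/' filename | sort -u | paste -sd';' ... Your sample data should also include lines that show why you need awk '$1!~/^\/\./' or sed -e 's/\,//g'.
  • paste -sd';' does not work on AIX, so I use perl -p00e 's/\n(?!\Z)/;/g' instead. I need awk '$1!~/^\/\./' to skip any path that starts with "/" followed by ".". grep -o does not work on AIX either, so I need something else.
  • How about a single command, then: perl -lne '($p)=/(\/[^.].*\/)/; $h{$p}=1; END{print join ";", keys %h}' filename
  • You have /E/KycServices/; in your output. I don't see it in your input.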
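
Given the constraints raised in these comments (no grep -o, no working paste -sd';'), one possible approach is a pipeline built only from POSIX awk and sort. This is a sketch under assumptions: it presumes the local awk provides match(), RSTART and RLENGTH, that sort accepts -f, and that no "/" appears on a line after the path. The names `data`, `seen`, `p`, `out` and `joined` are illustrative.

```shell
# Sample rows from the question.
data='61684 376 23 106 38695633 1 0 0 -1 /C/Program Files (x86)/ 16704 root;TrustedInstaller@NT:SERVICE root;TrustedInstaller@NT:SERVICE 0 1407331175 1407331175 1247541608
8634 416 13 86 574126 1 0 0 -1 /E/KYCImages/ 16832 root;kycfinal@CGKYCAPP03 root;None@CGKYCAPP03 0 1406018846 1406018846 1352415392
60971 472 22 86 38613076 1 0 0 -1 /E/KYCwebsvc binaries/ 16832 root;kycfinal@CGKYCAPP03 root;None@CGKYCAPP03 0 1390829495 1390829495 1353370744
1 416 10 86 1 1 0 0 -1 /E/KycApp/ 16832 root;kycfinal@CGKYCAPP03 root;None@CGKYCAPP03 0 1411465772 1411465772 1351291187'

# Step 1 (awk): take the text from the first "/" to the last "/" on each line,
#               skip paths that start with "/.", and drop duplicates.
# Step 2 (sort -f): order the paths case-insensitively.
# Step 3 (awk): join the sorted lines with ";".
joined=$(
  printf '%s\n' "$data" |
  awk 'match($0, "/.*/") {
         p = substr($0, RSTART, RLENGTH)
         if (p !~ "^/[.]" && !(p in seen)) { seen[p]; print p }
       }' |
  LC_ALL=C sort -f |
  awk '{ out = (NR == 1 ? "" : out ";") $0 } END { print out }'
)
printf '%s\n' "$joined"
```

The regular expressions are passed as strings rather than /.../ literals so that the slashes need no escaping. LC_ALL=C is set only to make the sort order reproducible across machines.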

Tags: awk aix text-processing


【Solution 1】:

gawk

awk 'BEGIN{ FPAT="/[^/]+/[^/]+/"; PROCINFO["sorted_in"]="@ind_str_asc"; IGNORECASE = 1 }
     { a[$1] }END{ for(i in a) r=(r!="")? r";"i : i; print r }' filename

Output (without /E/KycServices/; because it is not in your input):

/C/Program Files (x86)/;/E/KycApp/;/E/KYCImages/;/E/KYCwebsvc binaries/
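
One property of the FPAT pattern is worth checking before reusing it: `/[^/]+/[^/]+/` describes exactly two path components, so a deeper directory would presumably be cut after its second component. The sketch below does not use FPAT itself; it applies the same regular expression through POSIX match() so that it runs without gawk. The three-level path `/E/KycApp/logs 2017/` is made up for illustration and is not part of the question's data.

```shell
# Hypothetical line with a three-level path (not from the original data).
deep='1 416 10 86 1 1 0 0 -1 /E/KycApp/logs 2017/ 16832'

# Same regex as the FPAT above, applied with match() so that any POSIX awk can run it.
two_level=$(printf '%s\n' "$deep" | awk 'match($0, "/[^/]+/[^/]+/") { print substr($0, RSTART, RLENGTH) }')
printf '%s\n' "$two_level"
```

For the sample data in the question every path has two components, so this limitation does not affect the output shown above.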


    【Solution 2】:

    You could also try the following, which uses a single awk.

    awk '{match($0,/\/.*\//);VAL=VAL?VAL ORS substr($0,RSTART,RLENGTH):substr($0,RSTART,RLENGTH)} END{num=split(VAL, array,"\n");for(i=1;i<=num;i++){printf("%s%s",array[i],i==num?"":";")};print""}'  Input_file
    

    I will add a non-one-liner form of the solution and an explanation shortly.

    EDIT1: Added the non-one-liner form of the solution.

    awk '{
            match($0,/\/.*\//);
            VAL=VAL?VAL ORS substr($0,RSTART,RLENGTH):substr($0,RSTART,RLENGTH)
         }
            END{
                    num=split(VAL, array,"\n");
                    for(i=1;i<=num;i++){
                                            printf("%s%s",array[i],i==num?"":";")
                                       };
                    print""
               }
        '    Input_file
    

    EDIT2: Added an explanation of the code to the non-one-liner form as well.

    awk '{
            match($0,/\/.*\//); ## match() finds the text from the first "/" to the last "/" on the line (the slashes are escaped inside the regex).
            VAL=VAL?VAL ORS substr($0,RSTART,RLENGTH):substr($0,RSTART,RLENGTH) ## Append the matched text to VAL, separated by ORS (a newline) once VAL is non-empty. RSTART and RLENGTH are built-in awk variables that match() sets to the start position and length of the match.
         }
            END{ ## The END block runs after all input has been read.
                    num=split(VAL, array,"\n"); ## split() is a built-in awk function; it breaks VAL on newlines into the array named array and returns the number of elements, which is stored in num.
                    for(i=1;i<=num;i++){ ## Loop i from 1 to num.
                                            printf("%s%s",array[i],i==num?"":";") ## Print element i followed by a semicolon, except after the last element, where an empty string is printed instead.
                                       };
                    print"" ## Print an empty string so that the output ends with a newline.
               }
        '  Input_file    ### Input_file is the file to read.
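
To make the role of RSTART and RLENGTH concrete, here is a small sketch that prints both values for the last sample line of the question, followed by the substring they select. It assumes a POSIX awk; the variable names `line` and `match_info` are only for illustration.

```shell
# Last sample line from the question.
line='1 416 10 86 1 1 0 0 -1 /E/KycApp/ 16832 root;kycfinal@CGKYCAPP03 root;None@CGKYCAPP03 0 1411465772 1411465772 1351291187'

# match() sets RSTART (start position of the match) and RLENGTH (its length);
# substr() then cuts exactly that span out of the line.
match_info=$(printf '%s\n' "$line" | awk '{ match($0, /\/.*\//); print RSTART, RLENGTH, substr($0, RSTART, RLENGTH) }')
printf '%s\n' "$match_info"
```

This prints `24 10 /E/KycApp/`: the path starts at character 24 and is 10 characters long. Because `.*` is greedy, the match runs to the last "/" on the line, so the approach relies on no other "/" appearing after the path.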
    
