【Question title】: Awk command to split nth field text and insert as new rows
【Posted】: 2021-12-12 09:33:47
【Question description】:

This is a follow-up to my previous question; I am checking whether the same approach can also handle this case:

Reduce processing time for 'While read' loop

I have a huge CSV file in which field 11 varies in length, for example:

"xx","x",x,x,x,xx,xx,"x",x,11,"00000aaaaD00000bbbbD00000abcdD00000dwasD00000dedsD00000ddfgD00000dsdfD00000snfjD00000djffD00000wedfD00000asdfZ"
"xx","x",x,x,x,xx,xx,"x",x,5,"00000aaaaD00000bbbbD00000abcdD00000dwasD00000dedsD"

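After splitting field 11 into chunks of 10 characters, I need characters 6-9 of each chunk. Each extracted value then has to be inserted as a new row. I need output like the following: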

"xx","x",x,x,x,xx,xx,"x",x,11,"aaaa"
"xx","x",x,x,x,xx,xx,"x",x,11,"bbbb"
"xx","x",x,x,x,xx,xx,"x",x,11,"abcd"
.
.
.
"xx","x",x,x,x,xx,xx,"x",x,11,"asdf"
"xx","x",x,x,x,xx,xx,"x",x,5,"aaaa"
.
.
"xx","x",x,x,x,xx,xx,"x",x,5,"deds"

The script I am currently using:

while read -r line1; do
    icount=$((icount + 1))                                      # input rows read
    col_11=$(echo "$line1" | cut -d',' -f11)                    # the field to split
    col_10=$(echo "$line1" | cut -d',' -f1,2,3,4,5,7,10)        # fields kept as the row prefix
    col_11_trim=$(echo "$col_11" | tr -d '"')                   # drop the surrounding quotes
    echo "$col_11_trim" | fold -w10 > "$path/col_11_extract"    # one 10-character chunk per line
    while read -r line2; do
        ocount=$((ocount + 1))                                  # output rows written
        strng_cut=$(echo "$line2" | cut -c6-9)                  # characters 6-9 of the chunk
        echo "$col_10",\""$strng_cut"\" >> "$path/final_out"
    done < "$path/col_11_extract"
done < "$input"

【Question discussion】:

    Tags: bash shell awk while-loop


    【Solution 1】:

    awk:

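    awk 'BEGIN{FS=OFS=","}                           # split and rejoin fields on commas
         {
           eleven=$11;                               # save the original quoted field 11
           len=length(eleven);                       # length includes both quotes
           for(i=2; i<len-1; i=i+10){                # i = first character of each 10-character chunk
             $11="\"" substr(eleven, i+5, 4) "\"";   # characters 6-9 of the chunk, re-quoted
             print;                                  # emit the row with the new field 11
           }
         }' file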
    

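    The for loop starts at position 2 and stops before len-1 because field 11 is enclosed in quotes: position 1 is the opening quote and position len is the closing one.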
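
To make the index arithmetic concrete, here is a minimal sketch that traces the loop for a single quoted value. The helper name `trace_chunks` and the two-chunk sample value are made up for illustration and are not part of the answer above.

```shell
# Hypothetical helper: trace the loop indices for one quoted field value.
trace_chunks() {
  printf '%s\n' "$1" | awk '{
    len = length($0)                   # counts the two surrounding quotes as well
    for (i = 2; i < len - 1; i += 10)  # same bounds as the loop in the answer
      printf "i=%d chunk=%s keep=%s\n", i, substr($0, i, 10), substr($0, i + 5, 4)
  }'
}

trace_chunks '"00000aaaaD00000bbbbD"'
```

For this 22-character value the loop runs for i=2 and i=12, printing `i=2 chunk=00000aaaaD keep=aaaa` and `i=12 chunk=00000bbbbD keep=bbbb`, and stops at i=22 because 22 is not less than len-1 (21).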

    Output:

    "xx","x",x,x,x,xx,xx,"x",x,11,"aaaa"
    "xx","x",x,x,x,xx,xx,"x",x,11,"bbbb"
    "xx","x",x,x,x,xx,xx,"x",x,11,"abcd"
    "xx","x",x,x,x,xx,xx,"x",x,11,"dwas"
    "xx","x",x,x,x,xx,xx,"x",x,11,"deds"
    "xx","x",x,x,x,xx,xx,"x",x,11,"ddfg"
    "xx","x",x,x,x,xx,xx,"x",x,11,"dsdf"
    "xx","x",x,x,x,xx,xx,"x",x,11,"snfj"
    "xx","x",x,x,x,xx,xx,"x",x,11,"djff"
    "xx","x",x,x,x,xx,xx,"x",x,11,"wedf"
    "xx","x",x,x,x,xx,xx,"x",x,11,"asdf"
    "xx","x",x,x,x,xx,xx,"x",x,5,"aaaa"
    "xx","x",x,x,x,xx,xx,"x",x,5,"bbbb"
    "xx","x",x,x,x,xx,xx,"x",x,5,"abcd"
    "xx","x",x,x,x,xx,xx,"x",x,5,"dwas"
    "xx","x",x,x,x,xx,xx,"x",x,5,"deds"

    【Discussion】:

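    • Cool, it works. Processing time for 10 input records (1400 output records) dropped from 9 seconds to 0.025 seconds.
    • Could you explain the i+5 in the substring call, and how the string gets added as a new row?
    • i+5 skips the first 5 characters of each chunk, which are not needed. print outputs the row with the new 11th field.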
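
The last reply can be illustrated with a small variant of the answer's program. This is only a sketch under stated assumptions: the wrapper name `split_field11` is hypothetical, the quotes are stripped up front (so offsets are counted from 1 rather than 2, unlike the original), and it assumes no quoted field contains an embedded comma.

```shell
# Hypothetical wrapper around a variant of the awk program from the answer.
split_field11() {
  awk 'BEGIN { FS = OFS = "," }
  {
    raw = $11
    gsub(/^[ \t]*"|"[ \t]*$/, "", raw)       # drop the quotes and any stray surrounding blanks
    n = length(raw)
    for (i = 1; i + 8 <= n; i += 10) {       # only chunks that actually contain characters 6-9
      $11 = "\"" substr(raw, i + 5, 4) "\""  # i+5 skips the 5 leading characters of the chunk
      print                                  # assigning $11 rebuilt $0 with OFS; print emits it as a new row
    }
  }' "$@"
}

printf '%s\n' '"xx","x",x,x,x,xx,xx,"x",x,2,"00000aaaaD00000bbbbD"' | split_field11
```

Because `print` runs once per loop iteration, a single input row yields one output row per chunk; the sample line above produces two rows, ending in `"aaaa"` and `"bbbb"`. With a file argument the call would be `split_field11 file`.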