Awk command to split nth field text and insert as new rows: answers

[Question title]: Awk command to split nth field text and insert as new rows
[Posted]: 2021-12-12 09:33:47
[Question]:

This is a follow-up to my previous question. I want to check whether this pattern can be handled as well:

Reduce processing time for 'While read' loop

I have a huge csv file in which field 11 varies in length, for example:

"xx","x",x,x,x,xx,xx,"x",x,11,"00000aaaaD00000bbbbD00000abcdD00000dwasD00000dedsD00000ddfgD00000dsdfD00000snfjD00000djffD00000wedfD00000asdfZ"  
"xx","x",x,x,x,xx,xx,"x",x,5,"00000aaaaD00000bbbbD00000abcdD00000dwasD00000dedsD"

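After splitting field 11 into chunks of 10 characters, I need characters 6-9 of each chunk. Each extracted value then has to be inserted as a new row, so the output should look like this: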

"xx","x",x,x,x,xx,xx,"x",x,11,"aaaa"  
"xx","x",x,x,x,xx,xx,"x",x,11,"bbbb"  
"xx","x",x,x,x,xx,xx,"x",x,11,"abcd"
.  
.  
.  
"xx","x",x,x,x,xx,xx,"x",x,11,"asdf"  
"xx","x",x,x,x,xx,xx,"x",x,5,"djff"  
.  
.  
"xx","x",x,x,x,xx,xx,"x",x,5,"deds"

while read -r line1; do
    icount=$((icount+1))
    col_11=$(echo "$line1" | cut -d',' -f11)
    col_10=$(echo "$line1" | cut -d',' -f1,2,3,4,5,7,10)
    #echo $col_11
    col_11_trim=$(echo "$col_11" | tr -d '"')
    #echo $col_11_trim
    echo "$col_11_trim" | fold -w10 > "$path/col_11_extract"
    while read -r line2; do
        ocount=$((ocount+1))
        strng_cut=$(echo "$line2" | cut -c6-9)
        echo "$col_10",\""$strng_cut"\" >> "$path/final_out"
    done < "$path/col_11_extract"
done < "$input"

[Question comments]:

Tags: bash shell awk while-loop

[Solution 1]:

With awk:

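awk 'BEGIN{FS=OFS=","}
     {
       eleven=$11;                  # keep the original field 11, quotes included
       len=length(eleven);
       for(i=2; i<len-1; i=i+10){   # step through the 10-character chunks
         $11="\"" substr(eleven, i+5, 4) "\"";   # characters 6-9 of the chunk, re-quoted
         print;                     # emit the record with the new field 11
       }
     }' file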

The for loop starts at position 2 and runs while i is below len-1, because field 11 is enclosed in double quotes.
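The loop bounds can be checked on a short field. The following is a minimal sketch that traces the indices; the sample value and the variable name `trace` are illustrative assumptions, not taken from the original post.

```shell
# Trace loop indices for a quoted field holding two 10-character chunks.
# The sample value is hypothetical; position 1 is the opening quote.
trace=$(printf '%s\n' '"00000aaaaD00000bbbbD"' | awk '{
  len = length($0)                      # 22: 20 data characters plus 2 quotes
  for (i = 2; i < len - 1; i += 10)     # i marks the first character of each chunk
    printf "i=%d start=%d text=%s\n", i, i + 5, substr($0, i + 5, 4)
}')
echo "$trace"
```

With this input the loop runs for i=2 and i=12 only; the next value, 22, fails the i<len-1 test, so the closing quote never starts a chunk.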

Output:

"xx","x",x,x,x,xx,xx,"x",x,11,"aaaa"
"xx","x",x,x,x,xx,xx,"x",x,11,"bbbb"
"xx","x",x,x,x,xx,xx,"x",x,11,"abcd"
"xx","x",x,x,x,xx,xx,"x",x,11,"dwas"
"xx","x",x,x,x,xx,xx,"x",x,11,"deds"
"xx","x",x,x,x,xx,xx,"x",x,11,"ddfg"
"xx","x",x,x,x,xx,xx,"x",x,11,"dsdf"
"xx","x",x,x,x,xx,xx,"x",x,11,"snfj"
"xx","x",x,x,x,xx,xx,"x",x,11,"djff"
"xx","x",x,x,x,xx,xx,"x",x,11,"wedf"
"xx","x",x,x,x,xx,xx,"x",x,11,"asdf"
"xx","x",x,x,x,xx,xx,"x",x,5,"aaaa"
"xx","x",x,x,x,xx,xx,"x",x,5,"bbbb"
"xx","x",x,x,x,xx,xx,"x",x,5,"abcd"
"xx","x",x,x,x,xx,xx,"x",x,5,"dwas"
"xx","x",x,x,x,xx,xx,"x",x,5,"deds"

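[Comments]:

Cool, it works. Processing time for 10 input records (1400 output records) dropped from 9 seconds to 0.025 seconds.
Could you explain the i+5 in the substring call, and how the string gets added as a new row?
i+5 skips the first 5 characters of each chunk, which are not needed. print outputs the line with the new field 11.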
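To illustrate how one input line becomes several output rows: assigning to a field makes awk rebuild the record by joining the fields with OFS, so every print inside the loop emits a complete line. This is a small sketch on a hypothetical 3-field record (the sample data and the names `rows` and `field` are assumptions for illustration), using the same technique as the answer.

```shell
# Hypothetical 3-field record; the quoted third field holds two chunks.
rows=$(printf '%s\n' 'a,b,"00000aaaaD00000bbbbD"' | awk 'BEGIN { FS = OFS = "," }
{
  field = $3                                   # keep the original value before overwriting
  for (i = 2; i < length(field) - 1; i += 10) {
    $3 = "\"" substr(field, i + 5, 4) "\""     # assignment rebuilds $0 using OFS
    print                                      # one output line per chunk
  }
}')
echo "$rows"
```

Saving the field in a separate variable first matters: once $3 is overwritten, the original chunk text is gone, so later iterations have to read from the saved copy.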