How to re-structure the file using AWK? (Answers)

【Question title】: How to re-structure the file using AWK?
【Posted】: 2014-09-15 10:03:24
【Question description】:

I have written some code that re-structures a CSV file according to a control file. The control file looks like this:

Control file : 

1,column1
3,column3
6,column6
4,column4
-1,column9

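Based on the control file above, I take the columns with index 1, 3, 6, 4 and -1 from the source.csv file and build a new file with the paste command. If an index in the control file is -1, the whole column must be inserted empty, and its header name will be column9.

Code: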

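var=1
# $file is the control file; each line is "index,column_name"
while read -r line
do
    # Extract the column index (first comma-separated field)
    t=$(echo "$line" | cut -d, -f1)
    if [ "$t" != -1 ]
    then
        cut -d, -f"$t" source.csv > "file_$var.csv"
    else
        # Index -1: an empty file makes paste emit an empty column
        touch "file_$var.csv"
    fi
    var=$((var+1))
done < "$file"
# Paste the per-column files side by side, in numeric order
ls -v file_*.csv | xargs paste -d, > new_file.csv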

Is there a way to convert these lines to AWK? Please suggest some ideas.

Before running the script:

sample.csv

column1,column2,column3,column4,column5,column6,column7
a,b,c,d,e,f,g

Output:

new_file.csv

column1,column3,column6,column4,column9
a,c,f,d,

column9 has index -1, which means null: the trailing , separator on its own represents the null value.
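To illustrate that point, here is a minimal check (the variable names are only for this example) showing that awk still counts five fields in the sample output row, with the last one empty:

```shell
# The sample output row from the question: four values and a trailing separator.
row='a,c,f,d,'

# With a single-character separator, awk counts the empty field after the last comma.
nf=$(printf '%s\n' "$row" | awk -F, '{ print NF }')

# Wrap the last field in brackets so that an empty value is visible.
last=$(printf '%s\n' "$row" | awk -F, '{ print "[" $NF "]" }')

echo "fields=$nf last=$last"
```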

The basic intention is to re-structure the source file based on the control file.

Script:

#Greenplum Database details to read target file structure from Meta Data Tables.

export PGUSER=xxx
export PGPORT=5432
export PGHOST=10.100.20.10
export PGDATABASE=fff

SCHEMA='jiodba'

##Function to explain usage of this script
usage() {
echo "Usage: program.sh -s <Source_folder> -t <Target_folder> -f <file_name> ";
exit 1; }

source_folder=$1
target_folder=$2
file_name=$3


#removes the existing file from current directory

rm -f file_struct_*.csv

# Reading the Header from the Source file.

v_source_header=`head -1 $file_name`

IFS=","     # Set the field separator
set $v_source_header      # Breaks the string into $1, $2, ...
i=1
for item    # A for loop by default loop through $1, $2, ...
do
    echo "$i,$item">>source_header.txt
    ((i++))
done

sed -e "s/\r//" source_header.txt | sed -e "s/ \{1,\}$//" > source_headers.txt

rm -f source_header.txt

#Get the Target header information from Greenplum Meta data Table and writing into target_header.txt file.

psql -t -A -F "," -c "select Target_column_position,Target_column_name from jiodba.etl_tbl_sequencing where source_file_name='$file_name' order by target_column_position" > target_header.txt

#Removing trailing spaces and control characters.

sed -e "s/\r//" target_header.txt | sed -e "s/ \{1,\}$//" > target_headers.txt

rm -f target_header.txt

#Compare the source header with the target structure and generate the difference.

awk -F, 'NR==FNR{a[$2]=$1;next} {if ($2 in a) print a[$2]","$2; else print "-1," $2}' source_headers.txt  target_headers.txt >>tgt_struct_output.txt

#Loop to Read column index from the tgt_struct_output.txt and cut it in Source file.


file='tgt_struct_output.txt'
var=1
while read line
do
t=$(echo $line | awk '{ print $1}' | cut -d, -f1)
if [ $t != -1 ]
then
cut -d, -f$t $file_name>file_struct_$var.csv
else
touch file_struct_$var.csv
fi
var=$((var+1))
done<"$file"


awk -F, -v OFS=, 'FNR==NR {c[++n]=$2; a[$2]=$1;next} FNR==1{f=""; for (i=1; i<=n; i++)
  {printf "%s%s", f, c[i]; b[++k]=i; f=OFS} print "";next}
  {for (i=1; i<=n; i++) if(a[c[i]]>0) printf "%s%s", $a[c[i]], OFS; print""
   }' tgt_struct_output.txt $file_name


#Paste the different file(columns)into single file

ls -v file_struct_*.csv | xargs paste -d,| sed -e "s/\r//" > new_file.csv

new_header=`cut -d "," -f 2 target_headers.txt | tr "\n" "," | sed 's/,$//'`

#Replace the header with the original target header in case a column doesn't exist in the target table structure.

sed "1s/.*/$new_header/" new_file.csv

#Removing the Temp files.

rm -f file_struct_*.csv
rm -f source_headers.txt target_headers.txt tgt_struct_output.txt
touch file_struct_1.csv #Just to avoid the error in shell

Sample.csv

BP ID,Prepaid Account No,CurrentMonetary balance ,charge Plan names ,Provider contract id,Contract Item ID,Start Date,End Date
1100001538,001000002506,251,[B2] R2 LTE CHARGE PLAN ,00000000000000000141,[B2] R2 LTE CHARGE PLAN _00155D10E20D1ED39A8E146EA7169A2E00155D10E20D1ED398FD63624498DB4A,16-Oct-12,18-Oct-12
1100003404,001000004029,45.22,B0.3 ECS_CHARGE_PLAN DROP1 V3,00000000000000009349,B0.3 ECS  DROP2 V0.2_00155D10E20D1ED39A8E146EA7169A2E00155D10E20D1ED398FD63624498DA2E,16-Nov-13,23-Nov-13
1100006545,001000006620,388.796,B0.3 ECS_CHARGE_PLAN DROP1 V3,00000000000000010477,B0.3 ECS  DROP2 V0.2_00155D10E20D1ED39A8E146EA7169A2E00155S00E20D1ED398FD63624498DA2E,07-Nov-12,07-Nov-13
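The header-numbering and comparison steps of the script (source_headers.txt, target_headers.txt, tgt_struct_output.txt) could probably be folded into one awk call. The following is only a sketch under assumptions: `build_ctrl` is a hypothetical helper name, the target header list is assumed to be a file of `position,name` lines like the psql output, and column names are assumed to contain no commas.

```shell
# Hypothetical helper: build_ctrl SOURCE_CSV TARGET_HEADERS
# Prints "source_index,name" for each target column, or "-1,name" if the
# column is missing from the source header.
build_ctrl() {
  head -n 1 "$1" | awk -F, '
    NR == FNR {                        # the single header line read from stdin
      sub(/\r$/, "")                   # drop a trailing carriage return, if any
      for (i = 1; i <= NF; i++) {
        h = $i
        sub(/ +$/, "", h)              # drop trailing spaces from the name
        pos[h] = i                     # remember where the column sits
      }
      next
    }
    {                                  # lines of the target header list
      name = $2
      sub(/[ \r]+$/, "", name)         # same clean-up for the target name
      p = (name in pos) ? pos[name] : -1
      print p "," name
    }
  ' - "$2"
}

# Made-up example data, only to show the shape of the result.
tmp=$(mktemp -d)
printf 'id,name ,city\r\n1,x,y\n' > "$tmp/source.csv"
printf '%s\n' '1,city' '2,id' '3,zip' > "$tmp/target_headers.txt"
build_ctrl "$tmp/source.csv" "$tmp/target_headers.txt"
```

With the made-up data this should list city at index 3, id at index 1, and zip as -1, which is the same shape as the control file shown at the top of the question.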

【Question comments】:

Can you post the format of the desired output?
@Paul Please see the revised question.

Tags: linux shell awk

【Solution 1】:

You can try this awk:

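awk -F, -v OFS=, '
  # Control file: remember the column names in order, and the source index of each
  FNR==NR { c[++n]=$2; a[$2]=$1; next }
  # Header line of the data file: print the target column names
  FNR==1  { f=""; for (i=1; i<=n; i++) { printf "%s%s", f, c[i]; f=OFS } print ""; next }
  # Data lines: print the field at each positive index; entries with index -1 are skipped
  { for (i=1; i<=n; i++) if (a[c[i]]>0) printf "%s%s", $a[c[i]], OFS; print "" }
' ctrl.csv sample.csv
column1,column3,column6,column4,column9
a,c,f,d,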
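The command above writes the separator after every copied field and skips -1 entries, so the row lines up with the header only when the -1 column comes last, as it does in the sample. A more general variant might look like the sketch below. It is not part of the original answer: `restructure` is a hypothetical helper name, and the sketch assumes plain comma-separated data with no quoted fields.

```shell
# Hypothetical helper: restructure CONTROL_FILE SOURCE_CSV
restructure() {
  awk -F, -v OFS=, '
    # Control file: keep the source index and the target name of each column, in order
    FNR == NR { idx[++n] = $1; name[n] = $2; next }
    {
      line = ""
      for (i = 1; i <= n; i++) {
        if (FNR == 1)        val = name[i]      # header row: target column names
        else if (idx[i] > 0) val = $(idx[i])    # data row: copy the source field
        else                 val = ""           # index -1: emit an empty column
        line = line (i > 1 ? OFS : "") val
      }
      print line
    }
  ' "$1" "$2"
}

# Made-up example with the empty column in the middle.
tmp=$(mktemp -d)
printf '%s\n' '1,column1' '-1,column9' '3,column3' > "$tmp/ctrl.csv"
printf '%s\n' 'column1,column2,column3' 'a,b,c' > "$tmp/sample.csv"
restructure "$tmp/ctrl.csv" "$tmp/sample.csv"
```

Building each output line in a variable and printing it once keeps the separators between fields only, so an empty column stays in its position wherever it falls.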

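【Comments】:

It works very well, but I need the actual column values from sample.csv based on the ctrl file, rather than the same indexes from ctrl.csv printed back.
It partially works; please see the script that uses the paste command. That same while loop needs to be converted to awk.
I don't get it. Does this awk script produce exactly the output you expect?
Yes, it partially works, so I have attached my script, which does the same thing using the paste command.
Still no. You asked for an awk script, and that is what I answered. Now, if this awk somehow doesn't work, you need to state clearly which input it fails for and what output you expect.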