Awk search and append matching name from other csv file - answers

【Question title】: Awk search and append matching name from other csv file
【Posted】: 2015-08-24 18:28:54
【Question】:

I have 2 CSV files.

File 1 contains:

product_id, category_id, price
pid01,cat01,10
pid02,cat01,10
pid03,cat01,20
pid04,cat02,30
pid05,cat02,20
pid06,cat03,30

File 2 contains:

category_id, category_name
cat01,Mouse
cat02,Cat
cat03,Fish
cat04,Dog

I need a result like this:

product_id, category_id, category_name, price
pid01,cat01,Mouse,10
pid02,cat01,Mouse,10
pid03,cat01,Mouse,20
pid04,cat02,Cat,30
pid05,cat02,Cat,20
pid06,cat03,Fish,30

or

product_id, category_name, price
pid01,Mouse,10
pid02,Mouse,10
pid03,Mouse,20
pid04,Cat,30
pid05,Cat,20
pid06,Fish,30

How can I do this in Bash or Awk?

【Discussion】:

Does the first line of file2 contain a header?
Yes, let me update the question.

Tags: bash awk

【Solution 1】:

This awk command does it:

awk -F, 'NR==FNR{a[$1]=$2;next}FNR>1{print $1,$2,a[$2],$3}' OFS=, file2 file1
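The question also asks for a second layout without the category_id column. A minimal sketch, assuming the sample data from the question (written to a scratch directory so the block runs on its own): the lookup is unchanged and only the print statement differs, with a BEGIN block supplying the header because FNR>1 skips the original one.

```shell
#!/bin/sh
# Recreate the sample files from the question in a scratch directory
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '%s\n' 'product_id, category_id, price' \
  pid01,cat01,10 pid02,cat01,10 pid03,cat01,20 \
  pid04,cat02,30 pid05,cat02,20 pid06,cat03,30 > file1
printf '%s\n' 'category_id, category_name' \
  cat01,Mouse cat02,Cat cat03,Fish cat04,Dog > file2

# Same lookup as the one-liner, but print only product_id, category_name, price.
# -v OFS=, is applied before BEGIN runs, so the header is comma-separated too.
awk -F, -v OFS=, '
  BEGIN   { print "product_id", "category_name", "price" }
  NR==FNR { a[$1] = $2; next }
  FNR > 1 { print $1, a[$2], $3 }
' file2 file1 > result.csv
cat result.csv
```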

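By the way, the one-liner does not print a header line, so you still need to add one. Here is the script in multi-line form, with explanations: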

# Specify the field delimiter and print the headers
BEGIN {
    FS=OFS=","
    $1="product_id"
    $2="category_id"
    $3="category_name"
    $4="price"
    print
}

# While the overall record number (NR) equals the
# record number within the current input file (FNR),
# we are still reading the first file (file2), so we
# populate the lookup table 'a' from it
NR==FNR{
    a[$1]=$2
    next # Skip the following block and go on parsing file2
}

# Skip line 1 in file1, inject column 3 with the value from
# the lookup table and output the record
FNR>1{
    print $1,$2,a[$2],$3
}
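To run the multi-line version, save it to a file and pass it with awk -f, listing file2 before file1 so the lookup table is filled first. A sketch assuming the hypothetical file name lookup.awk and the sample data from the question:

```shell
#!/bin/sh
dir=$(mktemp -d)
cd "$dir" || exit 1
# Sample input from the question
printf '%s\n' 'product_id, category_id, price' \
  pid01,cat01,10 pid02,cat01,10 pid03,cat01,20 \
  pid04,cat02,30 pid05,cat02,20 pid06,cat03,30 > file1
printf '%s\n' 'category_id, category_name' \
  cat01,Mouse cat02,Cat cat03,Fish cat04,Dog > file2

# The multi-line script, saved under a hypothetical file name
cat > lookup.awk <<'EOF'
BEGIN {
    FS=OFS=","
    $1="product_id"
    $2="category_id"
    $3="category_name"
    $4="price"
    print
}
NR==FNR {
    a[$1]=$2
    next
}
FNR>1 {
    print $1,$2,a[$2],$3
}
EOF

# file2 must come first: it fills the lookup table while NR==FNR
awk -f lookup.awk file2 file1 > result.csv
cat result.csv
```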

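Please also see anubhava's comment. In gawk or mawk, the header can be printed more simply by using -F', *' as the field separator: the optional spaces after the comma are needed because your column headers have a space after each comma. I would simply remove that space before processing.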
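Removing the space up front lets a single rule handle the header too, because the cleaned header of file2 maps category_id to category_name in the lookup table. A sketch assuming sed does the cleanup and the cleaned copies go to hypothetical files file1.clean and file2.clean:

```shell
#!/bin/sh
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '%s\n' 'product_id, category_id, price' \
  pid01,cat01,10 pid02,cat01,10 pid03,cat01,20 \
  pid04,cat02,30 pid05,cat02,20 pid06,cat03,30 > file1
printf '%s\n' 'category_id, category_name' \
  cat01,Mouse cat02,Cat cat03,Fish cat04,Dog > file2

# Drop any spaces after commas so header and data lines split the same way
sed 's/, */,/g' file1 > file1.clean
sed 's/, */,/g' file2 > file2.clean

# No FNR>1 guard: file1's header looks up "category_id",
# which file2's header mapped to "category_name"
awk -F, -v OFS=, 'NR==FNR { a[$1] = $2; next } { print $1, $2, a[$2], $3 }' \
  file2.clean file1.clean > result.csv
cat result.csv
```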

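【Comments】:

awk -F ', *' -v OFS=, 'FNR==NR{a[$1]=$2; next} {print $1, $2, a[$2], $3}' file2 file1 will print the header line as well.
@anubhava Good catch! :) I was already wondering why it didn't work at first, but wanted to finish my explanation. I missed the space in the header! Thanks!
Awesome! Thanks hek2mgl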

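【Solution 2】:

Using join (join expects both files to be sorted on the join field, which they already are here):

join --header -t , -1 2 -2 1 -o 1.1,1.2,2.2,1.3 file1 file2

Output:

pid01,cat01,Mouse,10
pid02,cat01,Mouse,10
pid03,cat01,Mouse,20
pid04,cat02,Cat,30
pid05,cat02,Cat,20
pid06,cat03,Fish,30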
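Since join only pairs lines correctly when both inputs are sorted on the join field (and --header is a GNU coreutils option), unsorted files need a sort step first. A sketch assuming bash for the <(...) process substitution and deliberately shuffled sample data; it also produces the second requested layout via -o 1.1,2.2,1.3:

```shell
#!/bin/bash
# Use one collation for both sort and join
export LC_ALL=C
dir=$(mktemp -d)
cd "$dir" || exit 1
# Sample data from the question, written out of order on purpose
printf '%s\n' 'product_id, category_id, price' \
  pid06,cat03,30 pid01,cat01,10 pid04,cat02,30 \
  pid02,cat01,10 pid05,cat02,20 pid03,cat01,20 > file1
printf '%s\n' 'category_id, category_name' \
  cat04,Dog cat02,Cat cat01,Mouse cat03,Fish > file2

# Header for the second layout
echo "product_id,category_name,price" > result.csv
# Skip each header, sort on the key, then join field 2 of file1 with field 1 of file2
join -t , -1 2 -2 1 -o 1.1,2.2,1.3 \
  <(tail -n +2 file1 | sort -t , -k 2,2) \
  <(tail -n +2 file2 | sort -t , -k 1,1) >> result.csv
cat result.csv
```

Unmatched categories such as cat04 are left out, as with the awk versions; join's -a option would keep unpaired lines if they were wanted.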

【Comments】:

Nice, I've always liked this tool.

【Solution 3】:

You can create a shell script (process_csv.sh) like this:

#!/bin/sh

# Print from the first line containing "pid" to the end, which skips the header
data=$(sed -n '/pid/,$ p' file1.csv)
echo "product_id, category_id, price, category_name" > final.csv
# Since category_id is common to both files, we look up the category name by that id.
for row in $data
do
    cat_id=$(printf '%s\n' "$row" | awk -F "," '{print $2}')
    # Anchor the match at the start of the line so cat01 cannot match e.g. cat010
    category_name=$(grep "^$cat_id," file2.csv | cut -f2 -d',')
    # Append the category name to the row with the corresponding product_id
    echo "$row,$category_name" >> final.csv
done

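Just run ./process_csv.sh (after making it executable with chmod +x process_csv.sh), and final.csv will contain your result, with category_name as the last column.

【Comments】: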