Add column from one file to another based on multiple matches while retaining unmatched

【Question Title】: Add column from one file to another based on multiple matches while retaining unmatched
【Posted】: 2020-04-25 11:16:23
【Question】:

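So I'm really new to this kind of thing (seriously, sorry in advance), but I figured I'd post this question because it's taking me a while to solve, and I'm sure it's much harder than I imagine.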

I have a file small.csv:

id,name,x,y,id2
1,john,2,6,13
2,bob,3,4,15
3,jane,5,6,17
4,cindy,1,4,18

And another file big.csv:

id3,id4,name,x,y
100,{},john,2,6
101,{},bob,3,4
102,{},jane,5,6
103,{},cindy,1,4
104,{},alice,7,8
105,{},jane,0,3
106,{},cindy,1,7

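The problem is that I'm trying to put the id2 from small.csv into the id4 column of big.csv, but only when name AND x AND y all match. I've tried different awk and join commands in Git Bash, without much success. Again, I'm sorry for the newbie perspective on all this, but any help would be great. Thank you in advance.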

Edit: Sorry, the final desired output should look like this:

id3,id4,name,x,y
100,{13},john,2,6
101,{15},bob,3,4
102,{17},jane,5,6
103,{18},cindy,1,4
104,{},alice,7,8
105,{},jane,0,3
106,{},cindy,1,7

One of my most recent attempts was:

$ join -j 1 -o 1.5,2.1,2.2,2.3,2.4,2.5 <(sort -k2 small.csv) <(sort -k2 big.csv)

But I got this error:

join: /dev/fd/63: No such file or directory

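【Comments】:

Please update the question with a) the code you have tried so far, b) the (incorrect) output you are getting, and c) the desired output.
@jhnc Thank you so much, this is very close! Would you mind explaining how it works? Again, I'm really sorry for being such a newbie.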

Tags: bash join awk git-bash

【Solution 1】:

Solving this with join may not be straightforward, but it is fairly easy with awk:

awk -F, -v OFS=, ' # set input and output field separators to comma

    # create lookup table from lines of small.csv
    NR==FNR {
        # ignore header
        # map columns 2/3/4 to column 5
        if (NR>1) lut[$2,$3,$4] = $5
        next
    }

    # process lines of big.csv
    # if lookup table has mapping for columns 3/4/5, update column 2
    v = lut[$3,$4,$5] {
        $2 = "{" v "}"
    }

    # print (possibly-modified) lines of big.csv
    1

' small.csv big.csv >bignew.csv

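The code assumes that small.csv contains only one line for each distinct combination of columns 2/3/4 (name, x, y). If a combination appears more than once, the later line overwrites the earlier one in the lookup table.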
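
That assumption can be checked before running the merge. The following is only a minimal sketch: the file small_dup.csv and its contents are made up for illustration (same layout as small.csv, with one name/x/y combination deliberately repeated), and the variable names are hypothetical.

```shell
# Work in a throwaway directory so no real files are touched.
tmpdir=$(mktemp -d)

# Hypothetical sample: same columns as small.csv, with john,2,6 repeated.
cat > "$tmpdir/small_dup.csv" <<'EOF'
id,name,x,y,id2
1,john,2,6,13
2,bob,3,4,15
3,john,2,6,99
EOF

# Skip the header (FNR > 1) and count each name/x/y combination.
# A key is printed once, at the moment its second occurrence is seen.
dups=$(awk -F, 'FNR > 1 && seen[$2,$3,$4]++ == 1 { print $2 "," $3 "," $4 }' "$tmpdir/small_dup.csv")

printf 'duplicate keys: %s\n' "${dups:-none}"
rm -r "$tmpdir"
```

If the check reports any keys, the merge would silently use the id2 from the last matching line of small.csv, so it is worth deciding which line should win before merging.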

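NR==FNR { ...; next } is an idiom for processing only the contents of the first file argument. NR counts lines across all input, while FNR restarts at 1 for each file, so the two are equal only while the first file is being read. (While lines from the second and subsequent file arguments are processed, FNR is less than NR. next skips the remaining awk commands for the current line.)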
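
To see how the two counters behave, here is a small sketch that prints NR, FNR and the result of NR==FNR for every input line. The files a.txt and b.txt are throwaway examples invented for this illustration, not part of the original question.

```shell
# Create two tiny throwaway files (hypothetical names a.txt and b.txt).
tmpdir=$(mktemp -d)
printf 'a1\na2\n' > "$tmpdir/a.txt"
printf 'b1\nb2\nb3\n' > "$tmpdir/b.txt"

# For each line print: NR (lines read so far across all files),
# FNR (lines read so far in the current file), 1 or 0 for NR==FNR,
# and the line itself.
nr_fnr_demo=$(awk '{ print NR, FNR, (NR==FNR), $0 }' "$tmpdir/a.txt" "$tmpdir/b.txt")

printf '%s\n' "$nr_fnr_demo"
rm -r "$tmpdir"
```

The third column is 1 only for the lines of a.txt, which is exactly the window in which the lookup table is built. One caveat: if the first file is empty, NR==FNR also holds while the second file is read, so the idiom relies on small.csv having at least one line.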

【Discussion】:

Wow, this is a perfect explanation. Thank you so much for your help and speed.