Replace multiple fields/columns in one file with contents from another file if the headers are the same

【Question title】: Replace multiple fields/columns in one file with contents from another file if the headers are the same
【Posted】: 2017-02-15 22:01:51
【Question description】:

I have two CSV files that share similar headers. sample_scv_1.csv is:

Transaction_date,Name,Payment_Type,Product
1/2/09 6:17,NA,Mastercard,NA
1/2/09 4:53,NA,Visa,NA
1/2/09 13:08,Nick,Mastercard,NA
1/3/09 14:44,Larry,Visa,Goods
1/4/09 12:56,Tina,Visa,Services
1/4/09 13:19,Harry,Visa,Goods

Similarly, sample_scv_2.csv is:

Transaction_date,Product,Name
1/2/09 6:17,Goods,Janis
1/2/09 4:53,Services,Nicola
1/2/09 13:08,Materials,Asuman

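The columns/fields Transaction_date, Product and Name are common to both files. I want to replace the Name and Product fields in sample_scv_1.csv with the values from sample_scv_2.csv, provided the transaction date matches in both files.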

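This is a toy example; my real files are large. For this example, I can split off the rows where the dates match and replace the columns by index with csvtool: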

head -4 sample_scv_1.csv > temp1.csv
tail -3 sample_scv_1.csv > temp1_1.csv
#sudo apt-get install csvtool
csvtool pastecol 2,4 3,2 temp1.csv sample_scv_2.csv > temp1_2.txt
cat temp1_2.txt temp1_1.csv > sample_scv_1.csv

The output I need is:

Transaction_date,Name,Payment_Type,Product
1/2/09 6:17,Janis,Mastercard,Goods
1/2/09 4:53,Nicola,Visa,Services
1/2/09 13:08,Asuman,Mastercard,Materials
1/3/09 14:44,Larry,Visa,Goods
1/4/09 12:56,Tina,Visa,Services
1/4/09 13:19,Harry,Visa,Goods

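I can determine which rows have matching transaction dates, but I do not know the indices of the overlapping columns, such as Name and Product in the first file. One thing keeps the problem simple: every column of sample_scv_2.csv is also present in sample_scv_1.csv. Is there an efficient way to do this?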

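【Question discussion】:

Please show us what you have tried. Most of us here are happy to help you improve your craft, but are less happy acting as short-order unpaid programming staff. Show us your work so far in an MCVE, the result you were expecting and the result you got from your own attempts, and we will help you figure it out.
How big are your files?
@ghoti: Thanks. However, I have already shown the csvtool example I tried above. For brevity I did not mention the others.
@JamesBrown: My data has about 350 columns and 500k rows.
Are both files the same size?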

Tags: csv unix awk replace

【Solution 1】:

Since the files are not that large, the one with fewer columns or fields should fit in memory, so here is a solution in awk:

$ cat program.awk
BEGIN {FS=OFS=","}         # set the input and output field separators
NR==FNR {                  # for the first file
    p[$1]=$2               # store the product, use date as key
    n[$1]=$3               # name
    next                   # no more processing for the first file
} 
$1 in p {                  # if date found in first processed file
    if($2=="NA") $2=n[$1]  # replace NA with name
    if($4=="NA") $4=p[$1]  # replace NA with product
} 1                        # print the record

Run it:

$ awk -f program.awk file2 file1
Transaction_date,Name,Payment_Type,Product
1/2/09 6:17,Janis,Mastercard,Goods
1/2/09 4:53,Nicola,Visa,Services
1/2/09 13:08,Nick,Mastercard,Materials
1/3/09 14:44,Larry,Visa,Goods
1/4/09 12:56,Tina,Visa,Services
1/4/09 13:19,Harry,Visa,Goods

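【Discussion】:

Thanks! I want to replace everything, not only the cases with NA. Also, the solution assumes we know the indices that need to be replaced, which is difficult in a file with 350 columns. Can it be generalized?
You can store every value from file2 in memory and use it to replace the fields in file1. You need to know the index to match the records on. I mostly do awk, magic less often. There has to be something to compare on.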
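
One way the generalization asked about above might look is to read the column positions from the header rows instead of hard-coding them. The sketch below is not from the original answer: the function name merge_by_header is hypothetical, and it assumes that the key (Transaction_date) is the first column of both files, that fields contain no quoted or embedded commas, and that the updates file fits in memory. If a key occurs more than once in the updates file, the last occurrence wins.

```shell
#!/bin/sh
# merge_by_header UPDATES TARGET   (hypothetical helper name)
# Prints TARGET with every field overwritten by the value from UPDATES
# wherever the first-column key matches and the header names agree.
merge_by_header() {
    awk '
    BEGIN { FS = OFS = "," }
    NR == FNR && FNR == 1 {          # header of the updates file
        for (j = 2; j <= NF; j++) col[$j] = j      # column name -> index
        next
    }
    NR == FNR {                      # data rows of the updates file
        for (j = 2; j <= NF; j++) val[$1, j] = $j  # keyed by (date, index)
        seen[$1] = 1
        next
    }
    FNR == 1 {                       # header of the target file
        for (i = 2; i <= NF; i++)
            if ($i in col) map[i] = col[$i]        # target index -> updates index
        print
        next
    }
    $1 in seen {                     # key also present in the updates file
        for (i in map)
            if (($1, map[i]) in val) $i = val[$1, map[i]]
    }
    { print }                        # print every target row, changed or not
    ' "$1" "$2"
}
```

With the sample files from the question, merge_by_header sample_scv_2.csv sample_scv_1.csv > merged.csv should replace both Name and Product on the three matching dates, including the row where Name was Nick rather than NA. Only the shared columns are stored, so memory use grows with the size of the updates file, not with the 350-column target.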