Compare a .txt and a .csv file and replace names in the .txt with the matching names from the .csv

【Question title】: Compare a .txt and a .csv file and replace names in the .txt with the matching names from the .csv
【Posted】: 2017-06-24 08:55:42
【Question】:

file1.txt

[fields:WinSpc:defect]
a=b
b=c
hello=hi

[fields:ROCKET PROJECT:ticket]
description=Descrtiption
status=status

[fields:PROJECT_Nexus:defect]
title=summary
priority=Priority_hello

file2.csv

WinSpc,projects.winspc
ROCKET PROJECT,projects.rocket_project
PROJECT_Nexus,projects.project-nexus

I need to match these two files; the desired output is:

output.txt

[fields:winspc:defect]
a=b
b=c
hello=hi
[fields:rocket_project:ticket]
description=Descrtiption
status=status
[fields:project-nexus:defect]
title=summary
priority=Priority_hello

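Only the names should change.

I have tried

grep -Fwf, diff --brief,

and awk options, but did not get the desired output. I am still learning these tools. Any suggestions would be very helpful. Thanks in advance.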

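【Comments】:

What are you trying to do? Look, your solution simply calls .lower() on the file1.csv file to get the result... but that's not what you want, right?
@math2001 That was just a typo, so here is what I'm looking for. I need to compare file1.txt with file2.csv, and the expected output should be output.txt, as asked. The only thing I want to change is that where you see the name "WinSpc" in file1.txt, it should become winspc. That is the requirement.
@SubratSahoo: It is still not clear to me. If you just want WinSpc to become winspc, why do you need the other .csv file at all?
@SubratSahoo: Why is ROCKET PROJECT converted to rocket_project and PROJECT_Nexus to project-nexus? Why the difference? What is the idea behind it?
@Inian Yeah... it took an hour of discussion to guess what the OP wants (and I'm still not sure). Ill-defined questions cause trouble.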

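Tags: python shell csv awk

【Solution 1】:

A more scalable Awk approach is the following.

To restate the requirement for future readers: the .csv file holds field,replacement-of-field¹ pairs, one per line. For every field in the .csv, the corresponding entry in the .txt file should be replaced with replacement-of-field.

1. Only the part of replacement-of-field after the dot is actually used.

The following command does the job as expected.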

awk 'FNR==NR{split($2,list,"."); replacement[$1]=list[2]; next} \
   {for (i in replacement){ if (match($0,i)) {gsub(i,replacement[i],$0); break} }}1 ' \
      FS="," file2.csv file1.txt

It produces the output the OP asked for:

[fields:winspc:defect]
a=b
b=c
hello=hi
[fields:rocket_project:ticket]
description=Descrtiption
status=status
[fields:project-nexus:defect]
title=summary
priority=Priority_hello

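A short explanation:

The FNR==NR condition ensures that the commands in {} run first, and only for the .csv file. Note that the .csv file is read with , as the field separator.
split($2,list,"."); replacement[$1]=list[2]; next splits the second column on . and builds a hash map whose index is the value to be replaced and whose value is the replacement. This is done for every line of the .csv file.
Then, for the .txt file, each line is checked for a value to be replaced; if one is present, it is replaced with its replacement value. Note that match and gsub treat each name as a regular expression, so a name containing characters such as [ or ( is not matched literally.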
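The two passes described above can also be sketched in Python (one of the question's tags). This is only an illustrative sketch, not the answer's method: the function names `load_replacements` and `rewrite` and the inlined sample data are assumptions made for the example. Unlike the awk command, it replaces names as literal text and keeps everything after the first dot of the second column.

```python
import csv
import io

# Sample data inlined so the sketch is self-contained (shortened from the question).
CSV_TEXT = """WinSpc,projects.winspc
ROCKET PROJECT,projects.rocket_project
PROJECT_Nexus,projects.project-nexus
"""

TXT_TEXT = """[fields:WinSpc:defect]
a=b

[fields:ROCKET PROJECT:ticket]
status=status
"""


def load_replacements(csv_text):
    """First pass: map each name to the part of column 2 after the first dot."""
    replacements = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        # partition() yields an empty string if there is no dot at all
        replacements[row[0]] = row[1].partition(".")[2]
    return replacements


def rewrite(txt_text, replacements):
    """Second pass: replace the first known name found on each line."""
    out = []
    for line in txt_text.splitlines():
        for name, new in replacements.items():
            if name in line:  # literal substring test, not a regex
                line = line.replace(name, new)
                break
        out.append(line)
    return "\n".join(out)


result = rewrite(TXT_TEXT, load_replacements(CSV_TEXT))
print(result)
```

Lines that contain no name from the .csv pass through unchanged, so entries that are already correct need no special handling.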

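【Comments】:

Have you tested it? Looks good. One small remark: from the OP's comments, it seems the whole replacement-of-field from the csv file is not needed, only the last part of it (after the dot).
I tried it too and it works well! AWK is quite a nice tool.
@inian This works fine, and sorry for my approach, but some of my string names are already correct and need no replacement. For example, rocket_project is correct and nothing needs to be done... So when I try this on multiple entries, it gives an error: (FILENAME=FieldMappingF.txt FNR=1) fatal: Invalid range end: /Sensor: SEN200/300 (is MIST)[AMS-LOCK]/
@SubratSahoo: Please provide the input that reproduces the problem.
@Inian I have edited the question, so now you can see "abl_tja1146", which needs no replacement and appears multiple times in file1.txt. Would your suggestion work here?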
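The "Invalid range end" error quoted above is consistent with a name being interpreted as a regular expression: inside [AMS-LOCK], S-L is a reversed character range. The Python sketch below shows the same class of failure and one way around it by escaping the name. The name used here is a hypothetical, simplified stand-in for the one in the error message, and `compiles_as_regex` is a helper invented for the example.

```python
import re


def compiles_as_regex(text):
    """Return True if text is a valid regular expression, False otherwise."""
    try:
        re.compile(text)
        return True
    except re.error:
        return False


# Hypothetical name, modeled on the one in the error message above.
name = "SEN200 [AMS-LOCK]"
line = "[fields:SEN200 [AMS-LOCK]:defect]"

# Used as a pattern, the name is rejected: "S-L" is a reversed range.
usable_as_pattern = compiles_as_regex(name)

# Escaping the name makes every character match itself.
fixed = re.sub(re.escape(name), "sen200", line)
print(usable_as_pattern, fixed)
```

A plain `line.replace(name, "sen200")` would serve equally well here, since no pattern features are needed.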

【Solution 2】:

A sed one-liner:

sed 's#,projects.#/#;s#.*#/fields/s/&/\;#' file2.csv | sed -f - file1.txt

How it works:

First, file2.csv is converted into sed substitute commands. The initial command
sed 's#,projects.#/#;s#.*#/fields/s/&/\;#' file2.csv
outputs:
```
/fields/s/WinSpc/winspc/;
/fields/s/ROCKET PROJECT/rocket_project/;
/fields/s/PROJECT_Nexus/project-nexus/;
```

Then the generated substitute commands are run on file1.txt.

Output:

[fields:winspc:defect]
a=b
b=c
hello=hi
[fields:rocket_project:ticket]
description=Descrtiption
status=status
[fields:project-nexus:defect]
title=summary
priority=Priority_hello
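To make the first step concrete, here is a small Python sketch that builds the same three sed commands from the CSV lines. It only illustrates the text transformation; `to_sed_command` is a name invented for the example, and the split is on the literal string ",projects." whereas the sed pattern's final dot matches any character.

```python
# The three lines of file2.csv from the question.
CSV_LINES = [
    "WinSpc,projects.winspc",
    "ROCKET PROJECT,projects.rocket_project",
    "PROJECT_Nexus,projects.project-nexus",
]


def to_sed_command(csv_line):
    """Turn 'Name,projects.repl' into '/fields/s/Name/repl/;'."""
    name, _, replacement = csv_line.partition(",projects.")
    # The /fields/ address limits the substitution to lines containing "fields".
    return f"/fields/s/{name}/{replacement}/;"


script = [to_sed_command(line) for line in CSV_LINES]
print("\n".join(script))
```

Because each name lands in the pattern part of an s/// command, a name containing / or regex metacharacters would need escaping before such a script could be used safely.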

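【Comments】:

【Solution 3】:

Considering your comments, this looks like an exercise in replacing the value after fields: in the txt file with a replacement value from another file, where the "fields" value serves as a kind of key.

Take a look at this approach: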

$ readarray -t a < <(grep -e "\[fields:" a.txt |cut -d: -f2)
$ for ((i=0;i<${#a[@]};i++));do a[i]=s/${a[i]}/$(grep -e "${a[i]}" b.txt |cut -d, -f2 |cut -d. -f2)/g\;;done
$ sed -f <(echo "${a[@]}") a.txt

Output:

[fields:winspc:defect]
a=b
b=c
hello=hi
[fields:rocket_project:ticket]
description=Descrtiption
status=status
[fields:project-nexus:defect]
title=summary
priority=Priority_hello

Explanation:

# grep the first file a.txt for all the fields: and keep the second part, i.e. WinSpc. Store all those findings in an array
$ readarray -t a < <(grep -e "\[fields:" a.txt |cut -d: -f2)
$ declare -p a #print the array to see what is inside
#Bash Output: declare -a a=([0]="WinSpc" [1]="ROCKET PROJECT" [2]="PROJECT_Nexus")

# Iterate through the array and with the stored value (i.e WinSpc) grep the second file 
# (b.txt in my test) and get the second field after comma. Store changes in the same array.
$ for ((i=0;i<${#a[@]};i++));do a[i]=s/${a[i]}/$(grep -e "${a[i]}" b.txt |cut -d, -f2 |cut -d. -f2)/g\;;done
$ declare -p a #print the array again. Now array looks like a sed pattern.
# Bash Output: declare -a a=([0]="s/WinSpc/winspc/g;" [1]="s/ROCKET PROJECT/rocket_project/g;" [2]="s/PROJECT_Nexus/project-nexus/g;")

# We can then apply all the sed patterns stored in array to replace values of text file (a.txt)
$ sed -f <(echo "${a[@]}") a.txt

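Other solutions, using AWK for example, may provide more efficient code.

【Comments】:

A solution should be scalable, designed to keep working when the OP adds new entries to file1 or file2; this design will not work if new entries come in. It needs manual changes to work!
@Inian What is the error? I don't understand what you mean. We extract [fields: from the text file, use the value after [fields: in the txt file to look it up in the csv file, and take the replacement part. Then we substitute with sed... What am I missing?
I did not go through the design in full, but hard-coded fields like declare -a a=([0]="s/WinSpc/projects.winspc/g;" [1]="s/ROCKET PROJECT/projects.rocket_project/g;" [2]="s/PROJECT_Nexus/projects.project-nexus/g;") would restrict the answer to the OP's input in the question. I mean that adding more such fields would not work, not that the answer is invalid.
@Inian No, that is not hard-coded. It is just the output of declare -p a. I asked bash to tell me what the array a holds. This is bash's default response when you issue the declare -p command.
@Inian Try this to see what I mean: a=( 1 2 3 );declare -p a. You will see that bash responds with all the details of array a (values, format, etc.). The actual bash response will be declare -a a=([0]="1" [1]="2" [2]="3")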
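As a rough sketch of the same idea (treat the value after fields: as a key and look it up), the Python below parses each header and replaces only that key. The names `HEADER` and `rewrite_headers` and the inlined mapping are assumptions for illustration; like `cut -d: -f2`, it assumes the name itself contains no colon.

```python
import re

# Matches a header such as [fields:WinSpc:defect]; assumes no ':' inside the name.
HEADER = re.compile(r"^\[fields:([^:\]]+):([^\]]*)\]$")


def rewrite_headers(lines, mapping):
    """Replace the key of each [fields:KEY:type] header when KEY is in mapping."""
    out = []
    for line in lines:
        m = HEADER.match(line)
        if m and m.group(1) in mapping:
            line = f"[fields:{mapping[m.group(1)]}:{m.group(2)}]"
        out.append(line)
    return out


# Mapping as it would be built from the .csv (name -> part after the dot).
mapping = {
    "WinSpc": "winspc",
    "ROCKET PROJECT": "rocket_project",
    "PROJECT_Nexus": "project-nexus",
}

lines = [
    "[fields:WinSpc:defect]",
    "a=b",
    "[fields:abl_tja1146:defect]",  # not in the mapping, so left as-is
    "priority=PROJECT_Nexus",       # not a header, so left as-is
]

result = rewrite_headers(lines, mapping)
print("\n".join(result))
```

Looking the key up exactly, rather than searching for it as a substring, means that key=value lines and names absent from the .csv are never touched.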