【Question title】: How do I combine 2 csv files + all content + cygwin/bash/awk/sed/paste
【Posted】: 2014-07-25 19:10:45
【Question】:

How do I combine 2 CSV files (file1.csv and file2.csv)? I have explored awk/sed/paste, but it is beyond me.

file1.csv

Time,Object,Integrity,KPI 1-A Name A unit(unit/s),KPI 2-A Name B unit(unit/s)
2014-06-04 11:00,ObjectA,100%,0.0316,0.0012
2014-06-04 21:00,ObjectB,100%,40.0332,7.2601

file2.csv

Time,Object,Integrity,KPI 1-C Name A unit(unit),KPI 1-D Name A unit(unit)
2014-06-04,ObjectA,100%,0.024,0.0014
2014-06-04,ObjectB,100%,60.6176,29.0913

whatIwant.csv

Time,Object,Integrity,KPI 1-A Name A unit(unit/s),KPI 2-A Name B unit(unit/s),Time,Object,Integrity,KPI 1-C Name A unit(unit),KPI 1-D Name A unit(unit)
2014-06-04 11:00,ObjectA,100%,0.0316,0.0012,2014-06-04,ObjectA,100%,0.024,0.0014
2014-06-04 21:00,ObjectB,100%,40.0332,7.2601,2014-06-04,ObjectB,100%,60.6176,29.0913

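Note: I am assuming that Time and Object line up row for row in the two files.

This will be used for N rows.

The number of columns in each file may also grow.

I may have to remove the second set of Time, Object, Integrity columns from whatIwant.csv, but that can be done later.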

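【Comments】:

  • I suggest you take a look at en.wikibooks.org/wiki/…, it is a crash course in what you need to do.
  • Thanks, I also found this way.
  • You can also use join instead of paste to drop the duplicate fields, or post-process the output to remove them.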
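Since the question assumes the rows of both files already line up, the paste approach mentioned in the comments can be sketched as follows. This is only a minimal illustration: the scratch directory and the output name merged.csv are assumptions made for the example, and the sample data is copied from the question.

```shell
# Work in a scratch directory so the sample files do not clobber real data.
cd "$(mktemp -d)"

cat > file1.csv <<'EOF'
Time,Object,Integrity,KPI 1-A Name A unit(unit/s),KPI 2-A Name B unit(unit/s)
2014-06-04 11:00,ObjectA,100%,0.0316,0.0012
2014-06-04 21:00,ObjectB,100%,40.0332,7.2601
EOF

cat > file2.csv <<'EOF'
Time,Object,Integrity,KPI 1-C Name A unit(unit),KPI 1-D Name A unit(unit)
2014-06-04,ObjectA,100%,0.024,0.0014
2014-06-04,ObjectB,100%,60.6176,29.0913
EOF

# Glue line N of file1.csv to line N of file2.csv, separated by a comma.
paste -d, file1.csv file2.csv > whatIwant.csv

# Variant: drop the first three columns of file2.csv before pasting,
# so Time, Object and Integrity appear only once (merged.csv is a made-up name).
cut -d, -f4- file2.csv | paste -d, file1.csv - > merged.csv

cat merged.csv
```

paste does no matching at all, so this only gives correct results when both files have the same rows in the same order. If the files were saved on Windows they may have CRLF line endings, which would leave a carriage return in the middle of each pasted line; stripping it first (for example with tr -d '\r') avoids that.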

Tags: bash csv awk sed merge


【Solution 1】:

Using awk

awk -F, 'NR==FNR{a[$2]=$0;next}$2 in a{ print a[$2],$4, $5 }' OFS=, file1.csv file2.csv
Time,Object,Integrity,KPI 1-A Name A unit(unit/s),KPI 2-A Name B unit(unit/s),KPI 1-C Name A unit(unit),KPI 1-D Name A unit(unit)
2014-06-04 11:00,ObjectA,100%,0.0316,0.0012,0.024,0.0014
2014-06-04 21:00,ObjectB,100%,40.0332,7.2601,60.6176,29.0913
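For readers who want the one-liner spelled out, here is a commented sketch of the same idea. It differs from the command above in one respect, which is an assumption on my part: instead of hard-coding $4 and $5 it appends every column from the fourth onward, since the question says the number of columns may grow. The scratch directory and merged.csv are illustrative names.

```shell
# Recreate the sample data from the question in a scratch directory.
cd "$(mktemp -d)"

cat > file1.csv <<'EOF'
Time,Object,Integrity,KPI 1-A Name A unit(unit/s),KPI 2-A Name B unit(unit/s)
2014-06-04 11:00,ObjectA,100%,0.0316,0.0012
2014-06-04 21:00,ObjectB,100%,40.0332,7.2601
EOF

cat > file2.csv <<'EOF'
Time,Object,Integrity,KPI 1-C Name A unit(unit),KPI 1-D Name A unit(unit)
2014-06-04,ObjectA,100%,0.024,0.0014
2014-06-04,ObjectB,100%,60.6176,29.0913
EOF

awk -F, -v OFS=, '
  # NR == FNR holds only while the first file is being read:
  # store each whole line of file1.csv under its Object column, then skip ahead.
  NR == FNR { a[$2] = $0; next }

  # Second file: when the Object value was seen in file1.csv,
  # print the stored line followed by columns 4..NF of the current line.
  $2 in a {
      line = a[$2]
      for (i = 4; i <= NF; i++) line = line OFS $i
      print line
  }
' file1.csv file2.csv > merged.csv

cat merged.csv
```

The header is merged like any other row because both header lines carry the value Object in column 2. Note that the array is keyed on Object alone, so if an object appears more than once in file1.csv only its last line is kept.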

Using join

join -t, -j 2 -o 1.1 1.2 1.3 1.4 1.5 2.4 2.5 file1.csv file2.csv
Time,Object,Integrity,KPI 1-A Name A unit(unit/s),KPI 2-A Name B unit(unit/s),KPI 1-C Name A unit(unit),KPI 1-D Name A unit(unit)
2014-06-04 11:00,ObjectA,100%,0.0316,0.0012,0.024,0.0014
2014-06-04 21:00,ObjectB,100%,40.0332,7.2601,60.6176,29.0913
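join expects both inputs to be sorted on the join field, which happens to be true for the sample data. When that is not guaranteed, one possible approach is to sort the data rows and handle the header separately, roughly as sketched below. The intermediate names body1.csv, body2.csv and merged.csv are made up for the example, and file2.csv is deliberately written in reverse order to show the effect.

```shell
# Use one fixed collation so sort and join agree on the ordering.
export LC_ALL=C
cd "$(mktemp -d)"

cat > file1.csv <<'EOF'
Time,Object,Integrity,KPI 1-A Name A unit(unit/s),KPI 2-A Name B unit(unit/s)
2014-06-04 11:00,ObjectA,100%,0.0316,0.0012
2014-06-04 21:00,ObjectB,100%,40.0332,7.2601
EOF

# Same rows as in the question, but in reverse order.
cat > file2.csv <<'EOF'
Time,Object,Integrity,KPI 1-C Name A unit(unit),KPI 1-D Name A unit(unit)
2014-06-04,ObjectB,100%,60.6176,29.0913
2014-06-04,ObjectA,100%,0.024,0.0014
EOF

# Sort the data rows (everything after the header) on the Object column.
tail -n +2 file1.csv | sort -t, -k2,2 > body1.csv
tail -n +2 file2.csv | sort -t, -k2,2 > body2.csv

{
  # Header: all of the file1 header plus columns 4 onward of the file2 header.
  printf '%s,%s\n' "$(head -n 1 file1.csv)" "$(head -n 1 file2.csv | cut -d, -f4-)"
  # Data: join on field 2 of both files and pick the output columns explicitly.
  join -t, -1 2 -2 2 -o 1.1,1.2,1.3,1.4,1.5,2.4,2.5 body1.csv body2.csv
} > merged.csv

cat merged.csv
```

The -o list still names the columns one by one, so it has to be extended by hand if the files gain columns.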

Update:

To join on both the date and the object you can use awk, since join only joins on a single column.

awk -F, 'NR==FNR{sub(/ .*/,"",$1);map[$1,$2]=$0;next}(($1,$2) in map){print map[$1,$2],$4,$5}' OFS=, f1 f2
Time,Object,Integrity,KPI 1-A Name A unit(unit/s),KPI 2-A Name B unit(unit/s),KPI 1-C Name A unit(unit),KPI 1-D Name A unit(unit)
2014-06-04,ObjectA,100%,0.0316,0.0012,0.024,0.0014
2014-06-04,ObjectB,100%,40.0332,7.2601,60.6176,29.0913
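In the command above, sub() edits $1 in place, which is why the time of day is missing from the output. If the full timestamp from the first file should be kept, one option is to build the key from a copy of the field. The following is a sketch under that assumption; the variable name day and the output file merged.csv are illustrative.

```shell
# Recreate the sample data from the question in a scratch directory.
cd "$(mktemp -d)"

cat > file1.csv <<'EOF'
Time,Object,Integrity,KPI 1-A Name A unit(unit/s),KPI 2-A Name B unit(unit/s)
2014-06-04 11:00,ObjectA,100%,0.0316,0.0012
2014-06-04 21:00,ObjectB,100%,40.0332,7.2601
EOF

cat > file2.csv <<'EOF'
Time,Object,Integrity,KPI 1-C Name A unit(unit),KPI 1-D Name A unit(unit)
2014-06-04,ObjectA,100%,0.024,0.0014
2014-06-04,ObjectB,100%,60.6176,29.0913
EOF

awk -F, -v OFS=, '
  # First file: key each line on (date, Object).
  NR == FNR {
      day = $1
      sub(/ .*/, "", day)     # strip " HH:MM" from the copy; $0 stays untouched
      map[day, $2] = $0
      next
  }

  # Second file: look up the same (date, Object) pair and append columns 4 and 5.
  ($1, $2) in map { print map[$1, $2], $4, $5 }
' file1.csv file2.csv > merged.csv

cat merged.csv
```

As with the original, two rows in the first file that share the same date and object would overwrite each other in the array, so only the later one survives.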

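【Comments】:

  • Thanks a lot, but could you briefly explain it? I want to apply it to similar data that looks more like this, with every value in double quotes: "Time","Object","Integrity","KPI 1-A Name A unit(unit/s)","KPI 2-A Name B unit(unit/s)"
  • @HattrickNZ For quoted CSV I would not recommend awk at all. Use something that ships with a proper CSV parser, such as perl or ruby.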