【Posted】:2017-10-19 19:52:42
【Question】:
join.awk
#!/bin/awk -f
BEGIN {
    FS=OFS=",";
    print "ozone,particullate_matter,carbon_monoxide,sulfure_dioxide,nitrogen_dioxide,longitude,latitude,timestamp,avgMeasuredTime,avgSpeed,medianMeasuredTime,Distance between 2 points,duration of measurements,ndt in kmh"
}
NR==FNR && NR>1 {
    a[$8]=$1 FS $2 FS $3 FS $4 FS $5 FS $6 FS $7
}
FNR>1 {
    if ($6 in a) {
        split(a[$6],data,FS);
        if ((data[6]==$11 || data[6]==$13) && (data[7]==$10 || data[7]==$12)) {
            print data[1],data[2],data[3],data[4],data[5],data[6],data[7],$6,$2,$3,$5,$14,$15,$16
        }
    }
}
I have this code, which merges two CSV files that share 3 common columns. I got it with help from Stack Overflow.
Input file 1
ozone,particullate_matter,carbon_monoxide,sulfure_dioxide,nitrogen_dioxide,longitude,latitude,timestamp
101,94,49,44,87,10.1050,56.2317,1406831700
106,97,48,47,86,10.1050,56.2317,1406832000
107,95,49,42,85,10.1050,56.2317,1406832300
103,90,51,44,87,10.1050,56.2317,1406832600
Input file 2
status,avgMeasuredTime,avgSpeed,extID,medianMeasuredTime,TIMESTAMP,vehicleCount,_id,REPORT_ID,Lat1,Long1,Lat2,Long2,Distance between 2 points,duration of measurements,ndt in kmh
OK,74,50,668,74,1406831700,5,20746220,158324,56.2317,10.1050,56.2258,10.1166,1030,52,71
OK,73,50,668,73,1406859900,6,20746392,158324,56.2317,10.1050,56.2258,10.1166,1030,52,71
OK,61,60,668,61,1406832300,4,20746723,158324,56.2317,10.1050,56.2258,10.1166,1030,52,71
OK,61,60,668,61,1406860500,1,20747172,158324,56.2317,10.1050,56.2258,10.1166,1030,52,71
Output
ozone,particullate_matter,carbon_monoxide,sulfure_dioxide,nitrogen_dioxide,longitude,latitude,timestamp,avgMeasuredTime,avgSpeed,medianMeasuredTime,Distance between 2 points,duration of measurements,ndt in kmh
101,94,49,44,87,10.1050,56.2317,1406831700,74,50,74,1030,52,71
107,95,49,42,85,10.1050,56.2317,1406832300,61,60,61,1030,52,71
Each input file has more than 1,300,000 lines.
When I run this command
awk -f join.awk Inputfile1.csv Inputfile2.csv
only the header is printed. However, this code works for smaller files. Please help.
【Comments】:
-
Did the command complete?
-
awk -f join.awk ip1.csv ip2.csv
ozone,particullate_matter,carbon_monoxide,sulfure_dioxide,nitrogen_dioxide,longitude,latitude,timestamp,avgMeasuredTime,avgSpeed,medianMeasuredTime,Distance between 2 points,duration of measurements,ndt in kmh
It stops after this.
-
Did it finish, or did it hang?
-
Yes sir, it finished.
-
My old eyes deceived me. It's not $10.
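-
One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: the script stores the rows of the first file under the timestamp alone (`a[$8]`), so if the full files contain several locations with the same timestamp, each later row replaces the earlier one and the longitude/latitude test can then fail for every line. The sketch below keys the array on timestamp, longitude, and latitude together and runs on the sample rows from the question. The temporary file names and the `out` variable are illustrative. It also assumes that coordinates are written identically in both files (they are compared as strings here), that only the endpoint pairs (Long1, Lat1) and (Long2, Lat2) need to be tested, and it omits the header line for brevity.

```shell
#!/bin/sh
# Sketch only: composite-key variant of the join, run on the sample rows
# from the question. File names below are illustrative.
tmp=$(mktemp -d)

cat > "$tmp/ip1.csv" <<'EOF'
ozone,particullate_matter,carbon_monoxide,sulfure_dioxide,nitrogen_dioxide,longitude,latitude,timestamp
101,94,49,44,87,10.1050,56.2317,1406831700
106,97,48,47,86,10.1050,56.2317,1406832000
107,95,49,42,85,10.1050,56.2317,1406832300
103,90,51,44,87,10.1050,56.2317,1406832600
EOF

cat > "$tmp/ip2.csv" <<'EOF'
status,avgMeasuredTime,avgSpeed,extID,medianMeasuredTime,TIMESTAMP,vehicleCount,_id,REPORT_ID,Lat1,Long1,Lat2,Long2,Distance between 2 points,duration of measurements,ndt in kmh
OK,74,50,668,74,1406831700,5,20746220,158324,56.2317,10.1050,56.2258,10.1166,1030,52,71
OK,73,50,668,73,1406859900,6,20746392,158324,56.2317,10.1050,56.2258,10.1166,1030,52,71
OK,61,60,668,61,1406832300,4,20746723,158324,56.2317,10.1050,56.2258,10.1166,1030,52,71
OK,61,60,668,61,1406860500,1,20747172,158324,56.2317,10.1050,56.2258,10.1166,1030,52,71
EOF

out=$(awk '
BEGIN { FS = OFS = "," }
FNR == 1 { next }                 # skip the header line of each file
NR == FNR {
    # First file: key on timestamp, longitude, latitude together,
    # so rows sharing a timestamp do not overwrite each other.
    a[$8, $6, $7] = $1 FS $2 FS $3 FS $4 FS $5 FS $6 FS $7
    next                          # do not run the second-file logic here
}
{
    # Second file: try the first endpoint, then the second endpoint.
    k1 = $6 SUBSEP $11 SUBSEP $10
    k2 = $6 SUBSEP $13 SUBSEP $12
    if (k1 in a)      print a[k1], $6, $2, $3, $5, $14, $15, $16
    else if (k2 in a) print a[k2], $6, $2, $3, $5, $14, $15, $16
}
' "$tmp/ip1.csv" "$tmp/ip2.csv")

printf '%s\n' "$out"
rm -rf "$tmp"
```

On the sample rows this prints the same two data lines as the expected output in the question. Whether it changes anything on the full files depends on whether timestamps really repeat across locations, which the thread does not confirm.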