Comparing columns of files in Unix? Answers

【Question title】: Comparing columns of files in Unix?
【Posted】: 2017-04-09 02:13:51
【Question description】:

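I want to compare the filenames in Today.txt against those in Main.txt. When a filename matches, print all 6 columns of the matching Main.txt line into a new file, matched.txt.

For filenames in Today.txt that have no match in Main.txt, write the filename and time from Today.txt to another new file, say unmatched.txt.

Note: the plus sign (+) indicates that the file comes from the inprogress directory; sometimes a filename arrives with a "+" attached to it (as a prefix in the sample below).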

Main.txt

 date      filename          timestamp space  count   status
Nov 4    +CHCK01_20161104.txt  06:39   2.15M  17153    on_time
Nov 4    TRIPS11_20161104.txt 09:03   0.00M  24       On_Time
Nov 4    AR02_20161104.txt    09:31   0.00M  7        On_Time
Nov 4    AR01_20161104.txt    09:31   0.04M  433      On_Time

Today.txt

 filename       time
CHCK01_20161104.txt 06:03
CHCK05_20161104.txt 11:10
CHCK09_20161104.txt 21:46
AR01_20161104.txt   09:36
AR02_20161104.txt   09:36
ifs01_20161104.txt  21:16
TRIPS11_20161104.txt 09:16

Desired output: matched.txt

Nov 4    +CHCK01_20161104.txt  06:39   2.15M  17153    on_time
Nov 4    TRIPS11_20161104.txt 09:03   0.00M  24       On_Time
Nov 4    AR02_20161104.txt    09:31   0.00M  7        On_Time
Nov 4    AR01_20161104.txt    09:31   0.04M  433      On_Time

unmatched.txt

CHCK05_20161104.txt 11:10
CHCK09_20161104.txt 21:46
ifs01_20161104.txt  21:16

The command below gives me the correct output, except when a filename has the plus sign (+) attached.

 awk 'FNR==1{next} 
  NR==FNR{a[$1]=$2; next} 
  $3 in a{print; delete a[$3]} 
      END{for(k in a) print k,a[k] > "unmatched"}' today main > matched

Thanks a lot in advance!

【Question discussion】:

Tags: python shell unix awk sed

【Solution 1】:

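The problem is the test $3 in a, which runs while the main file is being processed. A name such as +CHCK01_20161104.txt is not a key of a, because the keys come from the today file, where the name has no +. To make such names match, strip the + from $3 at lookup time with gensub, which is available in GNU awk. The advantage of gensub over sub and gsub is that it returns the modified string instead of changing the field in place, so the line is still printed exactly as it appears in the file. So, for your case: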

$ awk 'FNR==1{next} 
  NR==FNR{a[$1]=$2; next} 
  gensub(/^\+/,"",1,$3) in a{print; delete a[gensub(/^\+/,"",1,$3)]} 
      END{for(k in a) print k,a[k] > "unmatched"}' today main 

Nov 4    +CHCK01_20161104.txt  06:39   2.15M  17153    on_time
Nov 4    TRIPS11_20161104.txt 09:03   0.00M  24       On_Time
Nov 4    AR02_20161104.txt    09:31   0.00M  7        On_Time
Nov 4    AR01_20161104.txt    09:31   0.04M  433      On_Time

This produces the 4 lines in the output, as required.

From the gawk man page:

gensub(regexp, replacement, how [, target])
           gensub is a general substitution function. Like sub and gsub, it 
searches the target string target for matches of the regular expression regexp. Unlike sub and gsub, 
the modified string is returned as the result of the function, and the original target string
is not changed. If how is a string beginning with `g' or `G', then it replaces all matches 
of regexp with replacement.

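So in our case, gensub with how set to 1 replaces only the first match of the pattern in $3 with an empty string and returns the result, leaving $3 itself unchanged. The count alone does not tie the match to the start of the field: it is the ^ anchor in a pattern such as /^\+/ that restricts the removal to a leading +, so a + anywhere else in the field is left alone.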
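The difference between an anchored and an unanchored pattern can be seen with a small sketch. It uses sub on a copy of the field rather than gensub, so it should also run on awk implementations other than GNU awk. The function name strip_plus_demo and the second sample name (AB+CD_20161104.txt) are made up for illustration and do not come from the question's data.

```shell
# Hypothetical helper: prints each input name with the + removed two ways.
strip_plus_demo() {
    printf '%s\n' '+CHCK01_20161104.txt' 'AB+CD_20161104.txt' |
    awk '{
        anchored = $1;   sub(/^\+/, "", anchored)    # removes a leading + only
        unanchored = $1; sub(/\+/,  "", unanchored)  # removes the first + wherever it is
        print anchored, unanchored
    }'
}

strip_plus_demo
# prints:
# CHCK01_20161104.txt CHCK01_20161104.txt
# AB+CD_20161104.txt ABCD_20161104.txt
```

Only the anchored form leaves the second name intact, which is why the anchor matters if a + could ever appear inside a filename.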

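(Or) a more concise awk version, thanks to Ed Morton's suggestion: copy $3 into a variable and apply sub to that copy, so the field itself is never modified: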

$ awk 'FNR==1{next} 
  NR==FNR{a[$1]=$2; next} 
  {k=$3; sub(/^\+/,"",k)} k in a{print; delete a[k]} 
      END{for(k in a) print k,a[k] > "unmatched"}' today main
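For anyone who wants to try the approach end to end, here is a minimal sketch that recreates the sample files from the question in a scratch directory and runs the sub-based version. Writing the matches to matched.txt from inside awk (instead of redirecting stdout) and using the output names matched.txt and unmatched.txt are choices made for this sketch, following the names in the question text; the command above uses the files today and main and writes to unmatched.

```shell
#!/bin/sh
# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Sample data copied from the question (each file starts with a header line).
cat > Today.txt <<'EOF'
 filename       time
CHCK01_20161104.txt 06:03
CHCK05_20161104.txt 11:10
CHCK09_20161104.txt 21:46
AR01_20161104.txt   09:36
AR02_20161104.txt   09:36
ifs01_20161104.txt  21:16
TRIPS11_20161104.txt 09:16
EOF

cat > Main.txt <<'EOF'
 date      filename          timestamp space  count   status
Nov 4    +CHCK01_20161104.txt  06:39   2.15M  17153    on_time
Nov 4    TRIPS11_20161104.txt 09:03   0.00M  24       On_Time
Nov 4    AR02_20161104.txt    09:31   0.00M  7        On_Time
Nov 4    AR01_20161104.txt    09:31   0.04M  433      On_Time
EOF

awk '
    FNR == 1  { next }                        # skip the header of each file
    NR == FNR { a[$1] = $2; next }            # Today.txt: filename -> time
    { k = $3; sub(/^\+/, "", k) }             # Main.txt: lookup key without a leading +
    k in a    { print > "matched.txt"; delete a[k] }
    END       { for (k in a) print k, a[k] > "unmatched.txt" }
' Today.txt Main.txt

cat matched.txt
LC_ALL=C sort unmatched.txt
```

The order in which for (k in a) visits the array is not specified by awk, so the lines of unmatched.txt may come out in any order; the sketch sorts them only to make the result easier to compare.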

【Discussion】:

Thanks for your help. Could you tell me how to change the TEXT SIZE? I can change the colour and bold with the command below. if ($NF == "Delayed") {color="red"; bold=1; size=15} else if ($NF == "On_Time") color="green" else if ($NF == "No_Records") color="yellow" else color="#003abc" Dummy=$0 sub("[^ ]+$","",Dummy) print Dummy "" $NF ""
@Janaranjan: Glad you found it helpful. Please post your new requirement as a separate question; it deserves its own post rather than a comment.