【Posted】: 2018-04-13 22:30:42
【Question】:
I am trying to use awk to parse a text file that looks like this:
001 data John Smith address "London" | occupation "Driver" | exercise_level "Medium"
002 data Rob Edward address "Cardiff" | occupation "Physiotherapist" | exercise_level "High"
003 data Dara Pronk address "Groningen" | country "Holland" | occupation "Teacher" | exercise_level "Low"
004 data Marina Francesca address "Lugano" | country "Switzerland" | occupation "Chef" | exercise_level "High"
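The first 4 columns are tab-separated, and the 5th column holds metadata fields separated by pipes.
I want the value of the occupation key as my fifth column. My desired output looks like this: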
001 data John Smith Driver
002 data Rob Edward Physiotherapist
003 data Dara Pronk Teacher
004 data Marina Francesca Chef
I can get the occupation with this command:
awk -F'[\t|]' '{for(i=5;i<=NF;i++){if($i~/^ occupation/){c=$i}} print $1, $2, $3, $4, c}' my_file
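However, the result contains both the key and the value (e.g. occupation "Physiotherapist" instead of just Physiotherapist). Is there a way to parse the already-parsed column (i.e. extract the value inside the quotes), along the lines of the following?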
awk -F'[\t|]' '{for(i=5;i<=NF;i++){if($i~/^ occupation/){c=$i}} ((parse c here, take $2 of " delimiter)) print $1, $2, $3, $4, c}' my_file
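One possible way to fill in the "parse c here" step is awk's built-in split(), which can break the matched field on the double-quote character so that the second piece is the quoted value. The following is only a minimal sketch under a few assumptions: the function name extract_occupation and the array name parts are made up for illustration, the regex is loosened to allow zero or more leading spaces before the key, and c is reset on every line so a record without an occupation prints an empty fifth column.

```shell
# Hypothetical helper: prints columns 1-4 plus the quoted occupation value.
extract_occupation() {
  awk -F'[\t|]' '
  {
    c = ""                                # reset so a value never carries over from the previous line
    for (i = 5; i <= NF; i++) {
      if ($i ~ /^ *occupation /) {        # field looks like: occupation "Driver"
        split($i, parts, "\"")            # parts[1] = key, parts[2] = text inside the quotes
        c = parts[2]
      }
    }
    print $1, $2, $3, $4, c
  }' "$1"
}
```

It would be called as extract_occupation my_file. The same split() call could also be dropped directly into the one-liner above in place of c=$i.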
【Discussion】:
Tags: bash shell parsing unix awk