【Question title】: Text filtering with shell scripting
【Posted】: 2016-06-01 08:21:03
【Question description】:

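I'm finding it really hard to filter some text using shell scripting. Basically, I log in to several network devices and find their directly connected neighbors. I then export the results to a .txt file like this: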

    Host IP: 175.334.2.43

-------------------------
Device ID: first_device
Entry address(es):
  IP address: 323.43.5.32
Platform: cisco 428,  Capabilities: Router Switch IGMP
Interface: GigabitEthernet0/3,  Port ID (outgoing port): GigabitEthernet0/10
Holdtime : 130 sec


advertisement version: 2
Protocol Hello:  OUI=0x0fsdfs0C, Protocol ID=0x0fdf2; payload len=27, value=0dsgfjhb2CAE00FF0000
VTP Management Domain: ''
Native VLAN: 453
Duplex: full
Management address(es):
  IP address: 323.43.5.32

-------------------------
Device ID: second_device
Entry address(es):
  IP address: 323.43.5.398
Platform: cisco 428,  Capabilities: Router Switch IGMP
Interface: GigabitEthernet0/5,  Port ID (outgoing port): GigabitEthernet0/123
Holdtime : 130 sec


advertisement version: 2
Protocol Hello:  OUI=0x0fsdfs0C, Protocol ID=0x0fdf2; payload len=27, value=0dsgfjhb2CAE00FF0000
VTP Management Domain: ''
Native VLAN: 453
Duplex: full
Management address(es):
  IP address: 323.43.5.398

Host IP: 342.52.5.2

-------------------------
Device ID: third_device
Entry address(es):
  IP address: 32.43.15.32
Platform: cisco 428,  Capabilities: Router Switch IGMP
Interface: GigabitEthernet0/98,  Port ID (outgoing port): GigabitEthernet0/165
Holdtime : 130 sec


advertisement version: 2
Protocol Hello:  OUI=0x0fsdfs0C, Protocol ID=0x0fdf2; payload len=27, value=0dsgfjhb2CAE00FF0000
VTP Management Domain: ''
Native VLAN: 453
Duplex: full
Management address(es):
  IP address: 32.43.15.32

-------------------------
Device ID: fourth_device
Entry address(es):
  IP address: 0832.54.254.6
Platform: cisco 428,  Capabilities: Router Switch IGMP
Interface: GigabitEthernet0/543,  Port ID (outgoing port): GigabitEthernet0/16
Holdtime : 130 sec


advertisement version: 2
Protocol Hello:  OUI=0x0fsdfs0C, Protocol ID=0x0fdf2; payload len=27, value=0dsgfjhb2CAE00FF0000
VTP Management Domain: ''
Native VLAN: 453
Duplex: full
Management address(es):
  IP address: 0832.54.254.6

I want to filter this file and organize it into columns. I do this with the filter_res.sh script:

#!/bin/bash
sed -e '/Management address(es):/{N;d;}' results.txt >results2.txt
grep "Host IP:" results2.txt | awk  '{print $3}' >host_ip.txt
grep "Device ID:.*" results2.txt | awk '{print $3 ","}' >dev_ids.txt
grep "IP address: " results2.txt | awk '{print $3 ","}' >cpe_ip.txt
grep "Platform: " results2.txt | awk '{print $2 $3}' >chassis.txt
grep "Interface:" results2.txt >interfaces.txt
awk '{print $7}' interfaces.txt >cpe_int.txt
awk '{print $2}' interfaces.txt >agg_int.txt
pr -mts' ' dev_ids.txt cpe_ip.txt chassis.txt agg_int.txt cpe_int.txt >final_results.txt

final_results.txt is fine, except that I want to append one more column at the end, with the host IP on every row. This is the result I get:

first_device, 323.43.5.32, cisco428, GigabitEthernet0/3, GigabitEthernet0/10
second_device, 323.43.5.398, cisco428, GigabitEthernet0/5, GigabitEthernet0/123
third_device, 32.43.15.32, cisco428, GigabitEthernet0/98, GigabitEthernet0/165
fourth_device, 0832.54.254.6, cisco428, GigabitEthernet0/543, GigabitEthernet0/16

What I want is:

first_device, 323.43.5.32, cisco428, GigabitEthernet0/3, GigabitEthernet0/10, 175.334.2.43
second_device, 323.43.5.398, cisco428, GigabitEthernet0/5, GigabitEthernet0/123, 175.334.2.43
third_device, 32.43.15.32, cisco428, GigabitEthernet0/98, GigabitEthernet0/165, 342.52.5.2
fourth_device, 0832.54.254.6, cisco428, GigabitEthernet0/543, GigabitEthernet0/16, 342.52.5.2

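【Comments】:

  • What happens when you add host_ip.txt to the list of column files in the pr command?
  • If I add it, the host IPs don't end up in the right place; they should be repeated for each connected device.
  • "I am really finding it hard to filter some text using shell scripting." That's perfect, since that's not what shell scripting is for. A shell is for manipulating files and processes and for sequencing calls to tools. awk is for manipulating text, so on UNIX, when you need to filter (or otherwise manipulate) text, the shell simply calls awk.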

Tags: shell text awk scripting grep


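【Solution 1】:

You don't need all those intermediate steps; combine them into a single awk script instead. This is a hacky approach and not recommended for long-term use, but perhaps you can use it as a starting point...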

$ awk -v RS="[-]+\n" -v c=',' '
              NR>1{print $3 c,$8 c,$10$11,$17,$22 c,hip} 
        /Host IP:/{hip=$NF}' file


first_device, 323.43.5.32, cisco428, GigabitEthernet0/3, GigabitEthernet0/10, 175.334.2.43
second_device, 323.43.5.398, cisco428, GigabitEthernet0/5, GigabitEthernet0/123, 175.334.2.43
third_device, 32.43.15.32, cisco428, GigabitEthernet0/98, GigabitEthernet0/165, 342.52.5.2
fourth_device, 0832.54.254.6, cisco428, GigabitEthernet0/543, GigabitEthernet0/16, 342.52.5.2

P.S. This requires gawk because of the multi-character RS.
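
For anyone without gawk, here is a minimal sketch of a line-oriented alternative that should work with any POSIX awk, since it does not rely on a multi-character RS. It is not the answer's method: the function name extract_neighbors is made up for illustration, and the script assumes the input has exactly the layout shown in the question (each device block contains a Device ID, an IP address, a Platform and an Interface line, in that order).

```shell
#!/bin/sh
# Sketch only: parse the neighbor dump line by line with plain POSIX awk.
# extract_neighbors is a hypothetical helper name; the input layout is
# assumed to match the sample in the question.
extract_neighbors() {
    awk '
        # Remember the most recent host; it applies to every device after it.
        /^[ \t]*Host IP:/    { host = $NF; next }
        # A new device block starts: reset the per-device IP.
        /^Device ID:/        { dev = $3; ip = ""; next }
        # Keep only the first "IP address:" of a block (the entry address);
        # the management address further down repeats it.
        /^[ \t]*IP address:/ { if (ip == "") ip = $NF; next }
        # "cisco" and "428," are joined, then the trailing comma is dropped.
        /^Platform:/         { chassis = $2 $3; sub(/,$/, "", chassis); next }
        # The Interface line is the last field needed, so print the row here.
        /^Interface:/        {
            agg = $2; sub(/,$/, "", agg)
            printf "%s, %s, %s, %s, %s, %s\n", dev, ip, chassis, agg, $NF, host
        }
    ' "$@"
}

# Demo on a trimmed copy of the first device block from the question.
extract_neighbors <<'EOF'
Host IP: 175.334.2.43

-------------------------
Device ID: first_device
Entry address(es):
  IP address: 323.43.5.32
Platform: cisco 428,  Capabilities: Router Switch IGMP
Interface: GigabitEthernet0/3,  Port ID (outgoing port): GigabitEthernet0/10
Management address(es):
  IP address: 323.43.5.32
EOF
```

With the full file, `extract_neighbors results.txt` would be the equivalent call. Matching on line prefixes instead of field positions within a whole record makes the script less sensitive to extra lines appearing inside a device block, at the cost of being a little longer.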

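【Comments】:

  • Thank you very much! That is exactly what I was trying to do!
  • If it solved your problem, please accept the answer.
  • There is no need to put - in a bracket expression; it has no special meaning in a regular expression unless it is inside a bracket expression. You should also mention that this is gawk-specific because of the multi-character RS.
  • @Ed Morton, yes, I know, but -+ looks funny. [-] is a valid regex since it doesn't specify a range. Added the gawk-only note.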
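
To illustrate the point about the hyphen, here is a small sketch that counts the lines matched by each spelling of the pattern. The helper name count_separator_lines and the inline sample are made up for illustration. Outside a bracket expression `-` is an ordinary character, so both forms should select the same lines.

```shell
#!/bin/sh
# count_separator_lines is a hypothetical helper: it reports how many input
# lines match the bare pattern (-+) and the bracketed pattern ([-]+).
count_separator_lines() {
    awk '
        /^-+$/   { bare++ }
        /^[-]+$/ { bracketed++ }
        # Adding 0 forces a numeric 0 when a counter was never incremented.
        END      { print bare + 0, bracketed + 0 }
    ' "$@"
}

# Two of the five sample lines consist solely of hyphens, so this prints "2 2".
printf 'a\n-----\nb\n---\nc\n' | count_separator_lines
```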