【Question title】: Select/match a section in Python
【Posted】: 2019-06-10 10:30:37
【Question description】:

I am trying to match each section of every link-state type in an OSPF database, as shown in CLI_Output below, using the following regular expression in Python:

regex = r'\n\n(\s+\S+( \S+)?(.+?)\n\n)(\s+\S+( \S+)?)?'
section = re.findall(regex,_original_result, re.M)

But I only get one line (the first one) after the header line:

i.e.
                Router Link States (Area 0.0.0.0)

Link ID         ADV Router      Age        Seq#       Checksum Link Count
10.189.7.250    10.189.7.250    1102       0x80012fa1 0x6b32   2

Here is my CLI output:

CLI_Output = '''
                Router Link States (Area 0.0.0.0)

Link ID         ADV Router      Age        Seq#       Checksum Link Count
10.189.7.250    10.189.7.250    1102       0x80012fa1 0x6b32   2
10.200.254.252  10.200.254.252  97          0x80000003 0x00501E 3

                Net Link States (Area 0.0.0.0)

Link ID         ADV Router      Age        Seq#       Checksum
10.189.254.242  10.189.254.242  1452       0x80001cf4 0xefab
10.189.0.242    10.189.0.242    1452       0x80001cf4 0xefab

                Summary Link States (Area 0.0.0.0)

Link ID         ADV Router      Age        Seq#       Checksum     Route
10.189.127.0    10.189.254.242  10         0x80001cde 0x6602     10.189.127.0/29
10.200.0.0      10.200.254.251  130        0x80000001 0x002675   10.200.0.0/16
172.18.200.1    10.200.254.251  109        0x80000001 0x00B5CB   172.18.200.1/32

                ASBR-Summary Link States (Area 0.0.0.0)

Link ID         ADV Router      Age        Seq#       Checksum
10.189.127.3    10.189.254.242  10         0x80001c30 0xc14a

                Router Link States (Area 1.1.1.1)

Link ID         ADV Router      Age        Seq#       Checksum Link Count
10.189.127.3    10.189.127.3    1707       0x80001d5e 0xa509   1
10.189.254.242  10.189.254.242  10         0x80001ce0 0x8ec2   1

                Net Link States (Area 1.1.1.1)

Link ID         ADV Router      Age        Seq#       Checksum
10.189.127.2    10.189.254.243  70         0x80001c31 0xdb72

                Summary Link States (Area 1.1.1.1)

Link ID         ADV Router      Age        Seq#       Checksum    Route
10.189.254.240  10.189.254.242  371        0x80001cda 0x8a71     10.189.254.240/29
10.189.254.240  10.189.254.243  1813       0x80001cda 0x8476     10.189.254.240/29

                ASBR-Summary Link States (Area 1.1.1.1)

Link ID         ADV Router      Age        Seq#       Checksum
10.189.7.250    10.189.254.242  1442       0x8000154f 0x165e
10.189.7.250    10.189.254.243  1242       0x8000154d 0x1461

                Router Link States (Area 2.2.2.2 [NSSA])

Link ID         ADV Router      Age        Seq#       Checksum Link Count
10.189.7.250    10.189.7.250    1102       0x80012fa1 0x6b32   2
10.189.254.243  10.189.254.243  1552       0x80001ce8 0x164e   1

                Net Link States (Area 2.2.2.2 [NSSA])

Link ID         ADV Router      Age        Seq#       Checksum
10.200.254.241  10.200.254.251  1277 80000001 ef90  0002

                Summary Link States (Area 2.2.2.2 [NSSA])

Link ID         ADV Router      Age        Seq#       Checksum     Route
0.0.0.0         10.200.254.251  1317 80000001 b7b0  0002 0.0.0.0/0
0.0.0.0         10.200.254.252  1317 80000001 b1b5  0002 0.0.0.0/0

                NSSA-external Link States (Area 2.2.2.2 [NSSA])

Link ID         ADV Router      Age  Seq#     CkSum Flag Route         Tag
10.200.1.0      172.18.200.1    365  800011cb 6f90  0031 E2 10.200.1.0/24   0
10.200.2.0      172.18.200.1    1735 800011c7 6c96  0031 E2 10.200.2.0/24   0
10.200.3.0      172.18.200.1    1775 800011c9 5da2  0031 E2 10.200.3.0/24   0

                AS External Link States

Link ID         ADV Router      Age  Seq#     CkSum Flag Route         Tag
0.0.0.0         10.189.7.250    384  800129e9 9a51  0012 E2 0.0.0.0/0       0
2.3.4.0         10.189.7.250    1154 80007a7a 1fe2  0012 E2 2.3.4.0/24      0
10.112.0.0      10.189.7.250    1084 8000d7e3 b31d  0012 E2 10.112.0.0/21   0

Can someone help me? What should my regex look like to match a complete section?

i.e.
                Router Link States (Area 0.0.0.0)

Link ID         ADV Router      Age        Seq#       Checksum Link Count
10.189.7.250    10.189.7.250    1102       0x80012fa1 0x6b32   2
10.200.254.252  10.200.254.252  97          0x80000003 0x00501E 3

Many thanks in advance, Matrix154

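【Comments】:

  • Your regex does not seem to match what you describe (see here). Beyond that, I think you would be better off reading the text line by line.
  • @Jerry: You are right! I have tried many combinations without luck. With this regex, only the type lines are matched. Is there a way to match the corresponding section as well?
  • It is possible, but I don't think it is worth it =/ Reading line by line should be much easier.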
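One reason the pattern in the question stops at the type line: in Python, `.` does not match a newline unless `re.S` (`re.DOTALL`) is set, and `re.M` only changes how `^` and `$` behave. The snippet below is a minimal illustration on a made-up two-row string (the text and pattern are assumptions for the demo, not the original data):

```python
import re

# Tiny stand-in for one section: a title, a blank line, two rows, a blank line.
text = "Title (Area 0.0.0.0)\n\nrow 1\nrow 2\n\n"
pattern = r'\(Area [\d.]+\)\n\n(.+?)\n\n'

# With re.M alone, '.' still cannot cross a line break, so the two-row
# body can never be followed by the required blank line.
multiline_only = re.findall(pattern, text, re.M)

# Adding re.S lets '.' match '\n', so the lazy group spans both rows.
with_dotall = re.findall(pattern, text, re.M | re.S)

print(multiline_only)
print(with_dotall)
```

So a single-regex solution needs either `re.S` or an explicit "one or more non-blank lines" construct for the data rows.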

Tags: python regex match multiline findall


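【Solution 1】:

I have written a lot of code in Perl and PHP and am new to Python. Selecting a section like this is easier in Perl or PHP, so I wanted to do the same in Python, but unfortunately without success.

@Jerry: Thank you very much! I followed your suggestion and read the CLI output line by line. Here is the code, which I am sharing in case someone else needs it: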

import re

def ParseText(_original_result):
  ls_types = []
  dic = {}
  g = {}  # per-call storage; writing into globals() would leak state between calls
  (ls_name, area_id, area_type) = ('', '', '')

  for line in _original_result.split("\n"):
    if (line.strip() and not re.search(r'Link ID\s+', line)):
      regex = r'^\s+(\S+|\S+ \S+) Link States'
      ls = re.findall(regex, line, re.S)
      if ls: ls_types.append(re.sub(r'(\s|-)', '_', ls[0]))

  ls_types = list(set(ls_types))

  for line in _original_result.split("\n"):
    if (line.strip() and not re.search(r'Link ID\s+', line)):
      matcher = list(re.finditer(r'^\s+(?P<ls_name>\S+( \S+)?) Link States( \(Area (?P<area_id>\S+)( \[(?P<area_type>\S+)\])?\))?', line, re.S|re.M))
      if matcher:
        ls_name = re.sub(r'(\s|-)', '_', matcher[0]['ls_name'])
        if ls_name not in g:
          g[ls_name] = { 'area_id': [], 'area_type': [], 'link_id': [], 'adv_rtr': [], 'age': [], 'sumary': [], 'ext_route_type': [], 'ext_route': [], 'tag': [], '$_columns' : ['area_id', 'area_type', 'link_id', 'adv_rtr', 'age', 'sumary', 'ext_route_type', 'ext_route', 'tag'] }
        area_id = matcher[0]['area_id']
        if matcher[0]['area_type']:
          area_type = matcher[0]['area_type']
        else:
          if area_id == '0.0.0.0': area_type = 'backbone'
          else: area_type = 'normal'
      else:
        matcher = list(re.finditer(r'^(?P<lnk_id>\S+)\s+(?P<adv_rtr>\S+)\s+(?P<age>\d+)(\s+\S+){2}(\s+(\d+\s+)?((?P<sumary>\S+)|(?P<ext_route_type>\S+)\s+(?P<ext_route>\S+)\s+(?P<tag>\d+)))?$', line, re.M|re.S))
        if matcher:
          if ls_name != 'AS_External':
            g[ls_name]['area_id'].append(area_id)
            g[ls_name]['area_type'].append(area_type)
          g[ls_name]['link_id'].append(matcher[0]['lnk_id'])
          g[ls_name]['adv_rtr'].append(matcher[0]['adv_rtr'])
          g[ls_name]['age'].append(matcher[0]['age'])
          if matcher[0]['sumary']:
            g[ls_name]['sumary'].append(matcher[0]['sumary'])
          if matcher[0]['ext_route_type']:
            g[ls_name]['ext_route_type'].append(matcher[0]['ext_route_type'])
          if matcher[0]['ext_route']:
            g[ls_name]['ext_route'].append(matcher[0]['ext_route'])
          if matcher[0]['tag']:
            g[ls_name]['tag'].append(matcher[0]['tag'])

  for LS in g:
    if LS in ls_types:
      dic[LS] = g[LS]

  return dic

CLI_Output = '''
Forinet $get router info ospf database brief



                Router Link States (Area 0.0.0.0)

Link ID         ADV Router      Age        Seq#       Checksum Link Count
10.189.7.250    10.189.7.250    1102       0x80012fa1 0x6b32   2
10.189.254.242  10.189.254.242  371        0x80001ce0 0x2847   1
10.189.254.243  10.189.254.243  1552       0x80001ce8 0x164e   1
10.200.254.251  10.200.254.251  93          0x80000003 0x002052 3
10.200.254.252  10.200.254.252  97          0x80000003 0x00501E 3

                Net Link States (Area 0.0.0.0)

Link ID         ADV Router      Age        Seq#       Checksum
10.189.254.242  10.189.254.242  1452       0x80001cf4 0xefab

                Summary Link States (Area 0.0.0.0)

Link ID         ADV Router      Age        Seq#       Checksum     Route
10.189.127.0    10.189.254.242  10         0x80001cde 0x6602     10.189.127.0/29
10.189.127.0    10.189.254.243  1452       0x80001cdc 0x6405     10.189.127.0/29
10.200.0.0      10.200.254.251  130        0x80000001 0x002675   10.200.0.0/16
10.200.0.0      10.200.254.252  146        0x80000001 0x00207A   10.200.0.0/16
172.18.200.1    10.200.254.251  109        0x80000001 0x00B5CB   172.18.200.1/32
172.18.200.1    10.200.254.252  108        0x80000001 0x00AFD0   172.18.200.1/32

                ASBR-Summary Link States (Area 0.0.0.0)

Link ID         ADV Router      Age        Seq#       Checksum
10.189.127.3    10.189.254.242  10         0x80001c30 0xc14a
10.189.127.3    10.189.254.243  60         0x80001c4b 0x856a

                Router Link States (Area 1.1.1.1)

Link ID         ADV Router      Age        Seq#       Checksum Link Count
10.189.127.3    10.189.127.3    1707       0x80001d5e 0xa509   1
10.189.254.242  10.189.254.242  10         0x80001ce0 0x8ec2   1
10.189.254.243  10.189.254.243  70         0x80001ce6 0x80c7   1

                Net Link States (Area 1.1.1.1)

Link ID         ADV Router      Age        Seq#       Checksum
10.189.127.2    10.189.254.243  70         0x80001c31 0xdb72

                Summary Link States (Area 1.1.1.1)

Link ID         ADV Router      Age        Seq#       Checksum    Route
10.189.254.240  10.189.254.242  371        0x80001cda 0x8a71     10.189.254.240/29
10.189.254.240  10.189.254.243  1813       0x80001cda 0x8476     10.189.254.240/29
10.200.254.250  10.189.254.242  1442       0x80001548 0x0673     10.189.254.250/32
10.200.254.250  10.189.254.243  1242       0x80001548 0xff78     10.189.254.250/32

                ASBR-Summary Link States (Area 1.1.1.1)

Link ID         ADV Router      Age        Seq#       Checksum
10.189.7.250    10.189.254.242  1442       0x8000154f 0x165e
10.189.7.250    10.189.254.243  1242       0x8000154d 0x1461

                Router Link States (Area 2.2.2.2 [NSSA])

Link ID         ADV Router      Age        Seq#       Checksum Link Count
10.189.7.250    10.189.7.250    1102       0x80012fa1 0x6b32   2
10.189.254.242  10.189.254.242  371        0x80001ce0 0x2847   1
10.189.254.243  10.189.254.243  1552       0x80001ce8 0x164e   1

                Net Link States (Area 2.2.2.2 [NSSA])

Link ID         ADV Router      Age        Seq#       Checksum
10.200.254.241  10.200.254.251  1277 80000001 ef90  0002

                Summary Link States (Area 2.2.2.2 [NSSA])

Link ID         ADV Router      Age        Seq#       Checksum     Route
0.0.0.0         10.200.254.251  1317 80000001 b7b0  0002 0.0.0.0/0
0.0.0.0         10.200.254.252  1317 80000001 b1b5  0002 0.0.0.0/0

                NSSA-external Link States (Area 2.2.2.2 [NSSA])

Link ID         ADV Router      Age  Seq#     CkSum Flag Route              Tag
10.200.1.0      172.18.200.1    365  800011cb 6f90  0031 E2 10.200.1.0/24   0
10.200.2.0      172.18.200.1    1735 800011c7 6c96  0031 E2 10.200.2.0/24   0
10.200.3.0      172.18.200.1    1775 800011c9 5da2  0031 E2 10.200.3.0/24   0
10.200.4.0      172.18.200.1    1555 800011c9 43be  0031 E2 10.200.4.0/22   0
10.200.8.0      172.18.200.1    1585 800011c8 28d3  0031 E2 10.200.8.0/24   0
10.200.234.0    172.18.200.1    1525 800011c7 6aaf  0031 E2 10.200.234.0/24 0

                AS External Link States

Link ID         ADV Router      Age  Seq#     CkSum Flag Route              Tag
0.0.0.0         10.189.7.250    384  800129e9 9a51  0012 E2 0.0.0.0/0       0
2.3.4.0         10.189.7.250    1154 80007a7a 1fe2  0012 E2 2.3.4.0/24      0
10.112.0.0      10.189.7.250    1084 8000d7e3 b31d  0012 E2 10.112.0.0/21   0
10.112.189.0    10.189.7.250    144  8000e95e 84fa  0012 E2 10.112.189.0/24 0
10.158.189.0    10.189.7.250    124  800129db 9df5  0012 E2 10.158.189.0/24 0
10.180.128.0    10.189.7.250    1264 800129da 15ad  0012 E2 10.180.128.0/21 0
10.188.0.0      10.189.7.250    1314 800129d5 2b4d  0012 E2 10.188.0.0/18   0
10.189.0.0      10.189.7.250    1344 800129d8 320a  0012 E2 10.189.0.0/21   0
10.189.8.0      10.189.7.250    1504 8000d057 0801  0012 E2 10.189.8.0/23   0
10.189.10.0     10.189.7.250    334  800129da e246  0012 E2 10.189.10.0/24  0
10.189.11.0     10.189.7.250    1534 800129da d750  0012 E2 10.189.11.0/24  0
10.189.14.0     10.189.7.250    1204 800129e5 a079  0012 E2 10.189.14.0/24  0
10.189.15.0     10.189.7.250    784  8000c59c 2ca1  0012 E2 10.189.15.0/29  0
10.189.20.0     10.189.7.250    914  800129e0 68b0  0012 E2 10.189.20.0/24  0

'''

print(ParseText(CLI_Output))

The output looks like this (values are appended in the order the rows are read, so there is no need to worry about reordering):

{
  'Router': {
    'area_id': ['0.0.0.0', '0.0.0.0', '0.0.0.0', '0.0.0.0', '0.0.0.0', '1.1.1.1', '1.1.1.1', '1.1.1.1', '2.2.2.2', '2.2.2.2', '2.2.2.2'],
    'area_type': ['backbone', 'backbone', 'backbone', 'backbone', 'backbone', 'normal', 'normal', 'normal', 'NSSA', 'NSSA', 'NSSA'],
    'link_id': ['10.189.7.250', '10.189.254.242', '10.189.254.243', '10.200.254.251', '10.200.254.252', '10.189.127.3', '10.189.254.242', '10.189.254.243', '10.189.7.250', '10.189.254.242', '10.189.254.243'],
    'adv_rtr': ['10.189.7.250', '10.189.254.242', '10.189.254.243', '10.200.254.251', '10.200.254.252', '10.189.127.3', '10.189.254.242', '10.189.254.243', '10.189.7.250', '10.189.254.242', '10.189.254.243'],
    'age': ['1102', '371', '1552', '93', '97', '1707', '10', '70', '1102', '371', '1552'],
    'sumary': ['2', '1', '1', '3', '3', '1', '1', '1', '2', '1', '1'],
    'ext_route_type': [],
    'ext_route': [],
    'tag': [],
    '$_columns': ['area_id', 'area_type', 'link_id', 'adv_rtr', 'age', 'sumary', 'ext_route_type', 'ext_route', 'tag']
  },
  'Net': {
    'area_id': ['0.0.0.0', '1.1.1.1', '2.2.2.2'],
    'area_type': ['backbone', 'normal', 'NSSA'],
    'link_id': ['10.189.254.242', '10.189.127.2', '10.200.254.241'],
    'adv_rtr': ['10.189.254.242', '10.189.254.243', '10.200.254.251'],
    'age': ['1452', '70', '1277'],
    'sumary': ['0002'],
    'ext_route_type': [],
    'ext_route': [],
    'tag': [],
    '$_columns': ['area_id', 'area_type', 'link_id', 'adv_rtr', 'age', 'sumary', 'ext_route_type', 'ext_route', 'tag']
  },
  'Summary': {
    'area_id': ['0.0.0.0', '0.0.0.0', '0.0.0.0', '0.0.0.0', '0.0.0.0', '0.0.0.0', '1.1.1.1', '1.1.1.1', '1.1.1.1', '1.1.1.1', '2.2.2.2', '2.2.2.2'],
    'area_type': ['backbone', 'backbone', 'backbone', 'backbone', 'backbone', 'backbone', 'normal', 'normal', 'normal', 'normal', 'NSSA', 'NSSA'],
    'link_id': ['10.189.127.0', '10.189.127.0', '10.200.0.0', '10.200.0.0', '172.18.200.1', '172.18.200.1', '10.189.254.240', '10.189.254.240', '10.200.254.250', '10.200.254.250', '0.0.0.0', '0.0.0.0'],
    'adv_rtr': ['10.189.254.242', '10.189.254.243', '10.200.254.251', '10.200.254.252', '10.200.254.251', '10.200.254.252', '10.189.254.242', '10.189.254.243', '10.189.254.242', '10.189.254.243', '10.200.254.251', '10.200.254.252'],
    'age': ['10', '1452', '130', '146', '109', '108', '371', '1813', '1442', '1242', '1317', '1317'],
    'sumary': ['10.189.127.0/29', '10.189.127.0/29', '10.200.0.0/16', '10.200.0.0/16', '172.18.200.1/32', '172.18.200.1/32', '10.189.254.240/29', '10.189.254.240/29', '10.189.254.250/32', '10.189.254.250/32', '0.0.0.0/0', '0.0.0.0/0'],
    'ext_route_type': [],
    'ext_route': [],
    'tag': [],
    '$_columns': ['area_id', 'area_type', 'link_id', 'adv_rtr', 'age', 'sumary', 'ext_route_type', 'ext_route', 'tag']
  },
  'ASBR_Summary': {
    'area_id': ['0.0.0.0', '0.0.0.0', '1.1.1.1', '1.1.1.1'],
    'area_type': ['backbone', 'backbone', 'normal', 'normal'],
    'link_id': ['10.189.127.3', '10.189.127.3', '10.189.7.250', '10.189.7.250'],
    'adv_rtr': ['10.189.254.242', '10.189.254.243', '10.189.254.242', '10.189.254.243'],
    'age': ['10', '60', '1442', '1242'],
    'sumary': [],
    'ext_route_type': [],
    'ext_route': [],
    'tag': [],
    '$_columns': ['area_id', 'area_type', 'link_id', 'adv_rtr', 'age', 'sumary', 'ext_route_type', 'ext_route', 'tag']
  },
  'NSSA_external': {
    'area_id': ['2.2.2.2', '2.2.2.2', '2.2.2.2', '2.2.2.2', '2.2.2.2', '2.2.2.2'],
    'area_type': ['NSSA', 'NSSA', 'NSSA', 'NSSA', 'NSSA', 'NSSA'],
    'link_id': ['10.200.1.0', '10.200.2.0', '10.200.3.0', '10.200.4.0', '10.200.8.0', '10.200.234.0'],
    'adv_rtr': ['172.18.200.1', '172.18.200.1', '172.18.200.1', '172.18.200.1', '172.18.200.1', '172.18.200.1'],
    'age': ['365', '1735', '1775', '1555', '1585', '1525'],
    'sumary': [],
    'ext_route_type': ['E2', 'E2', 'E2', 'E2', 'E2', 'E2'],
    'ext_route': ['10.200.1.0/24', '10.200.2.0/24', '10.200.3.0/24', '10.200.4.0/22', '10.200.8.0/24', '10.200.234.0/24'],
    'tag': ['0', '0', '0', '0', '0', '0'],
    '$_columns': ['area_id', 'area_type', 'link_id', 'adv_rtr', 'age', 'sumary', 'ext_route_type', 'ext_route', 'tag']
  },
  'AS_External': {
    'area_id': [],
    'area_type': [],
    'link_id': ['0.0.0.0', '2.3.4.0', '10.112.0.0', '10.112.189.0', '10.158.189.0', '10.180.128.0', '10.188.0.0', '10.189.0.0', '10.189.8.0', '10.189.10.0', '10.189.11.0', '10.189.14.0', '10.189.15.0', '10.189.20.0'],
    'adv_rtr': ['10.189.7.250', '10.189.7.250', '10.189.7.250', '10.189.7.250', '10.189.7.250', '10.189.7.250', '10.189.7.250', '10.189.7.250', '10.189.7.250', '10.189.7.250', '10.189.7.250', '10.189.7.250', '10.189.7.250', '10.189.7.250'],
    'age': ['384', '1154', '1084', '144', '124', '1264', '1314', '1344', '1504', '334', '1534', '1204', '784', '914'],
    'sumary': [],
    'ext_route_type': ['E2', 'E2', 'E2', 'E2', 'E2', 'E2', 'E2', 'E2', 'E2', 'E2', 'E2', 'E2', 'E2', 'E2'],
    'ext_route': ['0.0.0.0/0', '2.3.4.0/24', '10.112.0.0/21', '10.112.189.0/24', '10.158.189.0/24', '10.180.128.0/21', '10.188.0.0/18', '10.189.0.0/21', '10.189.8.0/23', '10.189.10.0/24', '10.189.11.0/24', '10.189.14.0/24', '10.189.15.0/29', '10.189.20.0/24'],
    'tag': ['0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0'],
    '$_columns': ['area_id', 'area_type', 'link_id', 'adv_rtr', 'age', 'sumary', 'ext_route_type', 'ext_route', 'tag']
  }
}
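
One thing to watch in the output above: the column lists are not always the same length (for example, 'Net' has three 'link_id' values but only one 'sumary'), so a value can no longer be traced back to its row. A possible alternative, sketched here under my own assumptions, keeps one dict per data row. The names `TITLE_RE`, `parse_records` and the `extra` key are hypothetical, and unlike the code above it leaves `area_type` as `None` instead of filling in 'backbone' or 'normal':

```python
import re

# Indented title line, e.g. "   Router Link States (Area 2.2.2.2 [NSSA])".
TITLE_RE = re.compile(
    r'^\s+(?P<ls_name>\S+(?: \S+)?) Link States'
    r'(?: \(Area (?P<area_id>[\d.]+)(?: \[(?P<area_type>[^\]]+)\])?\))?'
)

def parse_records(text):
    """Return one dict per data row, tagged with the section it came from."""
    records = []
    section = None
    for line in text.splitlines():
        if not line.strip() or line.startswith('Link ID'):
            continue                      # blank line or column header
        title = TITLE_RE.match(line)
        if title:
            section = title.groupdict()   # remember the current section
            continue
        fields = line.split()
        if section is None or len(fields) < 5:
            continue                      # e.g. the CLI prompt line
        records.append({
            'ls_name': section['ls_name'],
            'area_id': section['area_id'],
            'area_type': section['area_type'],
            'link_id': fields[0],
            'adv_rtr': fields[1],
            'age': int(fields[2]),
            'extra': fields[5:],          # everything after Seq# and Checksum
        })
    return records

SAMPLE = '''
Forinet $get router info ospf database brief

                Router Link States (Area 2.2.2.2 [NSSA])

Link ID         ADV Router      Age        Seq#       Checksum Link Count
10.189.7.250    10.189.7.250    1102       0x80012fa1 0x6b32   2

                AS External Link States

Link ID         ADV Router      Age  Seq#     CkSum Flag Route              Tag
0.0.0.0         10.189.7.250    384  800129e9 9a51  0012 E2 0.0.0.0/0       0
'''

records = parse_records(SAMPLE)
for record in records:
    print(record)
```

Because every row carries its own section fields, rows can be filtered or grouped later without relying on list positions.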

【Discussion】:

    【Solution 2】:

    As I mentioned in my comments, I believe reading line by line is easier:

    record = 0
    results = []
    
    for line in CLI_Output.split("\n"):
        # skip empty lines
        if line == "" and record < 2:
            continue
    
        # if Router Link is in header
        if line.find('Router Link') > -1:
            record = 1
            continue
    
        # headers
        if record == 1:
            record = 2
            continue
    
        # If we are here, we are getting the data lines
        if record == 2 and line != "":
            results.append(line)
        elif line == "":
            record = 0  # use break here if you want to stop after the first chunk
    
    print(results)
    

    Result:

    ['10.189.7.250    10.189.7.250    1102       0x80012fa1 0x6b32   2',
     '10.200.254.252  10.200.254.252  97          0x80000003 0x00501E 3',
     '10.189.127.3    10.189.127.3    1707       0x80001d5e 0xa509   1',
     '10.189.254.242  10.189.254.242  10         0x80001ce0 0x8ec2   1',
     '10.189.7.250    10.189.7.250    1102       0x80012fa1 0x6b32   2',
     '10.189.254.243  10.189.254.243  1552       0x80001ce8 0x164e   1']
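
The loop above only keeps rows from the 'Router Link' sections. If every section is needed, the same line-by-line idea can be generalized, roughly as sketched below. This is my own variant rather than the answer's code: the function name `rows_by_section` is hypothetical, and it assumes title lines are indented and contain "Link States" while column headers start with "Link ID", as in the CLI output shown in the question.

```python
def rows_by_section(text):
    """Group data rows under the title of the section they belong to."""
    sections = {}
    current = None                        # title of the section being read
    for line in text.splitlines():
        if not line.strip():              # blank separator line
            continue
        if line[0].isspace() and 'Link States' in line:
            current = line.strip()        # indented title starts a new section
            sections[current] = []
        elif line.startswith('Link ID'):
            continue                      # column header, not data
        elif current is not None:
            sections[current].append(line)
    return sections

SAMPLE = '''
                Router Link States (Area 0.0.0.0)

Link ID         ADV Router      Age        Seq#       Checksum Link Count
10.189.7.250    10.189.7.250    1102       0x80012fa1 0x6b32   2
10.200.254.252  10.200.254.252  97          0x80000003 0x00501E 3

                Net Link States (Area 0.0.0.0)

Link ID         ADV Router      Age        Seq#       Checksum
10.189.254.242  10.189.254.242  1452       0x80001cf4 0xefab
'''

sections = rows_by_section(SAMPLE)
for title, rows in sections.items():
    print(title, len(rows))
```

Keying on the full title keeps sections with the same type but different areas apart.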
    

    codepad demo


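    That does not mean a regex cannot be used, but it is more complicated. For example, the following is really hard to understand if you are not used to the syntax: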

    ^\ +(.+) Router Link States\s*(?:\([^)]+\))?\s+Link ID.+\s+((?:[^\r\n]+[\r\n]?)+)
    

    regex101 demo
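
For readers who still prefer a single pattern, `re.VERBOSE` allows the same kind of expression to be written with comments, which may make it easier to follow. The sketch below is my own variant, not the pattern above: it matches every link-state type instead of only 'Router', and the names `SECTION_RE` and `find_sections` are hypothetical.

```python
import re

SECTION_RE = re.compile(r"""
    ^[ \t]+ (?P<title> \S.*? Link[ ]States .*) \n   # indented title line
    [ \t]* \n                                       # blank separator line
    (?P<header> Link[ ]ID .*) \n                    # column header line
    (?P<rows> (?: \S.* (?:\n|\Z) )+ )               # consecutive non-blank rows
""", re.MULTILINE | re.VERBOSE)

def find_sections(text):
    """Map each section title to the list of its data rows."""
    return {m.group('title'): m.group('rows').splitlines()
            for m in SECTION_RE.finditer(text)}

SAMPLE = '''
                Router Link States (Area 0.0.0.0)

Link ID         ADV Router      Age        Seq#       Checksum Link Count
10.189.7.250    10.189.7.250    1102       0x80012fa1 0x6b32   2
10.200.254.252  10.200.254.252  97          0x80000003 0x00501E 3

                Net Link States (Area 0.0.0.0)

Link ID         ADV Router      Age        Seq#       Checksum
10.189.254.242  10.189.254.242  1452       0x80001cf4 0xefab
'''

found = find_sections(SAMPLE)
for title, rows in found.items():
    print(title)
    for row in rows:
        print('   ', row)
```

The rows group stops at the first blank line because each row must start with a non-whitespace character, so no `re.S` flag is needed here.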

    【Discussion】:
