【Question title】: Split a string on each "X"
【Posted】: 2018-08-22 02:05:35
【Question】:

I am trying to split a string on every Cell\s\d+, without success.
My attempt was:

result = re.split(r"Cell\s\d+ - Address: .*?Cell\s\d+", subject, 0, re.DOTALL | re.MULTILINE)

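But it skips every other record, which makes sense, because my regex includes part of the next match. I also tried positive/negative lookahead and lookbehind, with no luck.
Note that the records do not all end the same way.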
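
To make the skipping concrete, here is a minimal sketch on a shortened, made-up sample (the addresses AA to DD and the two-line records are placeholders, not the real scan). It only illustrates the behaviour of the pattern above: each match swallows the next "Cell NN" header, so that header can no longer start a match of its own.

```python
import re

# Hypothetical, shortened sample in the same shape as the scan output.
sample = (
    "wlan0     Scan completed :\n"
    "Cell 01 - Address: AA\nChannel:3\n"
    "Cell 02 - Address: BB\nChannel:3\n"
    "Cell 03 - Address: CC\nChannel:7\n"
    "Cell 04 - Address: DD\nChannel:7\n"
)

# The pattern from the attempt: it runs from one header up to and
# INCLUDING the next "Cell NN", which consumes that next header.
pattern = r"Cell\s\d+ - Address: .*?Cell\s\d+"
flags = re.DOTALL | re.MULTILINE

consumed = re.findall(pattern, sample, flags=flags)
pieces = re.split(pattern, sample, maxsplit=0, flags=flags)

print(len(consumed))   # 2 matches: they start at Cell 01 and Cell 03
for piece in pieces:
    print(repr(piece))  # leftovers: the preamble plus headerless tails of Cell 02 and Cell 04
```

Cells 01 and 03 are treated as delimiters and disappear entirely, while cells 02 and 04 survive only as fragments that have lost their "Cell NN" prefix.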

How can I split the string below?

wlan0     Scan completed :
Cell 01 - Address: 00:24:01:B6:4F:E1
Channel:3
Frequency:2.422 GHz (Channel 3)
Quality=70/70  Signal level=-40 dBm
Encryption key:on
ESSID:"DLink-XXXXXX"
Bit Rates:1 Mb/s; 2 Mb/s; 5.5 Mb/s; 6 Mb/s; 9 Mb/s
11 Mb/s; 12 Mb/s; 18 Mb/s
Bit Rates:24 Mb/s; 36 Mb/s; 48 Mb/s; 54 Mb/s
Mode:Master
Extra:tsf=0000000000000000
Extra: Last beacon: 20ms ago
IE: Unknown: 000C444C696E6B2D423634464531
IE: Unknown: 010882848B0C12961824
IE: Unknown: 030103
IE: WPA Version 1
Group Cipher : TKIP
Pairwise Ciphers (1) : TKIP
Authentication Suites (1) : PSK
IE: Unknown: 2A0100
IE: Unknown: 32043048606C
IE: Unknown: DD180050F2020101050003A4000027A4000042435E0062322F00
IE: Unknown: DD0900037F01010000FF7F
IE: Unknown: DD0A00037F04010000000000
Cell 02 - Address: 06:24:01:B6:4F:E1
Channel:3
Frequency:2.422 GHz (Channel 3)
Quality=70/70  Signal level=-39 dBm
Encryption key:on
ESSID:"WIFI_1"
Bit Rates:1 Mb/s; 2 Mb/s; 5.5 Mb/s; 6 Mb/s; 9 Mb/s
11 Mb/s; 12 Mb/s; 18 Mb/s
Bit Rates:24 Mb/s; 36 Mb/s; 48 Mb/s; 54 Mb/s
Mode:Master
Extra:tsf=0000000000000000
Extra: Last beacon: 20ms ago
IE: Unknown: 00015F
IE: Unknown: 010882848B0C12961824
IE: Unknown: 030103
IE: IEEE 802.11i/WPA2 Version 1
Group Cipher : TKIP
Pairwise Ciphers (1) : TKIP
Authentication Suites (1) : PSK
IE: Unknown: 2A0100
IE: Unknown: 32043048606C
IE: Unknown: DD180050F2020101050003A4000027A4000042435E0062322F00
IE: Unknown: DD0900037F01010000FF7F
IE: Unknown: DD0A00037F04010000000000
Cell 03 - Address: BC:4D:FB:4F:C3:B8
Channel:7
Frequency:2.442 GHz (Channel 7)
Quality=69/70  Signal level=-41 dBm
Encryption key:on
ESSID:"WIFI_2"
Bit Rates:1 Mb/s; 2 Mb/s; 5.5 Mb/s; 11 Mb/s; 9 Mb/s
18 Mb/s; 36 Mb/s; 54 Mb/s
Bit Rates:6 Mb/s; 12 Mb/s; 24 Mb/s; 48 Mb/s
Mode:Master
Extra:tsf=0000000000000000
Extra: Last beacon: 20ms ago
IE: Unknown: 00083F3F3F3F3F3F3F3F
IE: Unknown: 010882848B961224486C
IE: Unknown: 030107
IE: Unknown: 32048C98B060
IE: Unknown: DD270050F204104A0001101044000102104700102880288028801880A880BC4DFB4FC3B8103C000101
IE: Unknown: 050402030080
IE: Unknown: 2A0100
IE: Unknown: 2D1A8C0116FFFF000000000000000000000000000000000000000000
IE: Unknown: 3D1607000400000000000000000000000000000000000000
IE: Unknown: 7F0101
IE: IEEE 802.11i/WPA2 Version 1
Group Cipher : CCMP
Pairwise Ciphers (1) : CCMP
Authentication Suites (1) : PSK
IE: Unknown: DD180050F2020101800003A4000027A4000042435E0062322F00
IE: Unknown: 0B0506001B127A
IE: Unknown: DD07000C4300000000
Cell 04 - Address: BC:4D:FB:4F:C3:B9
Channel:7
Frequency:2.442 GHz (Channel 7)
Quality=68/70  Signal level=-42 dBm
Encryption key:off
ESSID:"NOS_WIFI_Fon"
Bit Rates:1 Mb/s; 2 Mb/s; 5.5 Mb/s; 11 Mb/s; 9 Mb/s
18 Mb/s; 36 Mb/s; 54 Mb/s
Bit Rates:6 Mb/s; 12 Mb/s; 24 Mb/s; 48 Mb/s
Mode:Master
Extra:tsf=0000000000000000
Extra: Last beacon: 20ms ago
IE: Unknown: 000C4E4F535F574946495F466F6E
IE: Unknown: 010882848B961224486C
IE: Unknown: 030107
IE: Unknown: 32048C98B060
IE: Unknown: 050401030000
IE: Unknown: 2A0100
IE: Unknown: 2D1A8C0116FFFF000000000000000000000000000000000000000000
IE: Unknown: 3D1607000400000000000000000000000000000000000000
IE: Unknown: 7F0101
IE: Unknown: DD180050F2020101800003A4000027A4000042435E0062322F00
IE: Unknown: 0B0506001B127A
IE: Unknown: DD07000C4300000000

【Comments】:

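  • It is usually done with a pattern like r"Cell\s\d+ - Address: .*?(?=Cell\s\d+|\Z)". You can even add \n before Cell in the lookahead to make it safer (if Cell is at the start of a line).
  • Why is this tagged awk and sed? Do you want solutions that use those?
  • @chrisz Solutions using awk or sed are welcome too, but @Wiktor has already nailed it. Could you post your comment as an answer so that I can accept it?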
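
As a hedged sketch of the lookahead idea from the first comment: the pattern is meant for matching whole records (for example with re.findall and re.DOTALL) rather than for re.split. The sample text and the names record_re, records and parts are illustrative assumptions. The second variant, splitting on a zero-width lookahead, assumes Python 3.7 or later, where re.split accepts patterns that match the empty string.

```python
import re

# Hypothetical, shortened sample in the same shape as the scan output.
sample = (
    "wlan0     Scan completed :\n"
    "Cell 01 - Address: AA\nChannel:3\n"
    "Cell 02 - Address: BB\nChannel:3\n"
    "Cell 03 - Address: CC\nChannel:7\n"
    "Cell 04 - Address: DD\nChannel:7\n"
)

# Variant 1: match each record. The lookahead checks for the next header
# (or the end of the string) without consuming it, so no record is skipped.
# The "\n" before "Cell" assumes every header starts a line.
record_re = re.compile(r"Cell\s\d+ - Address: .*?(?=\nCell\s\d+|\Z)", re.DOTALL)
records = record_re.findall(sample)
print(len(records))  # 4

# Variant 2 (assumes Python 3.7+): split at the empty position just before
# each header line; the first piece is the text before the first cell.
parts = re.split(r"^(?=Cell\s\d+)", sample, flags=re.MULTILINE)
cells = parts[1:]
print(len(cells))    # 4
```

With findall the preamble line is simply never matched; with split it comes back as the first element and has to be dropped explicitly.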

Tags: python regex awk sed split


【Solution 1】:

As an alternative, you can use Python's itertools.groupby() function to find the blocks, like this:

from itertools import groupby

subject = """ --- all the text --- """    # read in, or add text here
lines = iter(subject.splitlines())
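# Alternating groups of header lines ("Cell ...") and body lines;
# [1:] drops the group of lines that precedes the first cell.
data = [list(g) for k, g in groupby(lines, lambda x: x.startswith('Cell '))][1:]
# Pair each header group with the body group that follows it.
cells = [l1 + l2 for l1, l2 in zip(*[iter(data)] * 2)]

for cell in cells:
    print(cell)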

So, if subject contains all the text from the file, you get four cells, each holding a list of lines:

['Cell 01 - Address: 00:24:01:B6:4F:E1', 'Channel:3', 'Frequency:2.422 GHz (Channel 3)', 'Quality=70/70  Signal level=-40 dBm', 'Encryption key:on', 'ESSID:"DLink-XXXXXX"', 'Bit Rates:1 Mb/s; 2 Mb/s; 5.5 Mb/s; 6 Mb/s; 9 Mb/s', '11 Mb/s; 12 Mb/s; 18 Mb/s', 'Bit Rates:24 Mb/s; 36 Mb/s; 48 Mb/s; 54 Mb/s', 'Mode:Master', 'Extra:tsf=0000000000000000', 'Extra: Last beacon: 20ms ago', 'IE: Unknown: 000C444C696E6B2D423634464531', 'IE: Unknown: 010882848B0C12961824', 'IE: Unknown: 030103', 'IE: WPA Version 1', 'Group Cipher : TKIP', 'Pairwise Ciphers (1) : TKIP', 'Authentication Suites (1) : PSK', 'IE: Unknown: 2A0100', 'IE: Unknown: 32043048606C', 'IE: Unknown: DD180050F2020101050003A4000027A4000042435E0062322F00', 'IE: Unknown: DD0900037F01010000FF7F', 'IE: Unknown: DD0A00037F04010000000000']
['Cell 02 - Address: 06:24:01:B6:4F:E1', 'Channel:3', 'Frequency:2.422 GHz (Channel 3)', 'Quality=70/70  Signal level=-39 dBm', 'Encryption key:on', 'ESSID:"WIFI_1"', 'Bit Rates:1 Mb/s; 2 Mb/s; 5.5 Mb/s; 6 Mb/s; 9 Mb/s', '11 Mb/s; 12 Mb/s; 18 Mb/s', 'Bit Rates:24 Mb/s; 36 Mb/s; 48 Mb/s; 54 Mb/s', 'Mode:Master', 'Extra:tsf=0000000000000000', 'Extra: Last beacon: 20ms ago', 'IE: Unknown: 00015F', 'IE: Unknown: 010882848B0C12961824', 'IE: Unknown: 030103', 'IE: IEEE 802.11i/WPA2 Version 1', 'Group Cipher : TKIP', 'Pairwise Ciphers (1) : TKIP', 'Authentication Suites (1) : PSK', 'IE: Unknown: 2A0100', 'IE: Unknown: 32043048606C', 'IE: Unknown: DD180050F2020101050003A4000027A4000042435E0062322F00', 'IE: Unknown: DD0900037F01010000FF7F', 'IE: Unknown: DD0A00037F04010000000000']
['Cell 03 - Address: BC:4D:FB:4F:C3:B8', 'Channel:7', 'Frequency:2.442 GHz (Channel 7)', 'Quality=69/70  Signal level=-41 dBm', 'Encryption key:on', 'ESSID:"WIFI_2"', 'Bit Rates:1 Mb/s; 2 Mb/s; 5.5 Mb/s; 11 Mb/s; 9 Mb/s', '18 Mb/s; 36 Mb/s; 54 Mb/s', 'Bit Rates:6 Mb/s; 12 Mb/s; 24 Mb/s; 48 Mb/s', 'Mode:Master', 'Extra:tsf=0000000000000000', 'Extra: Last beacon: 20ms ago', 'IE: Unknown: 00083F3F3F3F3F3F3F3F', 'IE: Unknown: 010882848B961224486C', 'IE: Unknown: 030107', 'IE: Unknown: 32048C98B060', 'IE: Unknown: DD270050F204104A0001101044000102104700102880288028801880A880BC4DFB4FC3B8103C000101', 'IE: Unknown: 050402030080', 'IE: Unknown: 2A0100', 'IE: Unknown: 2D1A8C0116FFFF000000000000000000000000000000000000000000', 'IE: Unknown: 3D1607000400000000000000000000000000000000000000', 'IE: Unknown: 7F0101', 'IE: IEEE 802.11i/WPA2 Version 1', 'Group Cipher : CCMP', 'Pairwise Ciphers (1) : CCMP', 'Authentication Suites (1) : PSK', 'IE: Unknown: DD180050F2020101800003A4000027A4000042435E0062322F00', 'IE: Unknown: 0B0506001B127A', 'IE: Unknown: DD07000C4300000000']
['Cell 04 - Address: BC:4D:FB:4F:C3:B9', 'Channel:7', 'Frequency:2.442 GHz (Channel 7)', 'Quality=68/70  Signal level=-42 dBm', 'Encryption key:off', 'ESSID:"NOS_WIFI_Fon"', 'Bit Rates:1 Mb/s; 2 Mb/s; 5.5 Mb/s; 11 Mb/s; 9 Mb/s', '18 Mb/s; 36 Mb/s; 54 Mb/s', 'Bit Rates:6 Mb/s; 12 Mb/s; 24 Mb/s; 48 Mb/s', 'Mode:Master', 'Extra:tsf=0000000000000000', 'Extra: Last beacon: 20ms ago', 'IE: Unknown: 000C4E4F535F574946495F466F6E', 'IE: Unknown: 010882848B961224486C', 'IE: Unknown: 030107', 'IE: Unknown: 32048C98B060', 'IE: Unknown: 050401030000', 'IE: Unknown: 2A0100', 'IE: Unknown: 2D1A8C0116FFFF000000000000000000000000000000000000000000', 'IE: Unknown: 3D1607000400000000000000000000000000000000000000', 'IE: Unknown: 7F0101', 'IE: Unknown: DD180050F2020101800003A4000027A4000042435E0062322F00', 'IE: Unknown: 0B0506001B127A', 'IE: Unknown: DD07000C4300000000']
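
The two one-liners above are dense, so here is a step-by-step sketch of the same technique on a shortened, made-up list of lines (the sample and the names groups and it are illustrative assumptions). groupby() starts a new group whenever the key flips between True and False, and zip() over one shared iterator pairs each header group with the body group after it.

```python
from itertools import groupby

# Hypothetical, shortened input in the same shape as the scan output.
lines = [
    "wlan0     Scan completed :",
    "Cell 01 - Address: AA", "Channel:3", "Mode:Master",
    "Cell 02 - Address: BB", "Channel:7",
]

# Step 1: consecutive lines with the same key value form one group.
groups = [(is_header, list(g))
          for is_header, g in groupby(lines, lambda s: s.startswith("Cell "))]
# groups alternates: preamble, header, body, header, body

# Step 2: drop the preamble group so the list starts with a header group.
data = [g for _, g in groups][1:]

# Step 3: zip(it, it) draws two items per step from ONE iterator, which is
# what zip(*[iter(data)] * 2) does in the answer's code.
it = iter(data)
cells = [header + body for header, body in zip(it, it)]

for cell in cells:
    print(cell)
# ['Cell 01 - Address: AA', 'Channel:3', 'Mode:Master']
# ['Cell 02 - Address: BB', 'Channel:7']
```

The pairing depends on the input: it assumes some text precedes the first cell (otherwise [1:] would drop the first header) and that every header is followed by at least one body line. If the real output is indented, the key would likely need something like s.lstrip().startswith("Cell ").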
