【Posted】: 2020-12-21 09:04:41
【Question】:
Here is a list of zones in Singapore, each followed by its subzones.
Bishan[1]
Bishan East
Marymount
Upper Thomson
Bukit Merah[2] (Not to be confused with Bukit Merah subzone.)
Alexandra Hill
Alexandra North
Bukit Ho Swee
Bukit Merah (Not to be confused with Bukit Merah planning area.)
City Terminals (Formerly called "Tanjong Pagar" subzone.)
Depot Road
Everton Park
Henderson Hill
Kampong Tiong Bahru
Maritime Square (Formerly called "HarbourFront" subzone.)
Redhill
Singapore General Hospital
Telok Blangah Drive
Telok Blangah Rise
Telok Blangah Way
Tiong Bahru
Tiong Bahru Station
Bukit Timah[3]
Anak Bukit
Coronation Road
Farrer Court
Hillcrest
Holland Road
Leedon Park
Swiss Club
Ulu Pandan
Downtown Core[4]
Anson
Bayfront
Bugis
Cecil
Central
City Hall
Clifford Pier
Marina Centre
Maxwell
Phillip
Raffles Place
Tanjong Pagar
Geylang[5]
Aljunied
Geylang East
Kallang Way
MacPherson
Kampong Ubi
Kallang[6]
Bendemeer
Boon Keng
Crawford
Geylang Bahru
Kallang Bahru
Kampong Bugis
Kampong Java
Lavender
Tanjong Rhu
Or, as a Python string:
data = 'Bishan[1]\nBishan East\nMarymount\nUpper Thomson\nBukit Merah[2] (Not to be confused with Bukit Merah subzone.)\nAlexandra Hill\nAlexandra North\nBukit Ho Swee\nBukit Merah (Not to be confused with Bukit Merah planning area.)\nCity Terminals (Formerly called "Tanjong Pagar" subzone.)\nDepot Road\nEverton Park\nHenderson Hill\nKampong Tiong Bahru\nMaritime Square (Formerly called "HarbourFront" subzone.)\nRedhill\nSingapore General Hospital\nTelok Blangah Drive\nTelok Blangah Rise\nTelok Blangah Way\nTiong Bahru\nTiong Bahru Station\nBukit Timah[3]\nAnak Bukit\nCoronation Road\nFarrer Court\nHillcrest\nHolland Road\nLeedon Park\nSwiss Club\nUlu Pandan\nDowntown Core[4]\nAnson\nBayfront\nBugis\nCecil\nCentral\nCity Hall\nClifford Pier\nMarina Centre\nMaxwell\nPhillip\nRaffles Place\nTanjong Pagar\nGeylang[5]\nAljunied\nGeylang East\nKallang Way\nMacPherson\nKampong Ubi\nKallang[6]\nBendemeer\nBoon Keng\nCrawford\nGeylang Bahru\nKallang Bahru\nKampong Bugis\nKampong Java\nLavender\nTanjong Rhu\n'
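The entries with square brackets [] are zones; each is followed by its subzones, and all entries are separated by newlines (\n). What I want is a list of zones in which each zone carries a sub-list of its subzones, like this (later I will strip the square brackets, the parentheses, and their contents):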
1.) Bishan[1]
- Bishan East
- Marymount
- Upper Thomson
2.) Bukit Merah[2] (Not to be confused with Bukit Merah subzone.)
- Alexandra Hill
- Alexandra North
- Bukit Ho Swee
- Bukit Merah (Not to be confused with Bukit Merah planning area.)
- City Terminals (Formerly called "Tanjong Pagar" subzone.)
...
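One way to get this shape, as a minimal sketch rather than a definitive answer: walk the lines in order and start a new group whenever a line carries a "[n]" marker. The helper name group_zones and the shortened sample string are assumptions made for illustration; the full data string from above should work the same way as long as every zone line contains a bracketed number.

```python
import re

# Shortened sample in the same format as the question's `data` string (assumption).
sample = (
    "Bishan[1]\nBishan East\nMarymount\nUpper Thomson\n"
    "Bukit Merah[2] (Not to be confused with Bukit Merah subzone.)\n"
    "Alexandra Hill\nAlexandra North\n"
)

def group_zones(text):
    """Return [[zone, [subzone, ...]], ...] from newline-separated text.

    A line containing "[digits]" is treated as a zone header; every other
    non-empty line is appended to the most recent zone.
    """
    grouped = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip the empty piece left by the trailing "\n"
        if re.search(r"\[\d+\]", line):
            grouped.append([line, []])      # start a new zone
        elif grouped:
            grouped[-1][1].append(line)     # attach to the current zone
    return grouped

nested = group_zones(sample)
for zone, subzones in nested:
    print(zone, subzones)
```

Because the subzones are attached while scanning, no pattern has to span several lines; the regular expression is only used to recognise a zone header.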
So far I have only managed to extract the zones, using split() and a regular expression.
import re
zones_and_subzones = data.split('\n')
# Keep only the lines that contain '[', i.e. the zone headers.
zones = [zone for zone in zones_and_subzones if re.match(r'(.*?)\[', zone)]
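This is where I am stuck: I am having trouble extracting the subzones of each zone. I tried using
regex = r'(\].*?\[)'
to extract the text between a closing square bracket and the next opening one, but the result is incomplete. I have been at this for a while and would appreciate some help. If there is a better approach than what I have so far, please share it. Thanks.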
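A plausible explanation for the incomplete result, assuming the pattern was run over the whole data string with re.findall: without re.DOTALL the dot cannot cross a newline, so nothing matches at all; with re.DOTALL each match swallows the next zone's name, and the last zone has no following "[" to end on. The short sample string below is made up for the demonstration.

```python
import re

# Two zones in the same format as `data`, shortened for illustration (assumption).
sample = "Bishan[1]\nBishan East\nMarymount\nKallang[6]\nBendemeer\nLavender\n"

# "." does not match "\n" by default, and no single line has "]" before "[",
# so the pattern finds nothing.
no_flag = re.findall(r"\].*?\[", sample)

# With re.DOTALL the match may span lines, but it runs from "]" up to the
# next "[", so it includes the next zone's name, and the final zone
# (which has no later "[") yields no match.
with_flag = re.findall(r"\].*?\[", sample, flags=re.DOTALL)

print(no_flag)
print(with_flag)
```

Here the second call returns a single match covering the Bishan subzones plus the text "Kallang[", and the Kallang subzones are lost, which is consistent with the "incomplete" result described above.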
【Comments】:
- Are you sure you are looking for a nested array and not a dictionary of lists?
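For comparison, a hedged sketch of the dictionary-of-lists alternative this comment raises, which also strips the "[n]" markers and the parenthetical notes. The names clean and zones_to_dict are hypothetical, the sample string is shortened, and the sketch assumes zone names are unique once cleaned (a repeated name would overwrite the earlier entry).

```python
import re

# Shortened sample in the same format as the question's `data` string (assumption).
sample = (
    "Bishan[1]\nBishan East\nMarymount\n"
    "Bukit Merah[2] (Not to be confused with Bukit Merah subzone.)\n"
    "Bukit Merah (Not to be confused with Bukit Merah planning area.)\n"
    'City Terminals (Formerly called "Tanjong Pagar" subzone.)\n'
)

def clean(name):
    """Remove "[...]" markers and "(...)" notes, then trim whitespace."""
    return re.sub(r"\[[^\]]*\]|\([^)]*\)", "", name).strip()

def zones_to_dict(text):
    """Map each cleaned zone name to the list of its cleaned subzone names."""
    result = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if re.search(r"\[\d+\]", line):   # zone header such as "Bishan[1]"
            current = clean(line)
            result[current] = []
        elif current is not None:         # subzone of the current zone
            result[current].append(clean(line))
    return result

zones = zones_to_dict(sample)
print(zones)
```

A dictionary makes lookups by zone name direct (zones["Bishan"]), while the nested list keeps only positional access; which one fits better depends on how the result is used afterwards.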
- Regex is not well suited to parsing nested content such as HTML or XML.
Tags: python regex list nested-lists