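Regular expressions are not powerful enough to match arbitrarily nested structures. For that you need a pushdown automaton (i.e. a parser). Several such tools are available, for example PLY.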
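To make the pushdown idea concrete, here is a minimal sketch of a hand-written parser that uses an explicit stack to turn nested parentheses into nested lists. The name parse_nested and the decision to treat everything between parentheses as plain text are assumptions made for illustration; this is not how PLY or any particular library does it.

```python
def parse_nested(text):
    """Parse balanced parentheses into nested lists (illustrative sketch)."""
    root = []
    stack = [root]  # the explicit stack is what a regular expression lacks
    for ch in text:
        if ch == '(':
            child = []
            stack[-1].append(child)  # attach the new group to its parent
            stack.append(child)      # and descend into it
        elif ch == ')':
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            stack.pop()              # return to the enclosing group
        else:
            # Merge consecutive plain characters into one string.
            if stack[-1] and isinstance(stack[-1][-1], str):
                stack[-1][-1] += ch
            else:
                stack[-1].append(ch)
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return root

print(parse_nested('(((1+0)+1)+1)'))
```

The stack depth grows with the nesting depth of the input, which is exactly the unbounded memory that a finite automaton does not have.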
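Python also ships a parser library for its own grammar, which may do what you need. The output is extremely verbose, though, and takes a while to get your head around. If that angle interests you, the discussion below tries to explain it as simply as possible. (The parser module was deprecated in Python 3.9 and removed in 3.10, so the session below needs an older interpreter.)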
>>> import parser, pprint
>>> pprint.pprint(parser.st2list(parser.expr('(((1+0)+1)+1)')))
[258,
 [327,
  [304,
   [305,
    [306,
     [307,
      [308,
       [310,
        [311,
         [312,
          [313,
           [314,
            [315,
             [316,
              [317,
               [318,
                [7, '('],
                [320,
                 [304,
                  [305,
                   [306,
                    [307,
                     [308,
                      [310,
                       [311,
                        [312,
                         [313,
                          [314,
                           [315,
                            [316,
                             [317,
                              [318,
                               [7, '('],
                               [320,
                                [304,
                                 [305,
                                  [306,
                                   [307,
                                    [308,
                                     [310,
                                      [311,
                                       [312,
                                        [313,
                                         [314,
                                          [315,
                                           [316,
                                            [317,
                                             [318,
                                              [7,
                                               '('],
                                              [320,
                                               [304,
                                                [305,
                                                 [306,
                                                  [307,
                                                   [308,
                                                    [310,
                                                     [311,
                                                      [312,
                                                       [313,
                                                        [314,
                                                         [315,
                                                          [316,
                                                           [317,
                                                            [318,
                                                             [2,
                                                              '1']]]]],
                                                         [14,
                                                          '+'],
                                                         [315,
                                                          [316,
                                                           [317,
                                                            [318,
                                                             [2,
                                                              '0']]]]]]]]]]]]]]]],
                                              [8,
                                               ')']]]]],
                                          [14,
                                           '+'],
                                          [315,
                                           [316,
                                            [317,
                                             [318,
                                              [2,
                                               '1']]]]]]]]]]]]]]]],
                               [8, ')']]]]],
                           [14, '+'],
                           [315,
                            [316,
                             [317,
                              [318, [2, '1']]]]]]]]]]]]]]]],
                [8, ')']]]]]]]]]]]]]]]],
 [4, ''],
 [0, '']]
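You can ease the pain with this short function:
def shallow(ast):
    # Leaves (token strings) are returned unchanged.
    if not isinstance(ast, list): return ast
    # A node with a single child is only a wrapper, so skip it.
    if len(ast) == 2: return shallow(ast[1])
    return [ast[0]] + [shallow(a) for a in ast[1:]]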
>>> pprint.pprint(shallow(parser.st2list(parser.expr('(((1+0)+1)+1)'))))
[258,
 [318,
  '(',
  [314,
   [318, '(', [314, [318, '(', [314, '1', '+', '0'], ')'], '+', '1'], ')'],
   '+',
   '1'],
  ')'],
 '',
 '']
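The collapse rule can be tried without the parser module. The sketch below repeats shallow() and applies it to a small hand-written list; the node numbers 900 to 902 in sample are made up for illustration and do not correspond to a real grammar.

```python
def shallow(ast):
    # Leaves (token strings) are returned unchanged.
    if not isinstance(ast, list):
        return ast
    # A node with a single child is only a wrapper, so skip it.
    if len(ast) == 2:
        return shallow(ast[1])
    return [ast[0]] + [shallow(a) for a in ast[1:]]

# Hypothetical tree: node 900 has three children, while the 901/902
# chains and the token nodes have exactly one child each.
sample = [900, [901, [902, [2, '1']]], [14, '+'], [901, [2, '0']]]
print(shallow(sample))
```

Only nodes with two or more children survive, which is why the long single-child chains disappear from the real output while the branching nodes remain.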
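The numbers come from the Python modules symbol and token, which you can use to build a lookup table from numbers to names:
import symbol, token
# list() makes the concatenation work on Python 3 as well, where items() returns a view.
names = dict(list(token.tok_name.items()) + list(symbol.sym_name.items()))
You can even fold this mapping into the shallow() function, so that you work with strings instead of numbers:
def shallow(ast):
    if not isinstance(ast, list): return ast
    if len(ast) == 2: return shallow(ast[1])
    return [names[ast[0]]] + [shallow(a) for a in ast[1:]]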
>>> pprint.pprint(shallow(parser.st2list(parser.expr('(((1+0)+1)+1)'))))
['eval_input',
 ['atom',
  '(',
  ['arith_expr',
   ['atom',
    '(',
    ['arith_expr',
     ['atom', '(', ['arith_expr', '1', '+', '0'], ')'],
     '+',
     '1'],
    ')'],
   '+',
   '1'],
  ')'],
 '',
 '']
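Once the tree is in this form it is easy to walk. As a sketch, the hypothetical helpers paren_depth and leaves below take the simplified tree from the session above (pasted in as a literal, so the parser module is not needed) and compute the parenthesis nesting depth and the reconstructed source text.

```python
# The simplified tree printed above, copied in as a literal.
tree = ['eval_input',
        ['atom', '(',
         ['arith_expr',
          ['atom', '(',
           ['arith_expr',
            ['atom', '(', ['arith_expr', '1', '+', '0'], ')'],
            '+', '1'],
           ')'],
          '+', '1'],
         ')'],
        '', '']

def paren_depth(node):
    """Maximum number of nested parenthesised atoms under node."""
    if not isinstance(node, list):
        return 0
    deepest = max([paren_depth(child) for child in node[1:]] or [0])
    # An 'atom' node that has '(' among its children is one nesting level.
    is_group = node[0] == 'atom' and '(' in node[1:]
    return deepest + (1 if is_group else 0)

def leaves(node):
    """Concatenate the token strings in source order."""
    if not isinstance(node, list):
        return node
    return ''.join(leaves(child) for child in node[1:])

print(paren_depth(tree))
print(leaves(tree))
```

Both helpers rely only on the shape produced by shallow(): a list whose first element is the node name and whose remaining elements are children or token strings.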