除了上述答案之外,自下而上的 LR 解析器类中的各个解析器之间的区别在于,它们在生成解析表时是否会导致移位/归约或归约/归约冲突。冲突越少,语法就越强大 (LR(0)
例如,考虑以下表达式语法:
E → E + T
E → T
T→F
T → T * F
F → (E)
F → id
不是 LR(0),而是 SLR(1)。使用以下代码,我们可以构建 LR0 自动机并构建解析表(我们需要扩充语法,计算带有闭包的 DFA,计算动作集和 goto 集):
from copy import deepcopy
import pandas as pd
def update_items(I, C):
if len(I) == 0:
return C
for nt in C:
Int = I.get(nt, [])
for r in C.get(nt, []):
if not r in Int:
Int.append(r)
I[nt] = Int
return I
def compute_action_goto(I, I0, sym, NTs):
#I0 = deepcopy(I0)
I1 = {}
for NT in I:
C = {}
for r in I[NT]:
r = r.copy()
ix = r.index('.')
#if ix == len(r)-1: # reduce step
if ix >= len(r)-1 or r[ix+1] != sym:
continue
r[ix:ix+2] = r[ix:ix+2][::-1] # read the next symbol sym
C = compute_closure(r, I0, NTs)
cnt = C.get(NT, [])
if not r in cnt:
cnt.append(r)
C[NT] = cnt
I1 = update_items(I1, C)
return I1
def construct_LR0_automaton(G, NTs, Ts):
I0 = get_start_state(G, NTs, Ts)
I = deepcopy(I0)
queue = [0]
states2items = {0: I}
items2states = {str(to_str(I)):0}
parse_table = {}
cur = 0
while len(queue) > 0:
id = queue.pop(0)
I = states[id]
# compute goto set for non-terminals
for NT in NTs:
I1 = compute_action_goto(I, I0, NT, NTs)
if len(I1) > 0:
state = str(to_str(I1))
if not state in statess:
cur += 1
queue.append(cur)
states2items[cur] = I1
items2states[state] = cur
parse_table[id, NT] = cur
else:
parse_table[id, NT] = items2states[state]
# compute actions for terminals similarly
# ... ... ...
return states2items, items2states, parse_table
states, statess, parse_table = construct_LR0_automaton(G, NTs, Ts)
文法G、非终结符和终结符定义如下
G = {}
NTs = ['E', 'T', 'F']
Ts = {'+', '*', '(', ')', 'id'}
G['E'] = [['E', '+', 'T'], ['T']]
G['T'] = [['T', '*', 'F'], ['F']]
G['F'] = [['(', 'E', ')'], ['id']]
以下是我为 LR(0) 解析表生成实现的一些更有用的功能:
def augment(G, S): # start symbol S
G[S + '1'] = [[S, '$']]
NTs.append(S + '1')
return G, NTs
def compute_closure(r, G, NTs):
S = {}
queue = [r]
seen = []
while len(queue) > 0:
r = queue.pop(0)
seen.append(r)
ix = r.index('.') + 1
if ix < len(r) and r[ix] in NTs:
S[r[ix]] = G[r[ix]]
for rr in G[r[ix]]:
if not rr in seen:
queue.append(rr)
return S
下图(展开查看)展示了使用上述代码为语法构造的LR0 DFA:
下表显示了作为pandas数据帧生成的LR(0)解析表,注意有几个移位/归约冲突,表明语法不是LR(0)。
SLR(1) 解析器通过仅在下一个输入标记是被归约的非终结符的跟随集的成员时归约来避免上述移位/归约冲突。 SLR生成如下解析表:
下面的动画展示了输入表达式是如何被上述 SLR(1) 语法解析的:
问题中的语法也不是 LR(0):
#S --> Aa | bAc | dc | bda
#A --> d
G = {}
NTs = ['S', 'A']
Ts = {'a', 'b', 'c', 'd'}
G['S'] = [['A', 'a'], ['b', 'A', 'c'], ['d', 'c'], ['b', 'd', 'a']]
G['A'] = [['d']]
从下一个LR0 DFA和解析表可以看出:
又出现了 shift/reduce 冲突:
但是,以下接受 a^ncb^n, n >= 1 形式的字符串的语法是 LR(0):
A → a A b
A→c
S→A
# S --> A
# A --> a A b | c
G = {}
NTs = ['S', 'A']
Ts = {'a', 'b', 'c'}
G['S'] = [['A']]
G['A'] = [['a', 'A', 'b'], ['c']]
从下图可以看出,生成的解析表没有冲突。
下面是输入字符串a^2cb^2如何使用上面的LR(0)解析表解析,代码如下:
def parse(input, parse_table, rules):
input = 'aaacbbb$'
stack = [0]
df = pd.DataFrame(columns=['stack', 'input', 'action'])
i, accepted = 0, False
while i < len(input):
state = stack[-1]
char = input[i]
action = parse_table.loc[parse_table.states == state, char].values[0]
if action[0] == 's': # shift
stack.append(char)
stack.append(int(action[-1]))
i += 1
elif action[0] == 'r': # reduce
r = rules[int(action[-1])]
l, r = r['l'], r['r']
char = ''
for j in range(2*len(r)):
s = stack.pop()
if type(s) != int:
char = s + char
if char == r:
goto = parse_table.loc[parse_table.states == stack[-1], l].values[0]
stack.append(l)
stack.append(int(goto[-1]))
elif action == 'acc': # accept
accepted = True
df2 = {'stack': ''.join(map(str, stack)), 'input': input[i:], 'action': action}
df = df.append(df2, ignore_index = True)
if accepted:
break
return df
parse(input, parse_table, rules)
下一个动画展示了输入字符串a^2cb^2是如何使用上面的代码通过LR(0)解析器解析的: