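Update: As Eric Zhang pointed out, the original analysis was wrong and failed on certain classes of test cases.
I believe this can be solved with a form of topological sort. Your initial list of words defines a partial order, or directed graph, on some set of letters. You want to find a substitution that linearizes this graph of letters. Let's use one of your non-trivial examples: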
P A R K O V I S T E
P A R A D O N T O Z A
P A D A K
A B B A
A B E C E D A
A B S I N T
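Let x <* y denote substitution(x) < substitution(y) for some letters (or words) x and y. We want word1 <* word2 <* word3 <* word4 <* word5 <* word6 overall, but in terms of letters we only need to look at each pair of adjacent words and find the first pair of differing characters in the same column: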
K <* A (from PAR[K]OVISTE <* PAR[A]DONTOZA)
R <* D (from PA[R]ADONTOZA <* PA[D]AK)
P <* A (from [P]ADAK <* [A]BBA)
B <* E (from AB[B]A <* AB[E]CEDA)
E <* S (from AB[E]CEDA <* AB[S]INT)
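The column-by-column comparison can be sketched in a few lines. This is only an illustration of the step described above; the helper name `first_difference` is made up for this example and is not part of the proof of concept further down.

```python
def first_difference(word1, word2):
    """Return the first pair of differing same-column characters, or None."""
    for char1, char2 in zip(word1, word2):
        if char1 != char2:
            return char1, char2
    return None  # one word is a prefix of the other (or they are equal)


words = "PARKOVISTE PARADONTOZA PADAK ABBA ABECEDA ABSINT".split()

# one constraint per pair of adjacent words
constraints = [first_difference(a, b) for a, b in zip(words, words[1:])]

for pair in constraints:
    if pair is not None:
        print(f"{pair[0]} <* {pair[1]}")
```

Run on the example list, this prints the same five constraints as listed above.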
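If we find no mismatching letters, then there are 3 cases:
- word 1 and word 2 are the same
- word 1 is a prefix of word 2
- word 2 is a prefix of word 1
In cases 1 and 2, the words are already in lexicographic order, so we don't need to perform any substitution (although we might), and they add no extra constraints that we need to respect. In case 3, no substitution at all can fix the order (think of ["DOGGO", "DOG"]), so there is no possible solution and we can quit early.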
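As a rough sketch of how the three cases can be told apart, the hypothetical helper below (the name `classify_without_mismatch` is an assumption made for illustration) returns the case number, or None when the two words do differ in some shared column:

```python
def classify_without_mismatch(word1, word2):
    """Return 1, 2 or 3 for the cases listed above, or None if the
    words differ in some shared column (so none of the cases applies)."""
    if any(c1 != c2 for c1, c2 in zip(word1, word2)):
        return None
    if len(word1) == len(word2):
        return 1  # identical words: nothing to do
    if len(word1) < len(word2):
        return 2  # word1 is a proper prefix of word2: already in order
    return 3      # word2 is a proper prefix of word1: no solution exists


print(classify_without_mismatch("DOG", "DOG"))
print(classify_without_mismatch("DOG", "DOGGO"))
print(classify_without_mismatch("DOGGO", "DOG"))
print(classify_without_mismatch("DOG", "DIG"))
```

Only case 3 needs special handling; cases 1 and 2 can simply be skipped.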
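Otherwise, we build the directed graph corresponding to the partial-ordering information we obtained and perform a topological sort. If the sorting process indicates that no linearization is possible, then there is no solution for sorting the list of words. Otherwise, you get back something like: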
P <* K <* R <* B <* E <* A <* D <* S
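For a quick experiment, the linearization step can also be tried with the standard library instead of a hand-written sort. The sketch below assumes Python 3.9 or later (for `graphlib`) and is not the implementation used in the proof of concept; the function name `linearize` is made up for this example.

```python
from graphlib import CycleError, TopologicalSorter


def linearize(constraints):
    """Return the letters in an order that respects every (before, after)
    pair, or None if the constraints contain a cycle."""
    sorter = TopologicalSorter()
    for before, after in constraints:
        # add(node, *predecessors): `before` must come ahead of `after`
        sorter.add(after, before)
    try:
        return list(sorter.static_order())
    except CycleError:
        return None


constraints = [("K", "A"), ("R", "D"), ("P", "A"), ("B", "E"), ("E", "S")]
order = linearize(constraints)
print(" <* ".join(order))

# contradictory constraints cannot be linearized
print(linearize([("B", "A"), ("A", "B")]))  # None
```

The exact order printed for the example may differ from the one shown above, since several linearizations satisfy the same constraints.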
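Depending on how you implement your topological sort, you might get a different linear ordering. Now you just need to assign each letter a substitution that respects this ordering and is itself sorted alphabetically. A simple option is to pair the linear ordering with itself sorted alphabetically, and use that as the substitution: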
P <* K <* R <* B <* E <* A <* D <* S
|    |    |    |    |    |    |    |
A <  B <  D <  E <  K <  P <  R <  S
But you could implement a different substitution rule if you wish.
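To see the simple pairing in action, here is a minimal sketch that hard-codes the example ordering P, K, R, B, E, A, D, S and applies the resulting substitution to the example words. Letters that take part in no constraint are left unchanged, which is an assumption of this sketch.

```python
ordering = "PKRBEADS"  # the example linear ordering, hard-coded here

# pair the ordering with itself sorted alphabetically
substitutions = dict(zip(ordering, sorted(ordering)))

words = "PARKOVISTE PARADONTOZA PADAK ABBA ABECEDA ABSINT".split()

# letters without a constraint (O, V, I, ...) are kept as they are
subbed = ["".join(substitutions.get(c, c) for c in word) for word in words]

for before, after in zip(words, subbed):
    print(before, "->", after)

# the substituted list is in ordinary lexicographic order
assert subbed == sorted(subbed)
```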
Here's a proof of concept in Python:
import collections
import itertools

# a pair of outgoing and incoming edges
Edges = collections.namedtuple('Edges', 'outgoing incoming')
# a mapping from nodes to edges
Graph = lambda: collections.defaultdict(lambda: Edges(set(), set()))


def substitution_sort(words):
    graph = build_graph(words)

    if graph is None:
        return None

    ordering = toposort(graph)

    if ordering is None:
        return None

    # create a substitution that respects `ordering`
    substitutions = dict(zip(ordering, sorted(ordering)))

    # apply substitutions
    return [
        ''.join(substitutions.get(char, char) for char in word)
        for word in words
    ]


def build_graph(words):
    graph = Graph()

    # loop over every pair of adjacent words and find the first
    # pair of corresponding characters where they differ
    for word1, word2 in zip(words, words[1:]):
        for char1, char2 in zip(word1, word2):
            if char1 != char2:
                break
        else:  # no differing characters found...
            if len(word1) > len(word2):
                # ...but word2 is a prefix of word1 and comes after;
                # therefore, no solution is possible
                return None
            else:
                # ...so no new information to add to the graph
                continue

        # add edge from char1 -> char2 to the graph
        graph[char1].outgoing.add(char2)
        graph[char2].incoming.add(char1)

    return graph


def toposort(graph):
    "Kahn's algorithm; returns None if graph contains a cycle"
    result = []
    working_set = {node for node, edges in graph.items() if not edges.incoming}

    while working_set:
        node = working_set.pop()
        result.append(node)
        outgoing = graph[node].outgoing

        while outgoing:
            neighbour = outgoing.pop()
            neighbour_incoming = graph[neighbour].incoming
            neighbour_incoming.remove(node)

            if not neighbour_incoming:
                working_set.add(neighbour)

    # any edge left over means the graph contains a cycle
    if any(edges.incoming or edges.outgoing for edges in graph.values()):
        return None
    else:
        return result


def print_all(items):
    for item in items:
        print(item)
    print()


def test():
    test_cases = [
        ('PINEAPPLE BANANA ARTICHOKE TOMATO', True),
        ('ABC ABB ABD', True),
        ('AB AA AB', False),
        ('PARKOVISTE PARADONTOZA PADAK ABBA ABECEDA ABSINT', True),
        ('AA AB CA', True),
        ('DOG DOGGO DOG DIG BAT BAD', False),
        ('DOG DOG DOGGO DIG BIG BAD', True),
    ]

    for words, is_sortable in test_cases:
        words = words.split()
        print_all(words)

        subbed = substitution_sort(words)

        if subbed is not None:
            assert subbed == sorted(subbed), subbed
            print_all(subbed)
        else:
            print('<no solution>')
            print()

        print('expected solution?', 'yes' if is_sortable else 'no')
        print()


if __name__ == '__main__':
    test()
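Now, it's not ideal (for example, it still performs a substitution even if the original list of words is already sorted), but it does seem to work. I couldn't formally prove that it works, so if you find a counter-example, please let me know!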
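One way to hunt for counter-examples is to compare the result against an exhaustive search on small inputs. The sketch below is an assumption-laden helper, not part of the answer's algorithm: it treats a substitution as a one-to-one mapping of the letters that occur in the words, tries every such mapping, and is only practical for a handful of distinct letters because the number of permutations grows factorially.

```python
import itertools


def brute_force_sortable(words):
    """Return True if some one-to-one letter substitution sorts `words`.

    Tries every permutation of the distinct letters, so it is only
    usable when the words contain few distinct letters.
    """
    letters = sorted(set(''.join(words)))
    for perm in itertools.permutations(letters):
        table = dict(zip(letters, perm))
        subbed = [''.join(table[char] for char in word) for word in words]
        if subbed == sorted(subbed):
            return True
    return False


print(brute_force_sortable('ABC ABB ABD'.split()))
print(brute_force_sortable('AB AA AB'.split()))
print(brute_force_sortable('DOGGO DOG'.split()))
```

A counter-example would be any small word list for which this check and the topological-sort approach disagree about whether a solution exists.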