Find a substitution that sorts the list: answers

【Question title】: Find a substitution that sorts the list
【Posted】: 2017-04-09 03:12:30
【Question description】:

Consider the following words:

PINEAPPLE
BANANA
ARTICHOKE
TOMATO

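The goal is to sort them (lexicographically) without moving the words themselves, using letter substitution instead. In this example, I can replace the letter P with A and A with P, which gives: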

AINEPAALE
BPNPNP
PRTICHOKE
TOMPTO

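This list is in lexicographic order. If you switch a letter, it is switched in every word. Note that you may use the whole alphabet, not just the letters that appear in the words of the list.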
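To make the example concrete, here is a minimal sketch that applies the P/A swap to every word and checks that the result is sorted. The helper name `apply_substitution` is made up for illustration; it only relies on the standard `str.maketrans` and `str.translate`.

```python
def apply_substitution(words, mapping):
    """Replace letters in every word according to `mapping` (letter -> letter)."""
    table = str.maketrans(mapping)
    return [word.translate(table) for word in words]

words = ["PINEAPPLE", "BANANA", "ARTICHOKE", "TOMATO"]
# Swap P and A everywhere; all other letters stay as they are.
swapped = apply_substitution(words, {"P": "A", "A": "P"})

print(swapped)                     # ['AINEPAALE', 'BPNPNP', 'PRTICHOKE', 'TOMPTO']
print(swapped == sorted(swapped))  # True
```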

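I have spent quite a lot of time on this problem, but I cannot think of any approach other than brute force (trying every combination of letter swaps), and I cannot come up with conditions that define when a list can be sorted.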
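For reference, the brute-force idea mentioned above might look roughly like the sketch below. It is an illustration, not the asker's code: the function name `brute_force_substitution` is hypothetical, and it only permutes the letters that actually occur, which is enough because only the relative order of those letters affects the comparison. With k distinct letters it tries up to k! permutations, so it is only usable on tiny inputs.

```python
from itertools import permutations

def brute_force_substitution(words):
    """Try every permutation of the letters occurring in `words`.

    Returns a letter -> letter mapping that sorts the list, or None.
    """
    letters = sorted(set("".join(words)))
    for perm in permutations(letters):
        mapping = dict(zip(letters, perm))
        table = str.maketrans(mapping)
        translated = [word.translate(table) for word in words]
        if translated == sorted(translated):
            return mapping
    return None

mapping = brute_force_substitution(["ABC", "ABB", "ABD"])
print(mapping)                                      # some mapping that sorts the list
print(brute_force_substitution(["AB", "AA", "AB"]))  # None: no substitution works
```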

Another example:

ABC
ABB
ABD

can be turned into

ACB
ACC
ACD

which satisfies the condition.

【Question discussion】:

Tags: algorithm encryption computer-science letter

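【Solution 1】:

Extract the first letter of every word in the list. (P,B,A,T)
Sort that list. (A,B,P,T)
Replace every occurrence of each word's first letter with the letter at the same position in the sorted list.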

Replace P (Pineapple) with A in all words.

Replace B with B in all words.

Replace A with P in all words.

Replace T with T in all words.

This gives you the expected result.

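Edit:

Compare two adjacent strings. If the first is greater than the second, find the first position at which their characters differ and swap those two characters in every word.
Repeat this across the whole list, as in bubble sort.

Example: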

ABC

The first character mismatch occurs at position 3, so we swap every C with B.

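【Discussion】:

Consider the list [ABC, ABB, ABD]. Your method only handles the first character, not the whole word.
@FigsHigs The edited answer works for all strings.
I will try to turn it into code. Another question: how do you know the list can be sorted at all (i.e. whether such a substitution even exists)?
You only know on the first pass over the list: either every string comparison returns 0 (the strings are equal), or each string is less than the next one (the comparison returns -1). If that holds for the whole list, the list is already sorted.
That is the finished state, but what if the list cannot be sorted? How would I know?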

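【Solution 2】:

Update: as Eric Zhang pointed out, the original analysis was wrong and failed on certain classes of test cases.

I believe this can be solved with a form of topological sort. Your initial list of words defines a partial order, or directed graph, on some set of letters. You want to find a substitution that linearizes this graph of letters. Let's use one of your non-trivial examples: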

P A R K O V I S T E
P A R A D O N T O Z A
P A D A K
A B B A
A B E C E D A
A B S I N T

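Let x <* y mean substitution(x) < substitution(y) for some letters (or words) x and y. We want word1 <* word2 <* word3 <* word4 <* word5 <* word6 overall, but at the level of letters we only need to look at each pair of adjacent words and find the first pair of differing characters in the same column: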

K <* A  (from PAR[K]OVISTE <* PAR[A]DONTOZA)
R <* D  (from PA[R]ADONTOZA <* PA[D]AK)
P <* A  (from [P]ADAK <* [A]BBA)
B <* E  (from AB[B]A <* AB[E]CEDA)
E <* S  (from AB[E]CEDA <* AB[S]INT)
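The five constraints above can be reproduced with a few lines of Python. This is only a sketch of the extraction step; the helper name `first_difference` is an assumption made for illustration.

```python
def first_difference(word1, word2):
    """Return the first pair of differing characters in the same column, or None."""
    for char1, char2 in zip(word1, word2):
        if char1 != char2:
            return char1, char2
    return None

words = ["PARKOVISTE", "PARADONTOZA", "PADAK", "ABBA", "ABECEDA", "ABSINT"]

constraints = []
for word1, word2 in zip(words, words[1:]):
    pair = first_difference(word1, word2)
    if pair is not None:  # None means one word is a prefix of the other
        constraints.append(pair)

for smaller, larger in constraints:
    print(f"{smaller} <* {larger}")
```

Each printed line corresponds to one edge of the directed graph on letters.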

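If we find no mismatching letters, there are 3 cases:

1. word 1 and word 2 are identical
2. word 1 is a prefix of word 2
3. word 2 is a prefix of word 1

In cases 1 and 2 the words are already in lexicographic order, so we do not need to perform any substitution (although we may), and they add no extra constraints that we have to respect. In case 3 no substitution at all can fix the order (think of ["DOGGO", "DOG"]), so there is no possible solution and we can exit early.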
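As a small illustration of the three cases, the sketch below classifies an adjacent pair that has no differing column. The function name `prefix_case` and its string labels are hypothetical, chosen only for this example.

```python
def prefix_case(word1, word2):
    """Classify an adjacent pair of words that never differ in a shared column.

    Returns "equal", "word1_is_prefix" or "word2_is_prefix", or None when the
    words do differ in some column (a letter constraint applies instead).
    """
    if any(char1 != char2 for char1, char2 in zip(word1, word2)):
        return None
    if len(word1) == len(word2):
        return "equal"
    return "word1_is_prefix" if len(word1) < len(word2) else "word2_is_prefix"

print(prefix_case("DOG", "DOG"))    # equal: no constraint needed
print(prefix_case("DOG", "DOGGO"))  # word1_is_prefix: already in order
print(prefix_case("DOGGO", "DOG"))  # word2_is_prefix: no substitution can help
```

Only the last outcome makes the whole list unsortable, because a substitution never changes word lengths.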

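Otherwise, we build the directed graph that corresponds to the partial-order information we collected and run a topological sort on it. If the sort shows that no linearization is possible, then there is no solution that sorts the word list. Otherwise, you get something like this: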

P <* K <* R <* B <* E <* A <* D <* S

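Depending on how you implement the topological sort, you may get a different linear ordering. Now you only need to assign each letter a substitute so that the substitutes, taken in this order, are themselves in alphabetical order. A simple choice is to pair the linear ordering with an alphabetically sorted copy of itself and use that as the substitution: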

P <* K <* R <* B <* E <* A <* D <* S
|    |    |    |    |    |    |    |
A <  B <  D <  E <  K <  P <  R <  S

But you can implement a different substitution rule if you prefer.
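The pairing described above fits in a couple of lines. The sketch below hard-codes the linear ordering P, K, R, B, E, A, D, S from the example (a real run would take it from the topological sort) and checks that the substituted words come out sorted.

```python
# Linear ordering taken from the example; normally produced by the toposort.
ordering = list("PKRBEADS")

# Pair the ordering with an alphabetically sorted copy of itself.
substitution = dict(zip(ordering, sorted(ordering)))
print(substitution)

words = ["PARKOVISTE", "PARADONTOZA", "PADAK", "ABBA", "ABECEDA", "ABSINT"]
table = str.maketrans(substitution)
substituted = [word.translate(table) for word in words]

print(substituted)
print(substituted == sorted(substituted))  # True
```

Letters that are not part of the ordering (O, V, I and so on) are left unchanged, and the mapping stays one-to-one because it only permutes the letters in `ordering` among themselves.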

Here is a proof of concept in Python:

import collections

# a pair of outgoing and incoming edges
Edges = collections.namedtuple('Edges', 'outgoing incoming')
# a mapping from nodes to edges
Graph = lambda: collections.defaultdict(lambda: Edges(set(), set()))

def substitution_sort(words):
    graph = build_graph(words)

    if graph is None:
        return None

    ordering = toposort(graph)

    if ordering is None:
        return None

    # create a substitution that respects `ordering`
    substitutions = dict(zip(ordering, sorted(ordering)))

    # apply substitutions
    return [
        ''.join(substitutions.get(char, char) for char in word)
        for word in words
    ]

def build_graph(words):
    graph = Graph()

    # loop over every pair of adjacent words and find the first
    # pair of corresponding characters where they differ
    for word1, word2 in zip(words, words[1:]):
        for char1, char2 in zip(word1, word2):
            if char1 != char2:
                break
        else: # no differing characters found...

            if len(word1) > len(word2):
                # ...but word2 is a prefix of word1 and comes after;
                # therefore, no solution is possible
                return None
            else:
                # ...so no new information to add to the graph
                continue

        # add edge from char1 -> char2 to the graph
        graph[char1].outgoing.add(char2)
        graph[char2].incoming.add(char1)

    return graph

def toposort(graph):
    "Kahn's algorithm; returns None if graph contains a cycle"
    result = []
    working_set = {node for node, edges in graph.items() if not edges.incoming}

    while working_set:
        node = working_set.pop()
        result.append(node)
        outgoing = graph[node].outgoing

        while outgoing:
            neighbour = outgoing.pop()
            neighbour_incoming = graph[neighbour].incoming
            neighbour_incoming.remove(node)

            if not neighbour_incoming:
                working_set.add(neighbour)

    if any(edges.incoming or edges.outgoing for edges in graph.values()):
        return None
    else:
        return result

def print_all(items):
    for item in items:
        print(item)
    print()

def test():
    test_cases = [
        ('PINEAPPLE BANANA ARTICHOKE TOMATO', True),
        ('ABC ABB ABD', True),
        ('AB AA AB', False),
        ('PARKOVISTE PARADONTOZA PADAK ABBA ABECEDA ABSINT', True),
        ('AA AB CA', True),
        ('DOG DOGGO DOG DIG BAT BAD', False),
        ('DOG DOG DOGGO DIG BIG BAD', True),
    ]

    for words, is_sortable in test_cases:
        words = words.split()
        print_all(words)

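        subbed = substitution_sort(words)
        # the outcome should match the expectation recorded in the test case
        assert (subbed is not None) == is_sortable, words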

        if subbed is not None:
            assert subbed == sorted(subbed), subbed
            print_all(subbed)
        else:
            print('<no solution>')
            print()

        print('expected solution?', 'yes' if is_sortable else 'no')
        print()

if __name__ == '__main__':
    test()

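It is not ideal yet (for example, it still performs a substitution even when the original word list is already sorted), but it seems to work. I cannot formally prove that it works, so if you find a counterexample, please let me know!

【Discussion】:

Thank you for the thorough answer.
Your answer fails on the test case AA AB CA. See repl.it/E762/0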

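【Solution 3】:

Let's assume, for the moment, that the problem is solvable for the particular instance. Also, for simplicity, assume all words are distinct (if two words are identical, they must be adjacent and one of them can be ignored).

The problem then becomes a topological sort, although the details differ slightly from suspicious dog's answer, which misses a few cases.

Consider a graph of 26 nodes, labelled A to Z. Each pair of words contributes one directed edge to the partial order; the edge corresponds to the first character at which the words differ. For example, for the two words ABCEF and ABRKS in that order, the first difference is at the third character, so sigma(C) < sigma(R).

The result is obtained by topologically sorting this graph and mapping the first node in the ordering to A, the second to B, and so on.

Note that this also gives a useful criterion for when the problem is unsolvable. That happens when two words are identical but not adjacent (in a "cluster"), when one word is a prefix of another but comes after it, or when the graph has a cycle and a topological sort is impossible.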
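The unsolvable cases listed above can be checked with a short sketch built on the standard-library `graphlib` module (Python 3.9+). This is not the answer's own implementation: the function name `find_letter_order` is hypothetical, and it compares adjacent words only, which also catches identical but non-adjacent words because they end up producing a cycle.

```python
from graphlib import CycleError, TopologicalSorter

def find_letter_order(words):
    """Return the constrained letters in a compatible order, or None if unsolvable."""
    sorter = TopologicalSorter()
    for word1, word2 in zip(words, words[1:]):
        for char1, char2 in zip(word1, word2):
            if char1 != char2:
                # char1 is a predecessor of char2, so it is emitted first
                sorter.add(char2, char1)
                break
        else:
            if len(word1) > len(word2):
                return None  # word2 is a proper prefix of word1 but comes after it
    try:
        return list(sorter.static_order())
    except CycleError:
        return None  # contradictory constraints

print(find_letter_order(["AA", "AB", "CA"]))  # an order with A before B and C
print(find_letter_order(["AB", "AA", "AB"]))  # None (cycle between A and B)
print(find_letter_order(["DOGGO", "DOG"]))    # None (prefix comes second)
```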

Here is a fully working Python solution that detects when a particular instance of the problem is unsolvable.

def topoSort(N, adj):
    stack = []
    visited = [False for _ in range(N)]
    current = [False for _ in range(N)]

    def dfs(v):
        if current[v]: return False # there's a cycle!
        if visited[v]: return True
        visited[v] = current[v] = True
        for x in adj[v]:
            if not dfs(x):
                return False
        current[v] = False
        stack.append(v)
        return True

    for i in range(N):
        if not visited[i]:
            if not dfs(i):
                return None

    return list(reversed(stack))

def solve(wordlist):
    N = 26
    adj = [set([]) for _ in range(N)] # adjacency list
    for w1, w2 in zip(wordlist[:-1], wordlist[1:]):
        idx = 0
        while idx < len(w1) and idx < len(w2):
            if w1[idx] != w2[idx]: break
            idx += 1
        else:
            # no differences found between the words
            if len(w1) > len(w2):
                return None
            continue

        c1, c2 = w1[idx], w2[idx]
        # we want c1 < c2 after the substitution
        adj[ord(c1) - ord('A')].add(ord(c2) - ord('A'))

    li = topoSort(N, adj)
    if li is None:
        # the constraint graph has a cycle, so no substitution exists
        return None
    sub = {}
    for i in range(N):
        sub[chr(ord('A') + li[i])] = chr(ord('A') + i)
    return sub

def main():
    words = ['PINEAPPLE', 'BANANA', 'ARTICHOKE', 'TOMATO']
    print('Before: ' + ' '.join(words))
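    sub = solve(words)
    if sub is None:
        # solve() returns None when the instance is unsolvable
        print('No substitution sorts this list.')
        return
    nwords = [''.join(sub[c] for c in w) for w in words]
    print('After : ' + ' '.join(nwords))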

if __name__ == '__main__':
    main()

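Edit: the time complexity of this solution is a provably optimal O(S), where S is the length of the input. Thanks to suspicious dog for this; the original time complexity was O(N^2 L).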
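As a rough illustration of where the saving comes from, the sketch below collects letter constraints once from every pair of words and once from adjacent pairs only. Lexicographic order is transitive, so having every adjacent pair in order is enough for the whole list. The helper names `first_difference` and `edges` are made up for this example and do not appear in the solution above.

```python
from itertools import combinations

def first_difference(word1, word2):
    """Return the first pair of differing characters in the same column, or None."""
    for char1, char2 in zip(word1, word2):
        if char1 != char2:
            return char1, char2
    return None

def edges(pairs):
    """Collect the letter constraints produced by an iterable of word pairs."""
    found = set()
    for word1, word2 in pairs:
        pair = first_difference(word1, word2)
        if pair is not None:
            found.add(pair)
    return found

words = ['PINEAPPLE', 'BANANA', 'ARTICHOKE', 'TOMATO']
all_pairs = edges(combinations(words, 2))  # N * (N - 1) / 2 comparisons
adjacent = edges(zip(words, words[1:]))    # N - 1 comparisons

print(sorted(adjacent))       # [('A', 'T'), ('B', 'A'), ('P', 'B')]
print(adjacent <= all_pairs)  # True: the adjacent edges are a subset
```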

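【Discussion】:

Thank you very much for the counterexample and the correction! I could not have worked it out myself. Is it necessary to compare each word with every word after it, or does comparing each word only with the next adjacent word work as well? Is there an example where that fails?
Yes, that's right! That would be the O(NL)-time algorithm. It is slightly more complicated to implement, though.
Couldn't you replace the nested i, j loops with a single loop for i in range(len(wordlist) - 1) and use w1, w2 = wordlist[i], wordlist[i+1], or am I misunderstanding something?
Sorry, ignore my last comment. Yes, you are absolutely right! That is a very simple way to significantly improve the time complexity.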