Transform one word into another by changing, inserting, or deleting one character at a time: answers

【Question title】: Transform one word into another by changing, inserting, or deleting one character at a time
【Posted】: 2012-04-04 08:19:22
【Question】:

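Given a finite dictionary of words and a start/end pair (e.g. "hands" and "feet" in the example below), find the shortest sequence of words such that each word in the sequence can be formed from the previous one by either 1) inserting one character, 2) deleting one character, or 3) changing one character.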

hands -> hand -> and -> end -> fend -> feed -> feet
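To make the three allowed moves concrete, here is a minimal sketch of the neighbour test they define, assuming plain Python strings; `one_edit_apart` is a hypothetical helper name, not something from the original post. It checks each step of the example chain.

```python
def one_edit_apart(a: str, b: str) -> bool:
    """Return True if b can be obtained from a by exactly one
    substitution, insertion, or deletion of a single character."""
    if a == b:
        return False
    la, lb = len(a), len(b)
    if abs(la - lb) > 1:
        return False
    if la == lb:
        # Same length: exactly one position may differ (a substitution).
        return sum(x != y for x, y in zip(a, b)) == 1
    # Make `a` the shorter word; then b must be a with one character inserted.
    if la > lb:
        a, b = b, a
    # Skip the common prefix, then drop the first mismatching char of the longer word.
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    return a[i:] == b[i + 1:]


example = ["hands", "hand", "and", "end", "fend", "feed", "feet"]
for prev, nxt in zip(example, example[1:]):
    print(prev, "->", nxt, one_edit_apart(prev, nxt))  # every step prints True
```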

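For those who may be wondering: this is not a homework problem assigned to me, nor a question I was asked in an interview; it's just a problem I find interesting.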

I'm looking for a one- or two-sentence "top-down view" of the approach you would take, and, for the bold, an efficient implementation in any language.

【Question discussion】:

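This has been asked many times before; see e.g. stackoverflow.com/questions/2205540/… or stackoverflow.com/questions/1521958/… or stackoverflow.com/questions/7729666/…
@blueraja None of those examples include the letter insertion or deletion rule (although they are very similar).
Including that rule doesn't change the solution.
@BlueRaja-DannyPflughoeft: I think you do have to change the solution. Take this part, for example: if length(w1) != length(w2): Not possible to convert.
@WolframH: You treat it as a graph in which the words are nodes, and two words are joined by an edge if they are within an edit distance of 1 of each other. In both cases you then just run a shortest-path algorithm. The only (extremely subtle) difference is in how the edges are created.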
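A sketch of the point being made in this exchange, assuming a tiny in-memory word list: the breadth-first search stays identical and only the edge predicate changes. `substitution_only`, `one_edit`, and `shortest_path` are hypothetical names chosen for illustration, and building the graph by comparing every pair of words is deliberately naive (quadratic in the dictionary size).

```python
from collections import deque
from typing import Callable, Dict, List, Optional


def substitution_only(a: str, b: str) -> bool:
    """Edge rule of the linked same-length questions: one changed letter."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def one_edit(a: str, b: str) -> bool:
    """Edge rule of this question: one substitution, insertion, or deletion."""
    if len(a) == len(b):
        return substitution_only(a, b)
    if abs(len(a) - len(b)) != 1:
        return False
    short, long_ = (a, b) if len(a) < len(b) else (b, a)
    # long_ must equal short with one character inserted at some position.
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))


def shortest_path(words: List[str], start: str, end: str,
                  adjacent: Callable[[str, str], bool]) -> Optional[List[str]]:
    # Build the full graph by comparing every pair of words: O(n^2) predicate calls.
    graph: Dict[str, List[str]] = {w: [] for w in words}
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            if adjacent(a, b):
                graph[a].append(b)
                graph[b].append(a)
    # Plain BFS: the first time `end` is dequeued, the path to it is shortest.
    parent: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        word = queue.popleft()
        if word == end:
            path = []
            while word is not None:
                path.append(word)
                word = parent[word]
            return path[::-1]
        for nxt in graph[word]:
            if nxt not in parent:
                parent[nxt] = word
                queue.append(nxt)
    return None


words = ["hands", "hand", "and", "end", "fend", "feed", "feet", "bands"]
print(shortest_path(words, "hands", "feet", substitution_only))  # None
print(shortest_path(words, "hands", "feet", one_edit))
```

With the substitution-only rule, "hands" cannot reach the shorter "feet" at all; once insertions and deletions count as edges, the same search finds the chain from the question.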

Tags: algorithm

【Solution 1】:

Instead of turning the dictionary into a full graph, use something with a little less structure:

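For each word in the dictionary, you get a shortened_word by deleting character number i, for each i in range(len(word)). Map each (shortened_word, i) pair to the list of all words that produce it.

This makes it easy to look up all words with one substituted letter (because, for some i, they must be in the same (shortened_word, i) bin) and all words with one more letter (because, for some i, they must be in some (word, i) bin).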
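As an illustration of the bins on a toy dictionary (the words are made up for the example, and `bins` is just a hypothetical name for the first loop of the answer's `prepare_graph`):

```python
from collections import defaultdict


def shortened_words(word):
    # Same helper as in the answer: delete the character at each index i.
    for i in range(len(word)):
        yield word[:i] + word[i + 1:], i


def bins(words):
    # Map each (shortened_word, i) pair to the words that produce it.
    g = defaultdict(list)
    for word in words:
        for key in shortened_words(word):
            g[key].append(word)
    return g


g = bins(["hand", "band", "hind", "hands"])
print(g[("and", 0)])   # ['hand', 'band']: they differ only at index 0
print(g[("hnd", 1)])   # ['hand', 'hind']: they differ only at index 1
print(g[("hand", 4)])  # ['hands']: 'hand' plus one letter inserted at index 4
```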

Python code:

from collections import defaultdict, deque
from itertools import chain

def shortened_words(word):
    for i in range(len(word)):
        yield word[:i] + word[i + 1:], i


def prepare_graph(d):
    g = defaultdict(list)
    for word in d:
        for short in shortened_words(word):
            g[short].append(word)
    return g


def walk_graph(g, d, start, end):
    todo = deque([start])
    seen = {start: None}
    while todo:
        word = todo.popleft()
        if word == end: # end is reachable
            break

        same_length = chain(*(g[short] for short in shortened_words(word)))
        one_longer = chain(*(g[word, i] for i in range(len(word) + 1)))
        one_shorter = (w for w, i in shortened_words(word) if w in d)
        for next_word in chain(same_length, one_longer, one_shorter):
            if next_word not in seen:
                seen[next_word] = word
                todo.append(next_word)
    else: # no break, i.e. not reachable
        return None # not reachable

    path = [end]
    while path[-1] != start:
        path.append(seen[path[-1]])
    return path[::-1]

And usage:

dictionary = ispell_dict # list of 47158 words

graph = prepare_graph(dictionary)
print(" -> ".join(walk_graph(graph, dictionary, "hands", "feet")))
print(" -> ".join(walk_graph(graph, dictionary, "brain", "game")))

Output:

hands -> bands -> bends -> bents -> beets -> beet -> feet
brain -> drain -> drawn -> dawn -> damn -> dame -> game

A note on speed: building the "graph helper" is fast (1 second), but hands -> feet takes 14 seconds and brain -> game takes 7 seconds.
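One possible contributor to these timings, offered as an untested guess rather than a measurement: `dictionary` is a list, so `w in d` in `one_shorter` scans all 47158 words on every check, and indexing the `defaultdict` with missing keys inserts empty lists as a side effect. A minimal sketch of the same neighbour lookup using a set for membership and `dict.get` for the bins (`neighbours` is a hypothetical helper; the three candidate sources are unchanged):

```python
from collections import defaultdict
from itertools import chain


def shortened_words(word):
    for i in range(len(word)):
        yield word[:i] + word[i + 1:], i


def prepare_graph(d):
    g = defaultdict(list)
    for word in d:
        for short in shortened_words(word):
            g[short].append(word)
    return dict(g)  # plain dict: lookups below no longer insert empty lists


def neighbours(g, words, word):
    """Same three candidate sources as walk_graph, but `words` is a set,
    so the one_shorter membership test is O(1) instead of a list scan."""
    same_length = chain.from_iterable(g.get(s, ()) for s in shortened_words(word))
    one_longer = chain.from_iterable(g.get((word, i), ()) for i in range(len(word) + 1))
    one_shorter = (w for w, _ in shortened_words(word) if w in words)
    return set(chain(same_length, one_longer, one_shorter)) - {word}


dictionary = ["hands", "hand", "bands", "and", "hinds"]
words = set(dictionary)
g = prepare_graph(dictionary)
print(sorted(neighbours(g, words, "hands")))  # ['bands', 'hand', 'hinds']
```

The BFS in `walk_graph` could then iterate over `neighbours(g, words, word)` instead of building the three chains inline.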

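Edit: If you need more speed, you could try a graph or network library. Alternatively, you can actually build the full graph (slow) and then find paths much faster. This mainly means moving the edge lookup out of the walk function and into the graph-building function: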

def prepare_graph(d):
    g = defaultdict(list)
    for word in d:
        for short in shortened_words(word):
            g[short].append(word)

    next_words = {}
    for word in d:
        same_length = chain(*(g[short] for short in shortened_words(word)))
        one_longer = chain(*(g[word, i] for i in range(len(word) + 1)))
        one_shorter = (w for w, i in shortened_words(word) if w in d)
        next_words[word] = set(chain(same_length, one_longer, one_shorter))
        next_words[word].remove(word)

    return next_words


def walk_graph(g, start, end):
    todo = deque([start])
    seen = {start: None}
    while todo:
        word = todo.popleft()
        if word == end: # end is reachable
            break

        for next_word in g[word]:
            if next_word not in seen:
                seen[next_word] = word
                todo.append(next_word)
    else: # no break, i.e. not reachable
        return None # not reachable

    path = [end]
    while path[-1] != start:
        path.append(seen[path[-1]])
    return path[::-1]

Usage: first build the graph (slow; all timings are from some i5 laptop, YMMV).

dictionary = ispell_dict # list of 47158 words
graph = prepare_graph(dictionary)  # more than 6 minutes!

Now find the paths (much faster than before; times exclude printing):

print(" -> ".join(walk_graph(graph, "hands", "feet")))          # 10 ms
print(" -> ".join(walk_graph(graph, "brain", "game")))          #  6 ms
print(" -> ".join(walk_graph(graph, "tampering", "crunchier"))) # 25 ms

Output:

hands -> lands -> lends -> lens -> lees -> fees -> feet
brain -> drain -> drawn -> dawn -> damn -> dame -> game
tampering -> tapering -> capering -> catering -> watering -> wavering -> havering -> hovering -> lovering -> levering -> leering -> peering -> peeping -> seeping -> seeing -> sewing -> swing -> swings -> sings -> sines -> pines -> panes -> paces -> peaces -> peaches -> beaches -> benches -> bunches -> brunches -> crunches -> cruncher -> crunchier

【Discussion】:

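Cool to see a working Python implementation. Is there a way to improve the performance of the graph-walking step? A standard Python graph library that could be leveraged?
@gcbenison: I don't know graph libraries well enough to give a qualified answer. Maybe my update (actually computing the graph, i.e. a looong setup) gives fast enough lookups?
I don't know Python, but in C there is the igraph library. I used it in this implementation, which I described here. I'm not a Python expert, but I think the way the graph is built is very similar to yours.
@gcbenison: Having read your description, yes, it seems to be basically the same. I'm sure your implementation is much faster...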

【Solution 2】:

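Quick answer: you can compute the Levenshtein distance, the "usual" edit distance found in most dynamic programming texts, and try to reconstruct the path from the resulting table.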

From the Wikipedia link:

d[i, j] := minimum
           (
             d[i-1, j] + 1,                  // a deletion
             d[i, j-1] + 1,                  // an insertion
             d[i-1, j-1] + substitutionCost  // a substitution (substitutionCost = 0 if s[i] = t[j], else 1)
           )

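You can record which of these cases occurs at each cell in your code (perhaps in an auxiliary table); from there, reconstructing the solution path is, of course, easy.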
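A minimal sketch of that idea, assuming unit costs for all three operations: fill the table with the recurrence above (substitution cost 0 when the characters match), then walk back from the bottom-right cell and replay the recovered operations. `edit_path` is a hypothetical name. As written it only produces one minimal edit script between the two words; the intermediate strings are not checked against any dictionary.

```python
def edit_path(s: str, t: str) -> list:
    """Levenshtein table plus backtrace: returns the strings from s to t,
    one edit per step (intermediate strings may not be real words)."""
    m, n = len(s), len(t)
    # d[i][j] = edit distance between s[:i] and t[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    # Backtrace from d[m][n]; ops are collected right-to-left in s's indices.
    ops, i, j = [], m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (s[i - 1] != t[j - 1]):
            if s[i - 1] != t[j - 1]:
                ops.append(("sub", i - 1, t[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("del", i - 1, None))
            i -= 1
        else:
            ops.append(("ins", i, t[j - 1]))
            j -= 1
    # Applying ops right-to-left keeps the indices of the remaining
    # (further-left) ops valid: cur is always s[:i] + t[j:].
    path, cur = [s], s
    for op, k, ch in ops:
        if op == "sub":
            cur = cur[:k] + ch + cur[k + 1:]
        elif op == "del":
            cur = cur[:k] + cur[k + 1:]
        else:
            cur = cur[:k] + ch + cur[k:]
        path.append(cur)
    return path


print(edit_path("kitten", "sitting"))
```

For "kitten" -> "sitting" this returns four strings (three edits), matching the Levenshtein distance of 3.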

【Discussion】:

【Solution 3】:

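A naive approach would be to turn the dictionary into a graph, with words as nodes and edges connecting "neighbours" (i.e. words that can be transformed into each other by one operation). You can then use a shortest-path algorithm to find the distance between word A and word B.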

The hard part of this approach is finding an efficient way to turn the dictionary into a graph.
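One common way around that difficulty, sketched here under the assumption of a lowercase ASCII alphabet, is to never build the graph explicitly: generate every string one edit away from the current word and keep only those found in a set of dictionary words. `candidates` and `ladder` are hypothetical names. Per word this builds about 52·len(word) + 26 candidate strings instead of scanning the whole dictionary.

```python
from collections import deque
from string import ascii_lowercase


def candidates(word, alphabet=ascii_lowercase):
    """Every string one deletion, substitution, or insertion away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = (a + b[1:] for a, b in splits if b)
    substitutes = (a + c + b[1:] for a, b in splits if b for c in alphabet if c != b[0])
    inserts = (a + c + b for a, b in splits for c in alphabet)
    return set(deletes) | set(substitutes) | set(inserts)


def ladder(dictionary, start, end):
    words = set(dictionary)
    parent = {start: None}
    queue = deque([start])
    while queue:
        word = queue.popleft()
        if word == end:
            path = []
            while word is not None:
                path.append(word)
                word = parent[word]
            return path[::-1]
        # Edges are discovered on the fly: generate candidates, keep dictionary words.
        for nxt in candidates(word) & words:
            if nxt not in parent:
                parent[nxt] = word
                queue.append(nxt)
    return None


dictionary = ["hands", "hand", "and", "end", "fend", "feed", "feet", "bands"]
print(ladder(dictionary, "hands", "feet"))
```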

【Discussion】: