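Answers to: Algorithm to transform one word to another through valid words

【Question title】: Algorithm to transform one word to another through valid words
【Posted】: 2011-01-13 09:57:42
【Question description】:

I came across this variation of the edit-distance problem:

Design an algorithm that transforms a source word into a target word. For example: from "head" to "tail", where at each step you may replace only one character and the resulting word must be valid. You are given a dictionary.

This is clearly a variation of the edit distance problem, but in edit distance I don't care whether the word is valid. So how do I add this requirement to edit distance?

【Discussion】:

Tags: algorithm string transform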

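【Solution 1】:

This can be modeled as a graph problem. Treat the words as nodes of a graph, where two nodes are connected if and only if the words have the same length and differ in exactly one character.

You can preprocess the dictionary and build this graph, which should look something like: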

   stack  jack
    |      |
    |      |
   smack  back -- pack -- pick

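Then you can keep a mapping from each word to the node that represents it; for this you can use a hash table, a height-balanced BST, ...

Once you have this mapping, all you have to do is check whether a path exists between the two graph nodes, which is easily done with BFS or DFS.

So the algorithm can be summarized as: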

preprocess the dictionary and create the graph.
Given the two inputs words w1 and w2
if length(w1) != length(w2)
 Not possible to convert
else
 n1 = get_node(w1)
 n2 = get_node(w2)

 if(path_exists(n1,n2))
   Possible and nodes in the path represent intermediary words
 else
   Not possible
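The pseudocode above can be sketched in Python roughly as follows. This is only a minimal illustration, not the answerer's code: the names `differs_by_one`, `build_graph` and `path_exists` are made up here, the graph is built with the naive pairwise comparison, and the path check uses an iterative DFS.

```python
def differs_by_one(a, b):
    """True if a and b have the same length and differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def build_graph(words):
    """Preprocess the dictionary: map every word to the set of its neighbours."""
    words = list(words)
    graph = {w: set() for w in words}
    for i, w1 in enumerate(words):
        for w2 in words[i + 1:]:
            if differs_by_one(w1, w2):
                graph[w1].add(w2)
                graph[w2].add(w1)
    return graph


def path_exists(graph, w1, w2):
    """Iterative DFS from w1; True if w2 is reachable."""
    if len(w1) != len(w2) or w1 not in graph or w2 not in graph:
        return False
    seen = {w1}
    stack = [w1]
    while stack:
        word = stack.pop()
        if word == w2:
            return True
        for nxt in graph[word]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


graph = build_graph(["stack", "smack", "jack", "back", "pack", "pick"])
print(path_exists(graph, "jack", "pick"))   # jack -> back -> pack -> pick
print(path_exists(graph, "stack", "jack"))  # different lengths, so no path
```

A single graph holds the whole dictionary; words of different lengths, such as "stack" and "jack", simply end up in separate connected components.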

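【Discussion】:

The Russian Wiktionary actually uses graphs like this; see ru.wiktionary.org/w/… or aisee.com/graph_of_the_month/words.htm
Can you explain how many graphs I have to generate? Is it one or several? In your example, what is the relationship between "stack" and "jack"? Thanks
Why do you say the conversion is impossible if the word lengths differ? For example, if the given word can be turned into another word by adding one character, and both are valid words, the above solution will not work (e.g. w1=the, w2=them). The correct solution is to build the graph by connecting nodes at edit distance 1.
@prasadvk The original question says "you may replace only one character". Insertion/deletion is different from replacement.
Any ideas on how to build the graph?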

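【Solution 2】:

codaddict's graph approach is valid, although building each graph takes O(n^2) time, where n is the number of words of a given length. If that is a problem, you can build a bk-tree more efficiently, which lets you find all words within a given edit distance (1 in this case) of a target word.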
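For readers who have not met a bk-tree before, here is a small Python sketch of the idea. It is an assumption-laden illustration rather than a reference implementation: the class name `BKTree`, its methods and the choice of Levenshtein distance as the metric are all picked for this example. Since Levenshtein distance 1 also covers insertions and deletions, the usage at the end filters for equal length to match the replace-only rule of the question.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, replace)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # replace or match
        prev = cur
    return prev[-1]


class BKTree:
    """Each node is (word, children), children keyed by distance to that word."""

    def __init__(self, distance):
        self.distance = distance
        self.root = None

    def add(self, word):
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = self.distance(word, node[0])
            if d == 0:
                return  # word already stored
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def query(self, word, max_dist):
        """Return all stored words within max_dist of word."""
        results = []
        stack = [self.root] if self.root else []
        while stack:
            node_word, children = stack.pop()
            d = self.distance(word, node_word)
            if d <= max_dist:
                results.append(node_word)
            # Triangle inequality: only subtrees keyed in [d - max, d + max] can match.
            for child_d, child in children.items():
                if d - max_dist <= child_d <= d + max_dist:
                    stack.append(child)
        return results


tree = BKTree(levenshtein)
for w in ["damp", "camp", "came", "lame", "lime", "like", "lamp", "limp"]:
    tree.add(w)
neighbours = [w for w in tree.query("damp", 1) if len(w) == 4 and w != "damp"]
print(sorted(neighbours))
```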

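【Discussion】:

Nice one, Nick. Thanks a lot for sharing. I really appreciate it when people post a good answer to an old, already-accepted question.
If you treat the maximum word length and the alphabet size as constants, you can build each graph in O(n) time. For a given word (e.g. "cat"), you can substitute the first character ("aat", "bat", "cat", "dat", etc.) and do a hash table lookup to see which of the results are words. Then you do the same for the second letter, the third letter, and so on. This means you can find all words at edit distance 1 from a given word in O(1) time after O(n) preprocessing. Hence, after O(n) preprocessing, building a graph of size n takes O(n) time.
@JohnKurlak Most algorithms look cheap if you hold enough things constant.
@NickJohnson Fair enough, but in practice it is not a big deal. In English the average word length is about 5 letters, so you are really looking at about 100 constant-time operations per word. If that is still too much for you, you could take another approach: have a Map<String, Map<String, Set<String>>> that maps (prefix, suffix) to the set of words that start with prefix, have any single letter after it, and then end with suffix. You can build this structure in O(nm^2) time, where n is the dictionary size and m is the maximum word length. That is about 25 operations per word on average.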
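The (prefix, suffix) idea from the last comment might look something like this in Python. It is a rough sketch, not the commenter's code: the nested Map is flattened into one dict keyed by a `(prefix, suffix)` tuple, and the function names `build_index` and `neighbors` are invented for the example.

```python
from collections import defaultdict


def build_index(words):
    """Map (prefix, suffix) -> set of words of the form prefix + one letter + suffix."""
    index = defaultdict(set)
    for word in words:
        for i in range(len(word)):
            index[(word[:i], word[i + 1:])].add(word)
    return index


def neighbors(index, word):
    """All indexed words that differ from word in exactly one position."""
    result = set()
    for i in range(len(word)):
        # .get() avoids creating empty buckets in the defaultdict
        result |= index.get((word[:i], word[i + 1:]), set())
    result.discard(word)
    return result


index = build_index(["cat", "bat", "cot", "cab", "dog"])
print(sorted(neighbors(index, "cat")))
```

A lookup touches one bucket per letter position instead of trying every letter of the alphabet at every position.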

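【Solution 3】:

Create a graph in which each node represents a word in the dictionary. Add an edge between two word nodes if the edit distance between their words is 1. The minimum number of transformations required is then the length of the shortest path between the source node and the target node.

【Discussion】:

【Solution 4】:

I don't think this is edit distance.

I think this can be done with a graph. Just build a graph from your dictionary, then try to navigate to the destination with your favorite graph traversal algorithm.

【Discussion】:

【Solution 5】:

You could simply use recursive backtracking, but it is far from an optimal solution.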

# Given two words of equal length that are in a dictionary, write a method to transform one word into another word by changing only
# one letter at a time.  The new word you get in each step must be in the
# dictionary.

# def transform(english_words, start, end):

# transform(english_words, 'damp', 'like')
# ['damp', 'lamp', 'limp', 'lime', 'like']
# ['damp', 'camp', 'came', 'lame', 'lime', 'like']


def is_diff_one(str1, str2):
    if len(str1) != len(str2):
        return False

    count = 0
    for i in range(0, len(str1)):
        if str1[i] != str2[i]:
            count = count + 1

    if count == 1:
        return True

    return False


def transform(english_words, start, end, path=None):
    # Depth-first backtracking: prints every simple path from start to end.
    if path is None:
        path = [start]

    if start == end:
        print(path)
        return

    for w in english_words:
        if is_diff_one(w, start) and w not in path:
            path.append(w)
            transform(english_words, w, end, path)
            path.pop()  # backtrack: undo this choice before trying the next word


english_words = set(['damp', 'camp', 'came', 'lame', 'lime', 'like'])
transform(english_words, 'damp', 'lame')

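【Discussion】:

【Solution 6】:

@Codeaddict's solution is correct, but it misses an opportunity to simplify and optimize the solution.

DFS vs. BFS:

If we use DFS, there is a chance that we meet the target string (or to_string) somewhere much deeper in the graph. We then have to keep track of the level at which it was found and a reference to that node, and finally find the lowest possible level and trace it back from the root.

For example, consider this transformation from -> zoom: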

               from
             /       \  
        fram            foom
        /  \            /   \
    dram    drom     [zoom] food       << To traverse upto this level is enough
 ...         |           ...      
            doom                  
             |       
           [zoom]

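With BFS we can simplify this process a great deal. All we need to do is:

1. Start with the from string at level 0. Add this string to visitedSetOfStrings.
2. Add to the next level the not-yet-visited valid strings that are at edit distance 1 from the strings of the current level.
3. Add all of these strings to visitedSetOfStrings.
4. If this set contains the target string, stop processing further nodes/strings. Otherwise continue with step 2.

To make path tracing easier, we can store the parent string as extra information in each node.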
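A level-by-level BFS along these lines could be sketched in Python as below. Treat it as an illustration under a few assumptions: words are lowercase ASCII, the dictionary is a set, the function name `shortest_ladder` is made up, and a single `parent` dict plays the role of both visitedSetOfStrings and the parent pointers.

```python
import string


def shortest_ladder(dictionary, source, target):
    """Return one shortest list of words from source to target, or None."""
    if len(source) != len(target):
        return None
    parent = {source: None}  # visited set and parent pointers in one dict
    level = [source]
    while level and target not in parent:
        next_level = []
        for word in level:
            for i in range(len(word)):
                for c in string.ascii_lowercase:
                    candidate = word[:i] + c + word[i + 1:]
                    if candidate in dictionary and candidate not in parent:
                        parent[candidate] = word
                        next_level.append(candidate)
        level = next_level
    if target not in parent:
        return None
    # Walk the parent pointers back from the target to the source.
    path = []
    word = target
    while word is not None:
        path.append(word)
        word = parent[word]
    return path[::-1]


words = {"from", "fram", "foom", "dram", "drom", "food", "doom", "zoom"}
print(shortest_ladder(words, "from", "zoom"))
```

Because each level is fully expanded before the next one starts, the first time the target shows up it is at the shallowest possible depth.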

【Discussion】:

This is a precise and clear solution! It should be accepted!

【Solution 7】:

Here is C# code that solves the problem using BFS:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

    //use a hash set for a fast check if a word is already in the dictionary
    static HashSet<string> Dictionary = new HashSet<string>();
    //dictionary used to find the parent in every node in the graph and to avoid traversing an already traversed node
    static Dictionary<string, string> parents = new Dictionary<string, string>();

    public static List<string> FindPath(List<string> input, string start, string end)
    {
        char[] allcharacters = {'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z'};

        foreach (string s in input)
            Dictionary.Add(s);
        List<string> currentFrontier = new List<string>();
        List<string> nextFrontier = new List<string>();
        currentFrontier.Add(start);
        while (currentFrontier.Count > 0)
        {
            foreach (string s in currentFrontier)
            {
                for (int i = 0; i < s.Length; i++)
                {
                    foreach (char c in allcharacters)
                    {
                        StringBuilder newWordBuilder = new StringBuilder(s);
                        newWordBuilder[i] = c;
                        string newWord = newWordBuilder.ToString();
                        if (Dictionary.Contains(newWord))
                        {
                            //avoid traversing a previously traversed node
                                if (!parents.ContainsKey(newWord))
                            {
                                parents.Add(newWord.ToString(), s);
                                nextFrontier.Add(newWord);
                            }

                        }
                        if (newWord.ToString() == end)
                        {
                            return ExtractPath(start, end);

                        }
                    }
                }
            }
            currentFrontier.Clear();
            //Concat returns a new sequence and leaves the list unchanged, so copy with AddRange
            currentFrontier.AddRange(nextFrontier);
            nextFrontier.Clear();
        }
        throw new ArgumentException("The given dictionary cannot be used to get a path from start to end");
    }

    private static List<string> ExtractPath(string start,string end)
    {
        List<string> path = new List<string>();
        string current = end;
        path.Add(end);
        while (current != start)
        {
            current = parents[current];
            path.Add(current);
        }
         path.Reverse();
         return path;
    }

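【Discussion】:

【Solution 8】:

I don't think we need a graph or any other complicated data structure. My idea is to load the dictionary as a HashSet and use the contains() method to find out whether a word exists in the dictionary.

Please check this pseudocode to see my idea: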

Two words are given: START and STOP. 
//List is our "way" from words START to STOP, so, we add the original word to it first.
    list.add(START);
//Finish to change the word when START equals STOP.
    while(!START.equals(STOP))
//Change each letter at START to the letter to STOP one by one and check if such word exists.
    for (int i = 0, i<STOP.length, i++){
        char temp = START[i];
        START[i] = STOP[i];
//If the word exists add a new word to the list of results. 
//And change another letter in the new word with the next pass of the loop.
        if dictionary.contains(START)
           list.add(START)
//If the word doesn't exist, leave it like it was and try to change another letter with the next pass of the loop.
        else START[i] = temp;}
    return list;

As I understand it, my code should work like this:

Input: DAMP, LIKE

Output: DAMP, LAMP, LIMP, LIME, LIKE

Input: BACK, PICK

Output: BACK, PACK, PICK
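To make the behaviour of this greedy idea easier to check, here is one possible Python rendering of the pseudocode. It is a sketch, not the answerer's code: the function name `greedy_transform` is invented, and a `progressed` flag is added (an assumption that is not in the pseudocode) so that the loop gives up instead of spinning forever when no letter can be changed.

```python
def greedy_transform(dictionary, start, stop):
    """Greedily overwrite letters of start with letters of stop, keeping valid words."""
    if len(start) != len(stop):
        return None
    current = list(start)
    path = [start]
    while "".join(current) != stop:
        progressed = False
        for i in range(len(stop)):
            if current[i] == stop[i]:
                continue
            saved = current[i]
            current[i] = stop[i]
            candidate = "".join(current)
            if candidate in dictionary:
                path.append(candidate)
                progressed = True
            else:
                current[i] = saved  # not a word: undo and try the next position
        if not progressed:
            return None  # stuck: no single target letter yields a valid word
    return path


print(greedy_transform({"DAMP", "LAMP", "LIMP", "LIME", "LIKE"}, "DAMP", "LIKE"))
print(greedy_transform({"BACK", "PACK", "PICK"}, "BACK", "PICK"))
print(greedy_transform({"DAMP", "JAMP", "JIMP", "JIME", "JIKE", "LIKE"}, "DAMP", "LIKE"))
```

The third call returns None: the only valid route goes through words containing "J", a letter that belongs to neither the start nor the stop word, so the greedy approach never tries it.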

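【Discussion】:

What if your dictionary contains only DAMP, JAMP, JIMP, JIME, JIKE, LIKE? My point is that the dictionary may contain intermediate words whose letters appear in neither the source word nor the target word.
Does this guarantee the shortest path?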

【Solution 9】:

import java.util.*;

class Solution {
    //static int ans=Integer.MAX_VALUE;
    public int ladderLength(String beginWord, String endWord, List<String> wordList) {
        HashMap<String,Integer> h=new HashMap<String,Integer>();
        HashMap<String,Integer> h1=new HashMap<String,Integer>();
        for(int i=0;i<wordList.size();i++)
        {
            h1.put(wordList.get(i),1);
        }
        int count=0;
        Queue<String> q=new LinkedList<String>();
        q.add(beginWord);
        q.add("-1");
        h.put(beginWord,1);
        int ans=ladderLengthUtil(beginWord,endWord,wordList,h,count,q,h1);
        return ans;
    }
    public int ladderLengthUtil(String beginWord, String endWord, List<String> wordList,HashMap<String,Integer> h,int count,Queue<String> q,HashMap<String,Integer> h1)
    {  
        int ans=1;
        while(!q.isEmpty()) 
        {
            String s=q.peek();
            q.poll();
            if(s.equals(endWord))
            {
                return ans;   
            }
            else if(s.equals("-1"))
            {
                if(q.isEmpty())
                {                    
                    break;
                }
                ans++;                
                q.add("-1");
            }
            else
            {
                for(int i=0;i<s.length();i++)
                {
                        for(int j=0;j<26;j++)
                        {
                            char a=(char)('a'+j);
                            String s1=s.substring(0,i)+a+s.substring(i+1);
                            //System.out.println("s1 is "+s1);
                            if(h1.containsKey(s1)&&!h.containsKey(s1))
                            {
                                h.put(s1,1);
                                q.add(s1);
                            }
                        }
                }
            }
        }
        return 0;    
    }
}

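【Discussion】:

Hello, and welcome to Stack Overflow. Unfortunately, your code is not formatted correctly. Please take a look at stackoverflow.com/editing-help for more information.

【Solution 10】:

This is clearly a permutation problem. Using a graph is overkill. The problem statement is missing an important constraint: each position can be changed only once. This implies that the solution takes at most 4 steps. All that remains to be decided is the order of the replace operations:

Operation1 = change "H" to "T"
Operation2 = change "E" to "A"
Operation3 = change "A" to "I"
Operation4 = change "D" to "L"

The solution, the sequence of operations, is some permutation of the string "1234", where each digit represents the position of the character being replaced. For example, "3124" means we first apply operation 3, then operation 1, then operation 2, then operation 4. At each step, if the resulting word is not in the dictionary, skip to the next permutation. Reasonably trivial. Code, anyone?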
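Since the answer asks for code, here is one way the permutation idea could be written in Python. It is a sketch that takes the answer's extra constraint (each position changes at most once, directly to its target letter) as given; the function name `permutation_transform` is made up, and only the positions where the two words differ are permuted.

```python
from itertools import permutations


def permutation_transform(dictionary, start, end):
    """Try every order of the per-position replacements; return the first valid path."""
    if len(start) != len(end):
        return None
    positions = [i for i in range(len(start)) if start[i] != end[i]]
    for order in permutations(positions):
        word = list(start)
        path = [start]
        for i in order:
            word[i] = end[i]
            candidate = "".join(word)
            if candidate not in dictionary:
                break  # invalid intermediate word: skip to the next permutation
            path.append(candidate)
        else:
            return path  # every step produced a dictionary word
    return None


words = {"damp", "lamp", "limp", "lime", "like"}
print(permutation_transform(words, "damp", "like"))
```

With k differing positions this tries up to k! orderings, and it cannot find routes that pass through letters outside the start and end words.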

【Discussion】:

I think he left that constraint out because it is not one of the constraints.
This increases the complexity to n^n.