How to find all the possible words using adjacent letters in a matrix

[Question title]: How to find all the possible words using adjacent letters in a matrix
[Posted]: 2016-12-29 00:21:43
[Question]:

I have the following test matrix:

a l i
g t m
j e a

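I want to create an algorithm that finds every possible word, from a given minimum length up to a given maximum length, using only adjacent letters.

For example:

Min: 3 letters

Max: 6 letters

Based on the test matrix, I should get the following results: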

ali
alm
alg
alt
ati
atm
atg
...
atmea

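and so on.

I wrote some test code (C#) with a custom class that represents a letter.

Each letter knows its neighbors and has a generation counter (used to keep track of them during traversal).

Here is the code: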

public class Letter
{
    public int X { get; set; }
    public int Y { get; set; }

    public char Character { get; set; }

    public List<Letter> Neighbors { get; set; }

    public Letter PreviousLetter { get; set; }

    public int Generation { get; set; }

    public Letter(char character)
    {
        Neighbors = new List<Letter>();
        Character = character;
    }

    public void SetGeneration(int generation)
    {
        foreach (var item in Neighbors)
        {
            item.Generation = generation;
        }
    }
}

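I figured that if I want it to be dynamic, it has to be based on recursion.

Unfortunately, the following code produces the first 4 words and then stops. That is no surprise, because the recursion is stopped by the specified generation level.

The main problem is that the recursion only returns one level, whereas it would be better to return all the way to the starting point.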

 private static void GenerateWords(Letter input, int maxLength, StringBuilder sb)
    {
        if (input.Generation >= maxLength)
        {               
            if (sb.Length == maxLength)
            {
                allWords.Add(sb.ToString());
                sb.Remove(sb.Length - 1, 1);
            }                
            return;
        }
        sb.Append(input.Character);
        if (input.Neighbors.Count > 0)
        {
            foreach (var child in input.Neighbors)
            {
                if (input.PreviousLetter == child)
                    continue;
                child.PreviousLetter = input;
                child.Generation = input.Generation + 1;
                GenerateWords(child, maxLength, sb);
            }
        }
    }

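So I feel a bit stuck. Any idea how I should proceed?

[Question comments]:

Tags: c# recursion matrix

[Solution 1]:

From here, you can treat this as a graph traversal problem. You start at each given letter and find every path of length min_size to max_size, with 3 and 6 as the values in your example. I suggest a recursive routine that builds the words as paths through the grid. It would look something like the following; replace the types and pseudocode with your preferences.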

<array_of_string> build_word(size, current_node) {
    if (size == 1)  return current_node.letter as an array_of_string;
    result = <empty array_of_string>
    for each next_node in current_node.neighbours {
        solution_list = build_word(size-1, next_node);
        for each word in solution_list {
             // add current_node.letter to front of that word.
             // add this new word to the result array
        }
    }
    return the result array_of_string
}
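As a rough C# rendering of the pseudocode above: the `Node` and `WordBuilder` types are hypothetical names invented for this sketch, and the `visited` set is an addition the pseudocode does not contain, included on the assumption that a letter may not be used twice within one word.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical node type for this sketch: a letter plus its adjacent nodes.
public sealed class Node
{
    public char Letter { get; }
    public List<Node> Neighbours { get; } = new List<Node>();
    public Node(char letter) { Letter = letter; }
}

public static class WordBuilder
{
    // Returns every word of exactly `size` letters that starts at `current`.
    // `visited` holds the nodes already on the current path (an assumption
    // added here so that no node is reused within a word).
    public static List<string> BuildWord(int size, Node current, HashSet<Node> visited)
    {
        var result = new List<string>();
        if (size == 1)
        {
            result.Add(current.Letter.ToString());
            return result;
        }

        visited.Add(current);
        foreach (var next in current.Neighbours)
        {
            if (visited.Contains(next)) continue;

            foreach (var word in BuildWord(size - 1, next, visited))
            {
                // Put the current letter in front of each shorter word.
                result.Add(current.Letter + word);
            }
        }
        visited.Remove(current); // backtrack so sibling paths may use this node

        return result;
    }
}
```

To cover a range of lengths, one would call `BuildWord` once for each size from min_size to max_size and for each starting node, passing a fresh `HashSet<Node>` each time.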

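Does that get you moving toward a solution?

[Comments]:

Well, it did more than get me moving :) You actually gave me a working solution... I added a few checks (don't reuse the same letter, don't check the parent neighbor, etc.), and it works now. I guess I should go back to school and learn some more algorithm theory... Thanks for the guidance :)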

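[Solution 2]:

When solving this kind of problem, I tend to use immutable classes because everything is so much easier to reason about. The following implementation uses an ad hoc ImmutableStack because it is pretty simple to implement. In production code I would probably look into System.Collections.Immutable for better performance (visited would be an ImmutableHashSet<>, to point out the obvious case).

So why do I need an immutable stack? To keep track of the current character path and of the "positions" visited inside the matrix. Because the chosen tool is immutable, passing it down to recursive calls is a no-brainer: we know it can't change, so I don't have to worry about invariants at each recursion level.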
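To make that property concrete, here is a small sketch. It uses the library type `System.Collections.Immutable.ImmutableStack<T>` rather than the ad hoc class (the two happen to share a name), and `PersistenceDemo` is a hypothetical name used only for illustration.

```csharp
using System;
using System.Collections.Immutable;
using System.Linq;

public static class PersistenceDemo
{
    // Builds one base path and two branches off it, then returns all three
    // as strings (top of the stack first).
    public static string[] Branches()
    {
        var path = ImmutableStack<char>.Empty.Push('a');

        // Push returns a new stack; `path` itself is never modified,
        // so both branches can safely share it.
        var viaL = path.Push('l');
        var viaT = path.Push('t');

        return new[]
        {
            new string(path.ToArray()),  // "a"
            new string(viaL.ToArray()),  // "la"
            new string(viaT.ToArray())   // "ta"
        };
    }
}
```

Because the stack enumerates from the top, a path has to be reversed before it reads as a word in the order the letters were visited.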

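So let's implement an immutable stack.

We will also implement a helper struct Coordinates that encapsulates our "positions" inside the matrix, gives us value equality semantics, and offers a convenient way to get the valid neighbors of any given position. It will obviously come in handy.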

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public class ImmutableStack<T>: IEnumerable<T>
{
    private readonly T head;
    private readonly ImmutableStack<T> tail;

    public static readonly ImmutableStack<T> Empty = new ImmutableStack<T>(default(T), null);
    public int Count => this == Empty ? 0 : tail.Count + 1;

    private ImmutableStack(T head, ImmutableStack<T> tail)
    {
        this.head = head;
        this.tail = tail;
    }

    public T Peek()
    {
        if (this == Empty)
            throw new InvalidOperationException("Can not peek an empty stack.");

        return head;
    }

    public ImmutableStack<T> Pop()
    {
        if (this == Empty)
            throw new InvalidOperationException("Can not pop an empty stack.");

        return tail;
    }

    public ImmutableStack<T> Push(T value) => new ImmutableStack<T>(value, this);

    public IEnumerator<T> GetEnumerator()
    {
        var current = this;

        while (current != Empty)
        {
            yield return current.head;
            current = current.tail;
        }
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

struct Coordinates: IEquatable<Coordinates>
{
    public int Row { get; }
    public int Column { get; }

    public Coordinates(int row, int column)
    {
        Row = row;
        Column = column;
    }

    public bool Equals(Coordinates other) => Column == other.Column && Row == other.Row;
    public override bool Equals(object obj)
    {
        if (obj is Coordinates)
        {
            return Equals((Coordinates)obj);
        }

        return false;
    }

    public override int GetHashCode() => unchecked(27947 ^ Row ^ Column);

    public IEnumerable<Coordinates> GetNeighbors(int rows, int columns)
    {
        var increasedRow = Row + 1;
        var decreasedRow = Row - 1;
        var increasedColumn = Column + 1;
        var decreasedColumn = Column - 1;
        var canIncreaseRow = increasedRow < rows;
        var canIncreaseColumn = increasedColumn < columns;
        var canDecreaseRow = decreasedRow > -1;
        var canDecreaseColumn = decreasedColumn > -1;

        if (canDecreaseRow)
        {
            if (canDecreaseColumn)
            {
                yield return new Coordinates(decreasedRow, decreasedColumn);
            }

            yield return new Coordinates(decreasedRow, Column);

            if (canIncreaseColumn)
            {
                yield return new Coordinates(decreasedRow, increasedColumn);
            }
        }

        if (canIncreaseRow)
        {
            if (canDecreaseColumn)
            {
                yield return new Coordinates(increasedRow, decreasedColumn);
            }

            yield return new Coordinates(increasedRow, Column);

            if (canIncreaseColumn)
            {
                yield return new Coordinates(increasedRow, increasedColumn);
            }
        }

        if (canDecreaseColumn)
        {
            yield return new Coordinates(Row, decreasedColumn);
        }

        if (canIncreaseColumn)
        {
            yield return new Coordinates(Row, increasedColumn);
        }
    }
}

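OK, now we need a method that traverses the matrix, visiting each position at most once per path, and returns the words that have at least the specified minimum number of characters and no more than the specified maximum.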

public static IEnumerable<string> GetWords(char[,] matrix,
                                           Coordinates startingPoint,
                                           int minimumLength,
                                           int maximumLength)

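That looks about right. Now, when recursing, we need to keep track of which characters we have visited. That is easy with our immutable stack, so our recursive method will look like this: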

static IEnumerable<string> getWords(char[,] matrix,
                                    ImmutableStack<char> path,
                                    ImmutableStack<Coordinates> visited,
                                    Coordinates coordinates,
                                    int minimumLength,
                                    int maximumLength)

What remains now is just plumbing and connecting the wires:

public static IEnumerable<string> GetWords(char[,] matrix,
                                           Coordinates startingPoint,
                                           int minimumLength,
                                           int maximumLength)
    => getWords(matrix,
                ImmutableStack<char>.Empty,
                ImmutableStack<Coordinates>.Empty,
                startingPoint,
                minimumLength,
                maximumLength);


static IEnumerable<string> getWords(char[,] matrix,
                                    ImmutableStack<char> path,
                                    ImmutableStack<Coordinates> visited,
                                    Coordinates coordinates,
                                    int minimumLength,
                                    int maximumLength)
{
    var newPath = path.Push(matrix[coordinates.Row, coordinates.Column]);
    var newVisited = visited.Push(coordinates);

    if (newPath.Count > maximumLength)
    {
        yield break;
    }
    else if (newPath.Count >= minimumLength)
    {
        yield return new string(newPath.Reverse().ToArray());
    }

    foreach (Coordinates neighbor in coordinates.GetNeighbors(matrix.GetLength(0), matrix.GetLength(1)))
    {
        if (!visited.Contains(neighbor))
        {
            foreach (var word in getWords(matrix,
                                          newPath,
                                          newVisited,
                                          neighbor,
                                          minimumLength,
                                          maximumLength))
            {
                yield return word;
            }
        }
    }
}

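And we are done. Is this the most elegant or the fastest algorithm? Probably not, but I find it the most understandable and therefore maintainable. Hope it helps.

UPDATE: Based on the comments below, I ran a few test cases, one of which is: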

var matrix = new[,] { {'a', 'l'},
                      {'g', 't'} };
var words = GetWords(matrix, new Coordinates(0,0), 2, 4);
Console.WriteLine(string.Join(Environment.NewLine, words.Select((w,i) => $"{i:00}: {w}")));

The result is as expected:

00: ag
01: agl
02: aglt
03: agt
04: agtl
05: at
06: atl
07: atlg
08: atg
09: atgl
10: al
11: alg
12: algt
13: alt
14: altg

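[Comments]:

Thanks, this is a nice solution. I tested it with the same parameters, and your solution creates a huge number of possible words (Prune's algorithm creates 600k words for min(3) and max(12) in a 4x4 matrix, yours creates more than 5.5 million). As a result, your algorithm takes approx. 30 seconds to run, while Prune's takes 2-3. I haven't checked yet which one is closer to reality :)
@nestor Duplicate words may be one reason, since nothing checks whether any given word has already been produced. Run both algorithms on a smaller solution set and see where the differences are. To be honest, I haven't tested my solution, so it may well be wrong.
@Nestor: "Prune's algorithm" is hardly the right name. It is a well-known solution in recursive programming: recurse multiple times and accrue the results. The basis is Dijkstra's graph traversal algorithm; I merely adapted it to a length requirement rather than a given end state. You can find the accrual logic in many answers on this site tagged "recursion", especially the "target sum" and "make change" problems.
@Prune Thanks for the explanation (I didn't even know the word "accrue", though English is not my native language :). I really like it; I find it simple, yet I needed help with it.
@nestor OK, I had time to run a few small test cases, and my implementation seems to work fine. I'm not sure why you are getting such different results, but I don't have an implementation based on Prune's solution, so it's hard to diagnose.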
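One way to check the duplicate-word theory from the comments is to collect the paths once into a list and once into a set, then compare the counts. The sketch below does that with a plain backtracking search over a `bool[,]` grid and a shared `StringBuilder`. `DistinctWords`, `Collect` and `Walk` are hypothetical names, and this is neither of the two implementations discussed in this thread.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class DistinctWords
{
    // Adds every path of minLength..maxLength letters, from every start cell, to `sink`.
    public static void Collect(char[,] m, int minLength, int maxLength, ICollection<string> sink)
    {
        if (maxLength < 1) return;
        var used = new bool[m.GetLength(0), m.GetLength(1)];
        for (var r = 0; r < m.GetLength(0); r++)
        for (var c = 0; c < m.GetLength(1); c++)
            Walk(m, r, c, used, new StringBuilder(), minLength, maxLength, sink);
    }

    private static void Walk(char[,] m, int row, int column, bool[,] used, StringBuilder sb,
                             int minLength, int maxLength, ICollection<string> sink)
    {
        used[row, column] = true;
        sb.Append(m[row, column]);

        if (sb.Length >= minLength) sink.Add(sb.ToString());
        if (sb.Length < maxLength)
        {
            for (var dr = -1; dr <= 1; dr++)
            for (var dc = -1; dc <= 1; dc++)
            {
                int r = row + dr, c = column + dc;
                if (r < 0 || c < 0 || r >= m.GetLength(0) || c >= m.GetLength(1)) continue;
                if (used[r, c]) continue; // already on the path (or the current cell)
                Walk(m, r, c, used, sb, minLength, maxLength, sink);
            }
        }

        // Undo both changes on every return, so the caller's state is restored.
        sb.Length--;
        used[row, column] = false;
    }
}
```

Passing a `List<string>` as `sink` keeps every path, while passing a `HashSet<string>` keeps each distinct word once. If the grid contains the same letter in more than one cell, the two counts can differ, which is one thing to compare when two implementations disagree on totals.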