
【Question Title】: How to implement floor() and ceil() operations for ternary search tries efficiently?
【Posted】: 2015-11-23 08:54:51
【Question】:

I've been studying TSTs (ternary search tries) from Robert Sedgewick's book. Here is a link to his implementation: http://algs4.cs.princeton.edu/52trie/TST.java.html

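So, since a TST is a modified BST, I was wondering how to implement floor and ceiling operations efficiently. (They aren't implemented anywhere in his code.) Every approach I've come up with is messy and not very efficient.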

【Comments】:

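@DougCurrie Actually, I first wrote that I would find the longest prefix of the input string, then find all keys with that prefix, and binary search them for the floor and ceiling, but then I realized that's wrong. The other approaches I thought of involve storing a bunch of strings and aren't linear in the number of characters of the input string.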

Tags: algorithm data-structures tree time-complexity binary-search-tree

【Answer 1】:

Yes, you can implement these operations efficiently on a TST.

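It may help to think of the TST as a plain trie. We can work out how to do predecessor and successor searches (what you're calling floor and ceiling) in a trie, and then adapt those methods to work in a TST. For simplicity I'll only discuss successor search, though the same ideas adapt easily to predecessor search.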
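The answer only works through successor search. As a rough illustration of the predecessor (floor) direction mentioned here, the following is a mirror-image sketch on a plain dict-based trie; `TrieNode`, `insert`, `rightmost_word`, and `find_predecessor` are hypothetical names, not part of Sedgewick's code. It differs from the successor case in two ways: a word on the search path that is a proper prefix of the query is itself a candidate, and "go as far left as possible" becomes "keep taking the largest child until you reach a leaf".

```python
class TrieNode:
    """Hypothetical plain-trie node: children keyed by character, plus an end-of-word flag."""

    def __init__(self):
        self.children = {}
        self.is_word = False


def insert(root, word):
    node = root
    for ch in word:
        if ch not in node.children:
            node.children[ch] = TrieNode()
        node = node.children[ch]
    node.is_word = True


def rightmost_word(node, prefix):
    """Largest word in node's subtrie: keep taking the largest child until a leaf."""
    chars = [prefix]
    while node.children:
        ch = max(node.children)
        chars.append(ch)
        node = node.children[ch]
    return "".join(chars) if node.is_word else None


def find_predecessor(node, remaining, prefix=""):
    """Floor: largest word <= prefix + remaining in node's subtrie, or None."""
    if node is None:
        return None
    if remaining == "":
        # Every proper extension of prefix is larger than the query.
        return prefix if node.is_word else None
    c = remaining[0]
    child = node.children.get(c)
    if child is not None:
        result = find_predecessor(child, remaining[1:], prefix + c)
        if result is not None:
            return result
    # Otherwise the best candidate hangs off the largest child smaller than c.
    earlier = [ch for ch in node.children if ch < c]
    if earlier:
        ch = max(earlier)
        return rightmost_word(node.children[ch], prefix + ch)
    # Last resort: prefix itself, a proper prefix of the query and hence smaller.
    return prefix if node.is_word else None


root = TrieNode()
for word in ["by", "sea", "sells", "she", "shells", "shore", "the"]:
    insert(root, word)

print(find_predecessor(root, "shf"))  # shells
```

Scanning the dict for the largest smaller key costs O(|Σ|) per node; in a TST this step would become a BST predecessor search within the level.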

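Suppose you want to find the lexicographically first word that comes no earlier than some word w, that is, the smallest word that is greater than or equal to w. Start by searching the trie for w. If you find that w is a word in the trie, you're done: w is its own successor.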
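This first step is an ordinary trie lookup. A minimal sketch, assuming a hypothetical dict-based `TrieNode` with a `children` map and an `is_word` flag (these names and `find_node` are illustrative, not from the original code):

```python
class TrieNode:
    """Hypothetical plain-trie node: children keyed by character, plus an end-of-word flag."""

    def __init__(self):
        self.children = {}
        self.is_word = False


def insert(root, word):
    node = root
    for ch in word:
        if ch not in node.children:
            node.children[ch] = TrieNode()
        node = node.children[ch]
    node.is_word = True


def find_node(root, w):
    """Follow w one character at a time; return the node reached, or None if we fall off."""
    node = root
    for ch in w:
        node = node.children.get(ch)
        if node is None:
            return None
    return node


root = TrieNode()
for word in ["by", "sea", "sells", "she", "shells", "shore", "the"]:
    insert(root, word)

hit = find_node(root, "she")
print(hit is not None and hit.is_word)  # True: "she" is its own ceiling
```

The three possible outcomes of `find_node` (a word node, a non-word node, or `None`) correspond to the three cases discussed next.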

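Otherwise, one of a few things can happen. First, you might end up at a node that corresponds to w but isn't a word. In that case you know w is a prefix of some word in the trie, so the successor is the lexicographically first string that has w as a prefix. To find it, keep walking down the trie, always going as far left as possible, until you reach a node that corresponds to a word.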
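The "always go as far left as possible" descent can be sketched as follows, again on a hypothetical dict-based trie. Going left means taking the smallest child character, and the node's own word is checked before its children because a prefix sorts before all of its extensions. `leftmost_word` is an illustrative name.

```python
class TrieNode:
    """Hypothetical plain-trie node: children keyed by character, plus an end-of-word flag."""

    def __init__(self):
        self.children = {}
        self.is_word = False


def insert(root, word):
    node = root
    for ch in word:
        if ch not in node.children:
            node.children[ch] = TrieNode()
        node = node.children[ch]
    node.is_word = True


def leftmost_word(node, prefix):
    """Smallest word in node's subtrie; prefix is the string spelled on the way to node."""
    chars = [prefix]
    while not node.is_word:
        if not node.children:
            return None  # dead branch; cannot happen if words are only inserted
        ch = min(node.children)  # "as far left as possible" = smallest child character
        chars.append(ch)
        node = node.children[ch]
    return "".join(chars)


root = TrieNode()
for word in ["by", "sea", "sells", "she", "shells", "shore", "the"]:
    insert(root, word)

sh = root.children["s"].children["h"]  # node for the non-word prefix "sh"
print(leftmost_word(sh, "sh"))  # she
```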

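Second, you might fall off the trie while searching for w. In that case you will have read some prefix of w along your path, and you must have ended up at a node where you tried to read a character c but found no edge labeled c. Look at the other edges out of that node and find the first one whose character comes after c. If one exists, take it, and then find the lexicographically first word in that subtrie by always going as far left as possible. If none exists, back up one node in the trie and repeat the process, this time looking for the first edge after the character you just came down along. If you back up past the root, w has no successor.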
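The fall-off and back-up procedure can also be written iteratively by remembering, for each node on the path, which character we tried to read there. This is a sketch on the same hypothetical dict-based trie; `first_child_after` and `ceiling_iterative` are illustrative names. Scanning a dict for the next larger key costs O(|Σ|) per node, whereas a sorted child structure (such as one level of a TST) would do better.

```python
class TrieNode:
    """Hypothetical plain-trie node: children keyed by character, plus an end-of-word flag."""

    def __init__(self):
        self.children = {}
        self.is_word = False


def insert(root, word):
    node = root
    for ch in word:
        if ch not in node.children:
            node.children[ch] = TrieNode()
        node = node.children[ch]
    node.is_word = True


def leftmost_word(node, prefix):
    """Smallest word in node's subtrie, spelled starting from prefix."""
    chars = [prefix]
    while not node.is_word:
        if not node.children:
            return None
        ch = min(node.children)
        chars.append(ch)
        node = node.children[ch]
    return "".join(chars)


def first_child_after(node, c):
    """Smallest child character strictly greater than c, or None."""
    later = [ch for ch in node.children if ch > c]
    return min(later) if later else None


def ceiling_iterative(root, w):
    path = []  # path[i] = (node at depth i, character w[i] we tried to read there)
    node = root
    for ch in w:
        path.append((node, ch))
        child = node.children.get(ch)
        if child is None:
            break  # fell off the trie
        node = child
    else:
        # Read all of w: w itself if it is a word, else its first extension.
        return leftmost_word(node, w)
    # Back up one node at a time, looking for the first edge after the one we tried.
    while path:
        node, ch = path.pop()
        nxt = first_child_after(node, ch)
        if nxt is not None:
            return leftmost_word(node.children[nxt], w[:len(path)] + nxt)
    return None  # backed up past the root: no successor


root = TrieNode()
for word in ["by", "sea", "sells", "she", "shells", "shore", "the"]:
    insert(root, word)

print(ceiling_iterative(root, "shf"))  # shore
```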

Putting it all together, the recursive algorithm looks like this:

function findSuccessor(root, remainingChars) {
    /* If we walked off the trie, we need to back up. Return null
     * to signal failure.
     */
    if (root == null) return null;

    /* If we're on the trie and out of characters, we're either done
     * or we need to find the cheapest way to extend this path.
     */
    if (remainingChars == "") {
        if (root is a word) {
            return root;
        } else {
            return goLeftUntilYouFindAWord(root);
        }
    }

    /* Otherwise, keep walking down the trie. */
    let nextLetter = remainingChars[0];

    /* If there is a child for this letter, follow it and see
     * what happens.
     */
    if (root.hasChildFor(nextLetter)) {
        let result = findSuccessor(root.child(nextLetter), remainingChars.substring(1));

        /* If we found something, great! We're done. */
        if (result != null) return result;
    }

    /* If we're here, we either (a) have no transition on this
     * character or (b) we do, but the successor isn't there. In
     * either case, figure out which child we have that comes right
     * after nextLetter and go down there if possible.
     */
    let letterAfter = root.firstChildAfter(nextLetter);

    /* If no such child exists, there is no successor in this
     * subtrie. Report failure.
     */
    if (letterAfter == null) return null;

    /* Otherwise, get the first word in that subtrie. */
    return goLeftUntilYouFindAWord(root.child(letterAfter));
}
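A runnable Python rendering of the recursive pseudocode, assuming the same hypothetical dict-based `TrieNode`. Because these nodes don't store their own key, this version threads the spelled prefix through the recursion and returns the successor word as a string rather than a node. `leftmost_word` and `first_child_after` stand in for `goLeftUntilYouFindAWord` and `firstChildAfter`.

```python
class TrieNode:
    """Hypothetical plain-trie node: children keyed by character, plus an end-of-word flag."""

    def __init__(self):
        self.children = {}
        self.is_word = False


def insert(root, word):
    node = root
    for ch in word:
        if ch not in node.children:
            node.children[ch] = TrieNode()
        node = node.children[ch]
    node.is_word = True


def leftmost_word(node, prefix):
    """goLeftUntilYouFindAWord: smallest word in node's subtrie, spelled from prefix."""
    chars = [prefix]
    while not node.is_word:
        if not node.children:
            return None
        ch = min(node.children)
        chars.append(ch)
        node = node.children[ch]
    return "".join(chars)


def first_child_after(node, c):
    """firstChildAfter: smallest child character strictly greater than c, or None."""
    later = [ch for ch in node.children if ch > c]
    return min(later) if later else None


def find_successor(node, remaining, prefix=""):
    """Smallest word >= prefix + remaining in node's subtrie, or None."""
    if node is None:
        return None  # walked off the trie: the caller backs up
    if remaining == "":
        # Out of characters: prefix itself, or its cheapest extension.
        return prefix if node.is_word else leftmost_word(node, prefix)
    next_letter = remaining[0]
    child = node.children.get(next_letter)
    if child is not None:
        result = find_successor(child, remaining[1:], prefix + next_letter)
        if result is not None:
            return result
    # No usable path through next_letter: try the next larger child.
    letter_after = first_child_after(node, next_letter)
    if letter_after is None:
        return None
    return leftmost_word(node.children[letter_after], prefix + letter_after)


root = TrieNode()
for word in ["by", "sea", "sells", "she", "shells", "shore", "the"]:
    insert(root, word)

for query in ["sh", "shf", "shorex", "zzz"]:
    print(query, "->", find_successor(root, query))
# sh -> she, shf -> shore, shorex -> the, zzz -> None
```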

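So how does this translate to the TST case? We need to be able to check whether a child exists, which is a regular BST lookup among the nodes of the current level. We also need to find the first character at a given level that comes after a particular character, which is a successor search in that level's BST. Finally, we need to find the first word in a subtrie, which we can do by always going left in the BST of child pointers: take the leftmost node of the level, stop if it ends a word, and otherwise follow its middle link and repeat.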
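A sketch of ceiling directly on a TST, assuming a node layout whose field names mirror the `c`, `left`, `mid`, `right`, `val` fields of Sedgewick's TST.java. This is a Python illustration, not his code, and it assumes keys are only inserted, so there are no dead nodes left behind by deletions. `put`, `smallest_through`, `first`, `ceil_suffix`, and `ceiling` are hypothetical names. Each function works on key suffixes: `ceil_suffix(x, key, d)` returns the smallest suffix in the subtree at `x` that is at least `key[d:]`.

```python
class Node:
    """TST node; field names mirror Sedgewick's TST.java (c, left, mid, right, val)."""

    def __init__(self, c):
        self.c = c
        self.left = self.mid = self.right = None
        self.val = None


def put(x, key, val, d=0):
    """Insert a non-empty key into the TST rooted at x; return the (new) root."""
    c = key[d]
    if x is None:
        x = Node(c)
    if c < x.c:
        x.left = put(x.left, key, val, d)
    elif c > x.c:
        x.right = put(x.right, key, val, d)
    elif d < len(key) - 1:
        x.mid = put(x.mid, key, val, d + 1)
    else:
        x.val = val
    return x


def smallest_through(x):
    """Smallest key suffix whose first character is x.c (ignores x.left and x.right)."""
    if x.val is not None:
        return x.c  # x.c alone ends a key and beats every longer suffix
    rest = first(x.mid)
    return None if rest is None else x.c + rest


def first(x):
    """Smallest key suffix under x: leftmost node of this level, then down the middle link."""
    if x is None:
        return None
    while x.left is not None:
        x = x.left
    return smallest_through(x)


def ceil_suffix(x, key, d=0):
    """Smallest key suffix in the subtree at x that is >= key[d:], or None."""
    if x is None:
        return None
    c = key[d]
    if c < x.c:
        # BST successor search within this level: smaller characters first.
        r = ceil_suffix(x.left, key, d)
        return r if r is not None else smallest_through(x)
    if c > x.c:
        return ceil_suffix(x.right, key, d)
    # c == x.c: same character, so compare the rest of the key one level down.
    if d == len(key) - 1:
        if x.val is not None:
            return x.c  # exact match
        r = first(x.mid)  # key is a proper prefix: take its first extension
    else:
        r = ceil_suffix(x.mid, key, d + 1)
    if r is not None:
        return x.c + r
    # Nothing through x.c works: take the first key through a larger character.
    return first(x.right)


def ceiling(root, key):
    return first(root) if key == "" else ceil_suffix(root, key)


root = None
for i, k in enumerate(["by", "sea", "sells", "she", "shells", "shore", "the"]):
    root = put(root, k, i)

print(ceiling(root, "shf"))  # shore
```

The recursion visits each trie level a constant number of times on the way down and back up, and `first` performs the final leftmost descent.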

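Overall, the running time is O(L log |Σ|), where L is the length of the longest string in the trie and Σ is the set of allowed characters. In the worst case we descend all the way down the TST looking for the successor (and back up along the same path), and at each level we perform a constant number of BST operations. Each of these takes O(log |Σ|) time because a node has at most |Σ| child pointers, provided the BST at each level is balanced; in an unbalanced TST such as Sedgewick's, a single level can cost O(|Σ|) in the worst case.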
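A rough accounting of that bound, assuming each level's BST is balanced. The symbol d = min(|w|, L) is introduced only for this sketch: it bounds the number of levels the search can descend before running out of w or out of trie. One BST lookup is charged per level on the way down, one BST successor search per level on the way back up, and one leftmost walk per level in the final descent.

```latex
% d = \min(|w|, L) \le L
T(w) \;\le\; \underbrace{(d+1)\,O(\log|\Sigma|)}_{\text{descend along } w}
      \;+\; \underbrace{(d+1)\,O(\log|\Sigma|)}_{\text{back up, one successor search per level}}
      \;+\; \underbrace{L\,O(\log|\Sigma|)}_{\text{final leftmost descent}}
      \;=\; O(L\log|\Sigma|)
```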

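If you'd like to see a concrete implementation, I have a C++ implementation of a TST that implements lower_bound and upper_bound, which are closely related to the operations you describe.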
