【问题标题】:Scalable Trie implementation in PHPPHP 中的可扩展 Trie 实现
【发布时间】:2017-07-25 00:10:42
【问题描述】:

按照this 教程,我遇到了Trie 数据结构。最近我一直在用 PHP 编程,我试图用它来解决讲座的problem。我能够得到正确的答案,但仅限于较小的输入(输入 #10 是一个 2.82 MB 的文件)。显然,我的算法不能很好地扩展。它也超过了 PHP 的默认 128 MB 内存限制。

我的算法

Trie 中存储了一个根节点。每个节点都有一个“子”成员。我使用标准的 PHP 数组来存储孩子。子键代表一个字符(目前我正在为每个字符创建一个新节点,a-z 小写,映射到 0-25),子值是对另一个节点的引用。

因为problem,每个节点都有“权重”成员。 我想优化我的代码,(或者甚至使用不同的方法重写它),以便它可以通过大输入的测试。

如果可能的话,我对使这种数据结构在 PHP 中使用大输入的解决方案感兴趣。

我的代码

TrieNode 类存储树的层次结构。

class TrieNode {
    // weight is needed for the given problem
    public $weight;
    /* TrieNode children, 
    * e.g. [0 => (TrieNode object1), 2 => (TrieNode object2)]
    * where 0 stands for 'a', 1 for 'c'
    * and TrieNode objects are references to other TrieNodes.
    */
    private $children;

    function __construct($weight, $children) {
        $this->weight = $weight;
        $this->children = $children;
    }

    /** map lower case english letters to 0-25 */
    static function getAsciiValue($char) {
        return intval(ord($char)) - intval(ord('a'));
    }

    function addChild($char, $node) {
        if (!isset($this->children)) {
            $this->children = [];
        }
        $this->children[self::getAsciiValue($char)] = $node;
    }

    function isChild($char) {
        return isset($this->children[self::getAsciiValue($char)]);
    }

    function getChild($char) {
        return $this->children[self::getAsciiValue($char)];
    }

    function isLeaf() {
        return empty($this->children);
    }
}

Trie 类存储根 TrieNode。它可以插入和查询节点。

class Trie {
    /* root TrieNode stores the first characters */
    private $root;

    function __construct() {
        $this->root = new TrieNode(-1, []);
    }

    function insert($string, $weight) {
        $currentNode = $this->root;
        $l = strlen($string);
        for ($i = 0; $i < $l; $i++) {
            $char = $string[$i];
            if(!$currentNode->isChild($char)) {
                $n = new TrieNode($weight, null);
                $currentNode->addChild($char, $n);
            }
            $currentNode->weight = max($weight, $currentNode->weight);
            $currentNode = $currentNode->getChild($char);
        }
    }

    function getNode($string) {
        $currentNode = $this->root;
        $l = strlen($string);
        for ($i = 0; $i < $l; $i++) {
            $char = $string[$i];
            if ($currentNode->isLeaf() || !$currentNode->isChild($char)) {
                return null;
            }
            $currentNode = $currentNode->getChild($char);
        }
        return $currentNode;
    }

    function getWeight($string) {
        $node = $this->getNode($string);
        return is_null($node) ? -1 : $node->weight;
    }
}

测试代码。解析输入并调用 Trie 对象。

//MAIN / TEST

/*
In case the problem page is down:

e.g.
INPUT
2 1
hackerearth 10
hackerrank 9
hacker

OUTPUT
10

where 2 is the number of inserts, 1 is the number of queries
"string number" is the string to insert and its "weight"
"hacker" is the string to query
10 is maximum the weight of the queried string (hacker -> 10)
*/

$trie = new Trie();
$handle = fopen('test.txt', 'r');
//$handle = STDIN; // <- this is for the online judge
list($n, $q) = fscanf($handle, "%d %d");
for ($i = 0; $i < $n; $i++) { // insert data
    list($s, $weight) = fscanf($handle, "%s %d");
    $trie->insert($s, $weight);
}
for ($i = 0; $i < $q; $i++) { // query data
    $query = trim(strval(fgets($handle)));
    echo $trie->getWeight($query) . PHP_EOL;
}
fclose($handle);

失败

【问题讨论】:

  • @csirmazbendeguz,您遇到了超出时间限制的错误,这意味着您需要优化您的代码。
  • 今天晚些时候我会看看你的代码的时间复杂度
  • 你能把这个问题的链接贴到hackerearth上,这样我就可以尝试提交一个trie解决方案了吗?
  • @zenwraight 感谢您的宝贵时间。链接在说明中。
  • 好消息是对代码进行了一些调整,现在我唯一的一个测试用例超时了,你的逻辑和流程一切都是正确的,只是覆盖了一些极端情况,这样它就不会在那里失败

标签: php algorithm data-structures tree trie


【解决方案1】:

经过一些调整和修改,我已经能够让这个东西适用于所有测试用例,除了一个测试用例超时,

这是除了测试用例 10 之外的所有测试用例都能成功运行的完整代码。

class TrieNode {
        // weight is needed for the given problem
        public $weight;
        /* TrieNode children, 
        * e.g. [0 => (TrieNode object1), 2 => (TrieNode object2)]
        * where 0 stands for 'a', 1 for 'c'
        * and TrieNode objects are references to other TrieNodes.
        */
        private $children;

        function __construct($weight, $children) {
            $this->weight = $weight;
            $this->children = $children;
        }

        /** map lower case english letters to 0-25 */
        static function getAsciiValue($char) {
            return intval(ord($char)) - intval(ord('a'));
        }

        function addChild($char, $node) {
            if (!isset($this->children)) {
                $this->children = [];
            }
            $this->children[self::getAsciiValue($char)] = $node;
        }

        function isChild($char) {
            return isset($this->children[self::getAsciiValue($char)]);
        }

        function getChild($char) {
            return $this->children[self::getAsciiValue($char)];
        }

        function isLeaf() {
            return empty($this->children);
        }
    }

    class Trie {
        /* root TrieNode stores the first characters */
        private $root;

        function __construct() {
            $this->root = new TrieNode(-1, []);
        }

        function insert($string, $weight) {
            $currentNode = $this->root;
            $l = strlen($string);
            for ($i = 0; $i < $l; $i++) {
                $char = $string[$i];
                if(!$currentNode->isChild($char)) {
                    $n = new TrieNode($weight, null);
                    $currentNode->addChild($char, $n);
                }
                $currentNode->weight = max($weight, $currentNode->weight);
                $currentNode = $currentNode->getChild($char);
            }
        }

        function getNode($string) {
            $currentNode = $this->root;
            if (empty($currentNode) || !isset($currentNode)) {
              return null;
            }
            $l = strlen($string);
            for ($i = 0; $i < $l; $i++) {
                $char = $string[$i];
                if (empty($currentNode) || $currentNode->isLeaf() || !$currentNode->isChild($char)) {
                    return null;
                }
                $currentNode = $currentNode->getChild($char);
                if (empty($currentNode)) {
                  return null;
                }
            }
            return $currentNode;
        }

        function getWeight($string) {
            $node = $this->getNode($string);
            return is_null($node) ? -1 : $node->weight;
        }
    }

    $trie = new Trie();
    //$handle = fopen('test.txt', 'r');
    $handle = STDIN; // <- this is for the online judge
    list($n, $q) = fscanf($handle, "%d %d");
    for ($i = 0; $i < $n; $i++) { // insert data
        list($s, $weight) = fscanf($handle, "%s %d");
        $trie->insert($s, $weight);
    }
    for ($i = 0; $i < $q; $i++) { // query data
        $query = trim(strval(fgets($handle)));
        echo $trie->getWeight($query) . PHP_EOL;
    }
    fclose($handle);

我将尝试添加更多检查,以减少该程序占用的计算周期。

【讨论】:

    【解决方案2】:

    以下是经过以下优化的代码-

    删除了所有不必要的条件检查,例如

    1. 无需检查节点是否为叶子,因为如果节点没有指定字符的子节点,那么它是否为叶子都没有关系。
    2. 每次添加子节点时无需检查 {children} 是否已初始化。删除了这个检查初始化 {children} 到构造函数本身中的空数组。

    移除了 {getAsciiValue} 的函数,而不是使用简单的关联数组作为。同样将 {char} 更改为 ascii 值已从 TrieNode 移动到 Trie 类,因此我们不需要多次转换它

    在这些优化之后,我想出了最小的解决方案,但这也无法通过测试#10。在阅读了 PHP 中的数组之后,我了解到 PHP 不像其他编译语言那样实现数组,相反,PHP 中的任何数组都只是一个有序的哈希映射,并且因为这个数组不支持恒定时间操作。 https://stackoverflow.com/a/4904071/8203131

    也使用SplFixedArray,但没有帮助,因为它是一个对象并且具有实例化成本。如果尝试使用大数组来存储整个 Trie,它可能会有所帮助。您可以尝试实现一个解决方案,使用 SplFixedArray 存储整个 Trie 并检查您是否可以让它通过测试 #10。

    <?php
    
    /*
     * Read input from stdin and provide input before running code
    
    fscanf(STDIN, "%s\n", $name);
    echo "Hi, ".$name;
    
    */
    
    class TrieNode {
        // weight is needed for the given problem
        public $weight;
        /* TrieNode children, 
        * e.g. [0 => (TrieNode object1), 2 => (TrieNode object2)]
        * where 0 stands for 'a', 2 for 'c'
        * and TrieNode objects are references to other TrieNodes.
        */
        private $children;
    
        function __construct($weight) {
            $this->weight = $weight;
            $this->children = [];
        }
    
        function addChild($char, $node) {
            $this->children[$char] = $node;
        }
    
        function isChild($char) {
            return isset($this->children[$char]);
        }
    
        function getChild($char) {
            return $this->children[$char];
        }
    }
    
    class Trie {
        /* root TrieNode stores the first characters */
        private $root;
    
        function __construct() {
            $this->root = new TrieNode(-1);
        }
    
        static $asciiValues = array(
            "a" => 0,
            "b" => 1,
            "c" => 2,
            "d" => 3,
            "e" => 4,
            "f" => 5,
            "g" => 6,
            "h" => 7,
            "i" => 8,
            "j" => 9,
            "k" => 10,
            "l" => 11,
            "m" => 12,
            "n" => 13,
            "o" => 14,
            "p" => 15,
            "q" => 16,
            "r" => 17,
            "s" => 18,
            "t" => 19,
            "u" => 20,
            "v" => 21,
            "w" => 22,
            "x" => 23,
            "y" => 24,
            "z" => 25
        );
    
        function insert($string, $weight) {
            $currentNode = $this->root;
            $l = strlen($string);
            for ($i = 0; $i < $l; $i++) {
                $char = self::$asciiValues[$string[$i]];
                $currentNode->weight = max($weight, $currentNode->weight);
                if($currentNode->isChild($char)) {
                    $childNode = $currentNode->getChild($char);
                } else {
                    $childNode = new TrieNode($weight);
                    $currentNode->addChild($char, $childNode);
                }
                $currentNode = $childNode;
            }
        }
    
        function getNodeWeight($string) {
            $currentNode = $this->root;
            $l = strlen($string);
            for ($i = 0; $i < $l; $i++) {
                $char = self::$asciiValues[$string[$i]];
                if (!$currentNode->isChild($char)) {
                    return -1;
                }
                $currentNode = $currentNode->getChild($char);
            }
            return $currentNode->weight;
        }
    }
    
    $trie = new Trie();
    //$handle = fopen('test.txt', 'r');
    $handle = STDIN; // <- this is for the online judge
    list($n, $q) = fscanf($handle, "%d %d");
    for ($i = 0; $i < $n; $i++) { // insert data
        list($s, $weight) = fscanf($handle, "%s %d");
        $trie->insert($s, $weight);
    }
    for ($i = 0; $i < $q; $i++) { // query data
        //$query = trim(strval(fgets($handle)));
        $query = trim(strval(fgets($handle)));
        echo $trie->getNodeWeight($query) . PHP_EOL;
    }
    fclose($handle);
    
    
    ?>
    

    【讨论】:

      猜你喜欢
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 2012-04-27
      • 1970-01-01
      相关资源
      最近更新 更多