【Question title】: Rust - Collecting slices of a Vec in a recursive function
【Posted】: 2021-02-20 08:31:02
【Question】:

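I am currently trying to build a Huffman encoding program and have run into a problem while traversing the generated Huffman tree to create a lookup table. I decided to implement that traversal with a recursive function. In the actual implementation I use the bitvec crate to store bit sequences, but for simplicity I will use Vec<bool> in this post.

My idea was to keep the collection of all code words in the Vec codewords and then store only a slice of that vector in the actual lookup table, for which I use a HashMap.

The question is how to handle appending a 0 or a 1 when traversing left and right. My idea was to save a clone of the slice holding the current sequence, append a 0 to codewords, and then, after traversing to the left, append that clone to the end of codewords so that I can push a 1 and traverse to the right. The function I came up with looks like this: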

use std::collections::HashMap;

// ignore everything being public, I use getters in the real code
pub struct HufTreeNode {
    pub val: u8,
    pub freq: usize,
    pub left: i16,
    pub right: i16,
}

fn traverse_tree<'a>(
    cur_index: usize,
    height: i16,
    codewords: &'a mut Vec<bool>,
    lookup_table: &mut HashMap<u8, &'a [bool]>,
    huffman_tree: &[HufTreeNode],
) {
    let cur_node = &huffman_tree[cur_index];

    // if the left child is -1, we reached a leaf
    if cur_node.left == -1 {
        // the last `height` bits in codewords
        let cur_sequence = &codewords[(codewords.len() - 1 - height as usize)..];
        lookup_table.insert(cur_node.val, cur_sequence);
        return;
    }

    // save the current sequence so we can traverse to the right afterwards
    let mut cur_sequence = codewords[(codewords.len() - 1 - height as usize)..].to_vec();
    codewords.push(false);
    traverse_tree(
        cur_node.left as usize,
        height + 1,
        codewords, // mutable borrow - argument requires that `*codewords` is borrowed for `'a`
        lookup_table,
        huffman_tree,
    );

    // append the previously saved current sequence
    codewords.append(&mut cur_sequence); // second mutable borrow occurs here
    codewords.push(true); // third mutable borrow occurs here
    traverse_tree(
        cur_node.right as usize,
        height + 1,
        codewords, // fourth mutable borrow occurs here
        lookup_table,
        huffman_tree,
    );
}

fn main() {
    // ...
}

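Apparently there are lifetime and borrowing issues in this snippet, and I partly understand what the problem is. As far as I understand, when I pass codewords as an argument to the recursive call, it has to stay borrowed for as long as I keep the slice in lookup_table, which is obviously not possible and causes the error. How do I solve this?

This is what cargo check gives me: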

error[E0499]: cannot borrow `*codewords` as mutable more than once at a time
  --> untitled.rs:43:5
   |
14 |   fn traverse_tree<'a>(
   |                    -- lifetime `'a` defined here
...
34 | /     traverse_tree(
35 | |         cur_node.left as usize,
36 | |         height + 1,
37 | |         codewords, // mutable borrow - argument requires that `*codewords` is borrowed for `'a`
   | |         --------- first mutable borrow occurs here
38 | |         lookup_table,
39 | |         huffman_tree,
40 | |     );
   | |_____- argument requires that `*codewords` is borrowed for `'a`
...
43 |       codewords.append(&mut cur_sequence); // second mutable borrow occurs here
   |       ^^^^^^^^^ second mutable borrow occurs here

error[E0499]: cannot borrow `*codewords` as mutable more than once at a time
  --> untitled.rs:44:5
   |
14 |   fn traverse_tree<'a>(
   |                    -- lifetime `'a` defined here
...
34 | /     traverse_tree(
35 | |         cur_node.left as usize,
36 | |         height + 1,
37 | |         codewords, // mutable borrow - argument requires that `*codewords` is borrowed for `'a`
   | |         --------- first mutable borrow occurs here
38 | |         lookup_table,
39 | |         huffman_tree,
40 | |     );
   | |_____- argument requires that `*codewords` is borrowed for `'a`
...
44 |       codewords.push(true); // third mutable borrow occurs here
   |       ^^^^^^^^^ second mutable borrow occurs here

error[E0499]: cannot borrow `*codewords` as mutable more than once at a time
  --> untitled.rs:48:9
   |
14 |   fn traverse_tree<'a>(
   |                    -- lifetime `'a` defined here
...
34 | /     traverse_tree(
35 | |         cur_node.left as usize,
36 | |         height + 1,
37 | |         codewords, // mutable borrow - argument requires that `*codewords` is borrowed for `'a`
   | |         --------- first mutable borrow occurs here
38 | |         lookup_table,
39 | |         huffman_tree,
40 | |     );
   | |_____- argument requires that `*codewords` is borrowed for `'a`
...
48 |           codewords, // fourth mutable borrow occurs here
   |           ^^^^^^^^^ second mutable borrow occurs here

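What am I missing here? Is there some magical function in the vector API that I have overlooked, and why does this create a lifetime problem in the first place? As far as I can tell, all my lifetimes are correct: codewords always lives long enough for lookup_table to hold all of these slices, and I never mutably borrow something twice at the same time. If something were wrong with my lifetimes, the compiler would complain inside the if cur_node.left == -1 block, and the cur_sequence I take after that block is an owned Vec, so there cannot be any borrowing issue with it. So the real problem lies in the core idea of having a recursive function that takes a mutable reference as a parameter.

Is there any way to solve this? I tried making codewords owned and returning it, but then the compiler cannot ensure that the bit sequences I store in lookup_table live long enough. The only idea I have left is to store owned vectors in lookup_table, but at that point the codewords vector becomes obsolete, and I could simply implement this by passing a cur_sequence vector as a parameter that I clone on every call. I chose my approach to get better cache performance in the actual encoding step that follows, and I would lose that.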

【Question comments】:

    Tags: recursion rust lifetime huffman-code ownership


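    【Solution 1】:

    The problem is that when you create the slice cur_sequence from codewords, as in let cur_sequence = &codewords[(codewords.len() - 1 - height as usize)..];, the compiler requires the borrow of codewords to last at least as long as cur_sequence. The reason: the compiler wants to guarantee that the slice cur_sequence is always valid, but if you modified codewords (say, cleared it), cur_sequence could become invalid. By keeping a shared borrow of codewords alive for as long as the slice exists, the borrow rules forbid modifying codewords in the meantime. Unfortunately, you store cur_sequence in lookup_table, whose values have type &'a [bool], so codewords stays borrowed for the whole function (for all of 'a) and you can no longer borrow codewords mutably.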
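The same rule can be seen in isolation with a minimal sketch. The function name demo and the sample data are made up for illustration; the commented-out line marks where a mutation would be rejected while a stored slice is still in use.

```rust
use std::collections::HashMap;

// Hypothetical helper, not part of the question's code.
fn demo() -> usize {
    let mut codewords = vec![false, true];

    {
        let mut lookup_table: HashMap<u8, &[bool]> = HashMap::new();

        // Storing a slice in the table keeps `codewords` borrowed (shared)
        // for as long as the table is still used.
        lookup_table.insert(b'a', &codewords[..1]);

        // codewords.push(true); // rejected here (E0502): the table is used below

        println!("{:?}", lookup_table.get(&b'a'));
    } // the table is gone, so the shared borrow of `codewords` has ended

    // Mutation is allowed again because no slice into `codewords` is alive.
    codewords.push(true);
    codewords.len()
}

fn main() {
    println!("{}", demo());
}
```

In the question, the table outlives every recursive call, so there is no point inside traverse_tree at which the borrow could end.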

    The solution is to keep track of the slice's indices yourself. Create a struct:

    struct Range {
        start: usize,
        end: usize
    }
    
    impl Range {
        fn new(start: usize, end: usize) -> Self {
            Range{ start, end}
        }
    }
    
    

    Then use it in place of the slice:

    let cur_range = Range::new(
        codewords.len() - 1 - height as usize,
        codewords.len() - 1
    );
    lookup_table.insert(cur_node.val, cur_range);
    

    This way, keeping the ranges valid becomes your responsibility.

    Full code:

    use std::collections::HashMap;
    
    // ignore everything being public, I use getters in the real code
    pub struct HufTreeNode {
        pub val: u8,
        pub freq: usize,
        pub left: i16,
        pub right: i16,
    }
    
    struct Range {
        start: usize,
        end: usize
    }
    
    impl Range {
        fn new(start: usize, end: usize) -> Self {
            Range{ start, end}
        }
    }
    
    fn traverse_tree(
        cur_index: usize,
        height: i16,
        codewords: &mut Vec<bool>,
        lookup_table: &mut HashMap<u8, Range>,
        huffman_tree: &[HufTreeNode],
    ) {
        let cur_node = &huffman_tree[cur_index];
    
        // if the left child is -1, we reached a leaf
        if cur_node.left == -1 {
            // the last `height` bits in codewords
            // let cur_sequence = &codewords[(codewords.len() - 1 - height as usize)..];
            let cur_range = Range::new(
                codewords.len() - 1 - height as usize,
                codewords.len() - 1
            );
            lookup_table.insert(cur_node.val, cur_range);
            return;
        }
    
        // save the current sequence so we can traverse to the right afterwards
        let mut cur_sequence = codewords[(codewords.len() - 1 - height as usize)..].to_vec();
        codewords.push(false);
        traverse_tree(
            cur_node.left as usize,
            height + 1,
            codewords, // reborrowed only for the duration of this call
            lookup_table,
            huffman_tree,
        );
    
        // append the previously saved current sequence
        codewords.append(&mut cur_sequence); // allowed now: no slice into `codewords` is stored
        codewords.push(true);
        traverse_tree(
            cur_node.right as usize,
            height + 1,
            codewords,
            lookup_table,
            huffman_tree,
        );
    }
    
    fn main() {
        // ...
    }
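Once the traversal has finished, the stored indices can be turned back into slices, since only a shared borrow of codewords is needed at that point. The following is a compact, self-contained sketch of that step rather than the answer's exact code. The names Node, Span, traverse and code_for are hypothetical, and it assumes half-open ranges (start inclusive, end exclusive) and an initially empty codewords vector, which differs from the `- 1` offsets used above.

```rust
use std::collections::HashMap;

// Hypothetical, trimmed-down node type (the `freq` field is omitted).
struct Node {
    val: u8,
    left: i16,
    right: i16,
}

// Half-open index range into `codewords` (assumption: end is exclusive).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    start: usize,
    end: usize,
}

fn traverse(
    cur: usize,
    height: usize,
    codewords: &mut Vec<bool>,
    table: &mut HashMap<u8, Span>,
    tree: &[Node],
) {
    let node = &tree[cur];
    let end = codewords.len();
    if node.left == -1 {
        // Leaf: the last `height` bits are this symbol's code word.
        table.insert(node.val, Span { start: end - height, end });
        return;
    }
    // Copy the path to this node so it can be replayed for the right child.
    let prefix = codewords[end - height..].to_vec();
    codewords.push(false);
    traverse(node.left as usize, height + 1, codewords, table, tree);
    codewords.extend_from_slice(&prefix);
    codewords.push(true);
    traverse(node.right as usize, height + 1, codewords, table, tree);
}

// Resolve a stored span back into a slice; a shared borrow is enough here.
fn code_for<'a>(codewords: &'a [bool], table: &HashMap<u8, Span>, sym: u8) -> Option<&'a [bool]> {
    table.get(&sym).map(|s| &codewords[s.start..s.end])
}

fn main() {
    // Tiny tree: node 0 is the root, node 1 is the leaf for b'a',
    // node 2 is an internal node with the leaves b'b' and b'c'.
    let tree = [
        Node { val: 0, left: 1, right: 2 },
        Node { val: b'a', left: -1, right: -1 },
        Node { val: 0, left: 3, right: 4 },
        Node { val: b'b', left: -1, right: -1 },
        Node { val: b'c', left: -1, right: -1 },
    ];
    let mut codewords = Vec::new();
    let mut table = HashMap::new();
    traverse(0, 0, &mut codewords, &mut table, &tree);
    for sym in [b'a', b'b', b'c'] {
        println!("{}: {:?}", sym as char, code_for(&codewords, &table, sym));
    }
}
```

All code words still live contiguously in one vector, so the lookup only adds an index computation on top of the slice access the question was aiming for.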
    

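    【Comments】:

    • I edited the original post so that it is now (hopefully) self-contained. Sadly, this does not work. The slices I put into lookup_table all come from codewords. If I do not add the lifetimes manually, the compiler gives each reference its own lifetime and then reports an error when I try to insert the slice with lookup_table.insert(...), because the lifetimes I actually have do not match the function signature.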