【Question title】: How can I get Serde to allocate strings from an arena during deserialization?
【Posted】: 2018-08-23 14:53:04
【Question】:

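I have a struct with string fields. I want to control how the memory for those strings is allocated. In particular, I'd like to allocate them using something like copy_arena.

Maybe I could define a custom ArenaString type, but I don't see how to get a reference to the Arena inside the deserialization code. And assuming that is possible, I would then have to deal with the arena's lifetime, right?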

【Comments】:

    Tags: rust deserialization serde


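    【Solution 1】:

    Here is one possible implementation. It uses serde::de::DeserializeSeed to expose the arena allocator to the deserialization code.

    In a more complex use case, you may want to write a procedural macro to generate impls like this one.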


    #[macro_use]
    extern crate serde_derive;
    
    extern crate copy_arena;
    extern crate serde;
    extern crate serde_json;
    
    use std::fmt;
    use std::marker::PhantomData;
    use std::str;
    
    use serde::de::{self, DeserializeSeed, Deserializer, MapAccess, Visitor};
    
    use copy_arena::{Allocator, Arena};
    
    #[derive(Debug)]
    struct Jason<'a> {
        one: &'a str,
        two: &'a str,
    }
    
    struct ArenaSeed<'a, T> {
        allocator: Allocator<'a>,
        marker: PhantomData<fn() -> T>,
    }
    
    impl<'a, T> ArenaSeed<'a, T> {
        fn new(arena: &'a mut Arena) -> Self {
            ArenaSeed {
                allocator: arena.allocator(),
                marker: PhantomData,
            }
        }
    
        fn alloc_string(&mut self, owned: String) -> &'a str {
            let slice = self.allocator.alloc_slice(owned.as_bytes());
            // We know the bytes are valid UTF-8.
            str::from_utf8(slice).unwrap()
        }
    }
    
    impl<'de, 'a> DeserializeSeed<'de> for ArenaSeed<'a, Jason<'a>> {
        type Value = Jason<'a>;
    
        fn deserialize<D>(self, deserializer: D) -> Result<Self::Value, D::Error>
        where
            D: Deserializer<'de>,
        {
            static FIELDS: &[&str] = &["one", "two"];
            deserializer.deserialize_struct("Jason", FIELDS, self)
        }
    }
    
    impl<'de, 'a> Visitor<'de> for ArenaSeed<'a, Jason<'a>> {
        type Value = Jason<'a>;
    
        fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
            formatter.write_str("struct Jason")
        }
    
        fn visit_map<A>(mut self, mut map: A) -> Result<Self::Value, A::Error>
        where
            A: MapAccess<'de>,
        {
            #[derive(Deserialize)]
            #[serde(field_identifier, rename_all = "lowercase")]
            enum Field { One, Two }
    
            let mut one = None;
            let mut two = None;
            while let Some(key) = map.next_key()? {
                match key {
                    Field::One => {
                        if one.is_some() {
                            return Err(de::Error::duplicate_field("one"));
                        }
                        one = Some(self.alloc_string(map.next_value()?));
                    }
                    Field::Two => {
                        if two.is_some() {
                            return Err(de::Error::duplicate_field("two"));
                        }
                        two = Some(self.alloc_string(map.next_value()?));
                    }
                }
            }
            let one = one.ok_or_else(|| de::Error::missing_field("one"))?;
            let two = two.ok_or_else(|| de::Error::missing_field("two"))?;
            Ok(Jason { one, two })
        }
    }
    
    fn main() {
        let j = r#" {"one": "I", "two": "II"} "#;
    
        let mut arena = Arena::new();
        let seed = ArenaSeed::new(&mut arena);
        let mut de = serde_json::Deserializer::from_str(j);
        let jason: Jason = seed.deserialize(&mut de).unwrap();
        println!("{:?}", jason);
    }
    

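    If arena allocation is not a strict requirement and you only need to amortize the cost of string allocations across many deserialized objects, Deserialize::deserialize_in_place is a more concise option.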

    // [dependencies]
    // serde = "1.0"
    // serde_derive = { version = "1.0", features = ["deserialize_in_place"] }
    // serde_json = "1.0"
    
    #[macro_use]
    extern crate serde_derive;
    
    extern crate serde;
    extern crate serde_json;
    
    use serde::Deserialize;
    
    #[derive(Deserialize, Debug)]
    struct Jason {
        one: String,
        two: String,
    }
    
    fn main() {
        let j = r#" {"one": "I", "two": "II"} "#;
    
        // Allocate some Strings during deserialization.
        let mut de = serde_json::Deserializer::from_str(j);
        let mut jason = Jason::deserialize(&mut de).unwrap();
        println!("{:?} {:p} {:p}", jason, jason.one.as_str(), jason.two.as_str());
    
        // Reuse the same String allocations for some new data.
        // As long as the strings in the new datum are at most as long as the
        // previous datum, the strings do not need to be reallocated and will
        // remain at the same memory address.
        let mut de = serde_json::Deserializer::from_str(j);
        Jason::deserialize_in_place(&mut de, &mut jason).unwrap();
        println!("{:?} {:p} {:p}", jason, jason.one.as_str(), jason.two.as_str());
    
        // Do not reuse the string allocations.
        // The strings here will not be at the same address as above.
        let mut de = serde_json::Deserializer::from_str(j);
        let jason = Jason::deserialize(&mut de).unwrap();
        println!("{:?} {:p} {:p}", jason, jason.one.as_str(), jason.two.as_str());
    }
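
The allocation reuse described in the comments of the program above can be seen with the standard library alone. The sketch below is an analogy, not Serde's actual code: `overwrite_in_place` is a hypothetical helper that clears a `String` and copies new data into it. As long as the new data fits in the existing capacity, the buffer is not reallocated and its address stays the same.

```rust
/// Hypothetical helper: overwrite `dst` with `src`, keeping the existing
/// heap buffer whenever `src` fits in the current capacity.
fn overwrite_in_place(dst: &mut String, src: &str) {
    dst.clear(); // sets the length to 0 but keeps the capacity
    dst.push_str(src); // reallocates only if the capacity is too small
}

fn main() {
    // First "datum": this allocates a buffer of at least 3 bytes.
    let mut field = String::from("III");
    let before = field.as_ptr();

    // Second "datum" is no longer than the first, so the buffer is reused.
    overwrite_in_place(&mut field, "II");
    assert_eq!(field, "II");
    assert_eq!(before, field.as_ptr());

    // A freshly built String lives in its own, separate allocation.
    let fresh = String::from("II");
    assert_ne!(fresh.as_ptr(), field.as_ptr());
}
```

If the new data were longer than the current capacity, `push_str` would have to grow the buffer and the address could change, which matches the "at most as long" condition stated in the comments above.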
    

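    【Comments】:

    • Serde issue 1325 seems to suggest that this won't work recursively. Am I misunderstanding?
    • You would either need a DeserializeSeed impl at every level of the nested struct hierarchy to propagate the Allocator to wherever it is needed (this is where a procedural macro could help), or find some way to stash the Allocator in a thread_local so that it is available to the Deserialize impls.
    • Stashing the Allocator in a thread_local: I had been thinking along those lines, but it feels like the lifetimes of the returned strings would not work out naturally and you would have to resort to unsafe. I haven't tried it, though.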
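
The lifetime concern in the last comment can be illustrated with the standard library alone. In the sketch below, a thread-local `String` stands in for the arena (an assumption; copy_arena and Serde are not used), and `POOL`, `PoolStr`, `intern`, and `with_str` are hypothetical names. Because `LocalKey::with` only lends the value to a closure, safe code cannot hand back a `&str` that outlives the call. One possible workaround, shown here, is to return a byte-range handle and resolve it later.

```rust
use std::cell::RefCell;
use std::ops::Range;

thread_local! {
    // Hypothetical per-thread string pool standing in for an arena.
    static POOL: RefCell<String> = RefCell::new(String::new());
}

/// Handle to a string stored in the pool: a byte range, not a reference.
#[derive(Debug, Clone, PartialEq)]
struct PoolStr(Range<usize>);

/// Copies `s` into the thread-local pool and returns a handle to it.
/// Returning `&str` here would not compile in safe code, because the
/// borrow of the pool cannot escape the closure passed to `with`.
fn intern(s: &str) -> PoolStr {
    POOL.with(|pool| {
        let mut pool = pool.borrow_mut();
        let start = pool.len();
        pool.push_str(s);
        PoolStr(start..pool.len())
    })
}

/// Resolves a handle; the borrowed `&str` is only usable inside `f`.
fn with_str<R>(handle: &PoolStr, f: impl FnOnce(&str) -> R) -> R {
    POOL.with(|pool| {
        let pool = pool.borrow();
        f(&pool[handle.0.clone()])
    })
}

fn main() {
    let one = intern("I");
    let two = intern("II");
    with_str(&one, |s| println!("one = {}", s));
    with_str(&two, |s| println!("two = {}", s));
}
```

The trade-off is that fields would hold handles rather than `&'a str`, so this is a different design from the DeserializeSeed approach in the answer, not a drop-in replacement for it.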