【Question title】: Safe dumping and loading of defaultdict with ruamel.yaml
【Posted】: 2019-03-27 21:38:29
【Question】:

I am trying to (de)serialize classes that have a collections.defaultdict attribute with ruamel.yaml in Python (3.6+ in my case).

Here is a minimal example of what I would like to get working:

from collections import defaultdict
import ruamel.yaml
from pathlib import Path

class Foo:
    def __init__(self):
        self.x = defaultdict()


YAML = ruamel.yaml.YAML(typ="safe")
YAML.register_class(Foo)
YAML.register_class(defaultdict)

fp =  Path("./test.yaml")
YAML.dump(Foo(), fp)
YAML.load(fp)

But this fails with:

AttributeError: 'collections.defaultdict' object has no attribute '__dict__'

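Any ideas that don't require writing custom code for every "Foo-like" class? I was hoping I could add a different representer for defaultdict objects, but so far my attempts have been in vain.

Full traceback: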

Traceback (most recent call last):
File "./tests/test_yaml.py", line 18, in <module>
    YAML.dump(Foo(), fp)
File "C:\miniconda-windows\envs\ratio\lib\site-packages\ruamel\yaml\main.py", line 439, in dump
    return self.dump_all([data], stream, _kw, transform=transform)
File "C:\miniconda-windows\envs\ratio\lib\site-packages\ruamel\yaml\main.py", line 453, in dump_all
    self._context_manager.dump(data)
File "C:\miniconda-windows\envs\ratio\lib\site-packages\ruamel\yaml\main.py", line 801, in dump
    self._yaml.representer.represent(data)
File "C:\miniconda-windows\envs\ratio\lib\site-packages\ruamel\yaml\representer.py", line 81, in represent
    node = self.represent_data(data)
File "C:\miniconda-windows\envs\ratio\lib\site-packages\ruamel\yaml\representer.py", line 108, in represent_data
    node = self.yaml_representers[data_types[0]](self, data)
File "C:\miniconda-windows\envs\ratio\lib\site-packages\ruamel\yaml\main.py", line 638, in t_y
    tag, data, cls, flow_style=representer.default_flow_style
File "C:\miniconda-windows\envs\ratio\lib\site-packages\ruamel\yaml\representer.py", line 384, in represent_yaml_object
    return self.represent_mapping(tag, state, flow_style=flow_style)
File "C:\miniconda-windows\envs\ratio\lib\site-packages\ruamel\yaml\representer.py", line 218, in represent_mapping
    node_value = self.represent_data(item_value)
File "C:\miniconda-windows\envs\ratio\lib\site-packages\ruamel\yaml\representer.py", line 108, in represent_data
    node = self.yaml_representers[data_types[0]](self, data)
File "C:\miniconda-windows\envs\ratio\lib\site-packages\ruamel\yaml\main.py", line 638, in t_y
    tag, data, cls, flow_style=representer.default_flow_style
File "C:\miniconda-windows\envs\ratio\lib\site-packages\ruamel\yaml\representer.py", line 383, in represent_yaml_object
    state = data.__dict__.copy()
AttributeError: 'collections.defaultdict' object has no attribute '__dict__'

【Comments】:

  • I updated my answer to handle a non-None default_factory argument (in particular, how to handle defaultdict(list)).

Tags: python yaml defaultdict ruamel.yaml


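【Solution 1】:

This happens because defaultdict is a subclass of the built-in class dict, which has no __dict__ attribute from which the YAML representer could generate the attribute names. In this case defaultdict should be treated like a dict, but the problem is that the represent_data method of the ruamel.yaml.representer.BaseRepresenter class looks only at the object's own class to decide whether a representer exists for the object: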

data_types = type(data).__mro__
# ...skipped
if data_types[0] in self.yaml_representers:
    node = self.yaml_representers[data_types[0]](self, data)

What it should do is check whether any of the types in __mro__ has a representer and, if one is found, use it:

if any(data_type in self.yaml_representers for data_type in data_types):
    node = self.yaml_representers[next(data_type for data_type in data_types if data_type in self.yaml_representers)](self, data)
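The difference between the two lookups can be illustrated without ruamel.yaml. The sketch below is only a toy model: `toy_representers` and `find_representer` are hypothetical names invented for this illustration, not part of the library. It also confirms that a defaultdict instance has no `__dict__`, which is what the traceback complains about.

```python
from collections import defaultdict

# defaultdict is implemented in C; its instances carry no per-instance __dict__.
assert not hasattr(defaultdict(), "__dict__")

# Toy stand-in for a representer table (hypothetical): type -> function.
toy_representers = {dict: lambda data: ("mapping", dict(data))}


def find_representer(data):
    """Return the representer of the first class in the MRO that has one."""
    for data_type in type(data).__mro__:
        if data_type in toy_representers:
            return toy_representers[data_type]
    return None


sample = defaultdict(list, a=[1])

# Exact-type lookup: defaultdict itself is not in the table.
exact_match = toy_representers.get(type(sample))
# MRO-based lookup: falls through to dict, the next class in the MRO.
mro_match = find_representer(sample)

print(exact_match)        # None
print(mro_match(sample))  # ('mapping', {'a': [1]})
```

The MRO of defaultdict is (defaultdict, dict, object), so the second lookup finds the entry registered for dict.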

So we can monkey-patch this method ourselves:

from ruamel.yaml import representer

def represent_data(self, data):
    # type: (Any) -> Any
    if self.ignore_aliases(data):
        self.alias_key = None
    else:
        self.alias_key = id(data)
    if self.alias_key is not None:
        if self.alias_key in self.represented_objects:
            node = self.represented_objects[self.alias_key]
            # if node is None:
            #     raise RepresenterError(
            #          "recursive objects are not allowed: %r" % data)
            return node
        # self.represented_objects[alias_key] = None
        self.object_keeper.append(data)
    data_types = type(data).__mro__
    if representer.PY2:
        # if type(data) is types.InstanceType:
        if isinstance(data, representer.types.InstanceType):
            data_types = representer.get_classobj_bases(data.__class__) + list(data_types)
    if any(data_type in self.yaml_representers for data_type in data_types):
        node = self.yaml_representers[next(data_type for data_type in data_types if data_type in self.yaml_representers)](self, data)
    else:
        for data_type in data_types:
            if data_type in self.yaml_multi_representers:
                node = self.yaml_multi_representers[data_type](self, data)
                break
        else:
            if None in self.yaml_multi_representers:
                node = self.yaml_multi_representers[None](self, data)
            elif None in self.yaml_representers:
                node = self.yaml_representers[None](self, data)
            else:
                node = representer.ScalarNode(None, representer.text_type(data))
    # if alias_key is not None:
    #     self.represented_objects[alias_key] = node
    return node
representer.BaseRepresenter.represent_data = represent_data

With that, your code runs without registering defaultdict:

class Foo:
    def __init__(self):
        self.x = defaultdict()

YAML = ruamel.yaml.YAML(typ="safe")
YAML.register_class(Foo)
# YAML.register_class(defaultdict)
fp =  Path("/temp/test.yaml")
YAML.dump(Foo(), fp)
YAML.load(fp)

Edit: a more elegant solution is simply to add the SafeRepresenter.represent_dict method as the representer for defaultdict:

from ruamel.yaml import representer
representer.SafeRepresenter.add_representer(defaultdict, representer.SafeRepresenter.represent_dict)

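【Comments】:

  • Thanks for the clear explanation. Could we achieve this by adding a representer for defaultdict instead? I believe that would make the solution slightly less "hacky". If it should be treated as a regular dict, we should be able to reuse some of that, right?
  • Indeed. I have updated my answer with a more elegant solution. Please ignore the monkey patch. :-)
  • The drawback of using a simple add_representer this way is that your dump will not contain the information needed to recreate the defaultdict on load. You get a plain dict, because the file has no !defaultdict tag.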
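【Solution 2】:

There is now a package, ruamel.yaml.pytypes, that supports dumping defaultdict instances. Note that if you supply a function as the argument (for default_factory), you need to specify typ='unsafe'; otherwise your factory function cannot be represented.

After installing ruamel.yaml.pytypes alongside ruamel.yaml in your virtualenv, you can do: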

from collections import defaultdict
import ruamel.yaml

yaml = ruamel.yaml.YAML(typ=['unsafe', 'pytypes'])
yaml.default_flow_style = False
buf = ruamel.yaml.compat.StringIO()

def factory():
    import datetime
    return datetime.datetime.now()

data = defaultdict(factory)

x = data[4]
data[2] = 42
yaml.dump(data, buf)
print(buf.getvalue(), end='')
d = yaml.load(buf.getvalue())
assert data == d
assert data.default_factory == d.default_factory

The above prints the following (your datetime will differ):

!defaultdict
- !!python/name:__main__.factory 
- 2: 42
  4: 2019-08-19 13:06:05.129019

(and the assertions do not raise an exception)


See the edit history for a "manual" way of getting a similar result.

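【Comments】:

  • As seen in the other answer, ruamel has a built-in representer: from ruamel.yaml import representer; representer.SafeRepresenter.add_representer(defaultdict, representer.SafeRepresenter.represent_dict)
  • ruamel is a package namespace and has no representer. ruamel.yaml.pytypes has a built-in defaultdict representer. What you repeat here, without pointing out the drawback, dumps the defaultdict as if it were a plain dict, losing information in the process.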