【Question title】: Property Setter for Subclass of Pandas DataFrame
【Posted】: 2020-04-26 13:27:27
【Question】:

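I am trying to set up a subclass of pd.DataFrame that takes two required arguments at initialization (group and timestamp_col). I want to run validation on these arguments, so I have a setter method for each property. This all works until I call set_index() and get TypeError: 'NoneType' object is not iterable. It appears that no argument is being passed to my setter functions in test_set_index and test_assignment_with_indexed_obj. If I add if g == None: return to my setter functions, the test cases pass, but I don't think that is the right solution.

How should I implement property validation for these required arguments?

Here is my class: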

import pandas as pd
import numpy as np


class HistDollarGains(pd.DataFrame):
    @property
    def _constructor(self):
        return HistDollarGains._internal_ctor

    _metadata = ["group", "timestamp_col", "_group", "_timestamp_col"]

    @classmethod
    def _internal_ctor(cls, *args, **kwargs):
        kwargs["group"] = None
        kwargs["timestamp_col"] = None
        return cls(*args, **kwargs)

    def __init__(
        self,
        data,
        group,
        timestamp_col,
        index=None,
        columns=None,
        dtype=None,
        copy=True,
    ):
        super(HistDollarGains, self).__init__(
            data=data, index=index, columns=columns, dtype=dtype, copy=copy
        )

        self.group = group
        self.timestamp_col = timestamp_col

    @property
    def group(self):
        return self._group

    @group.setter
    def group(self, g):
        if g == None:
            return

        if isinstance(g, str):
            group_list = [g]
        else:
            group_list = g

        if not set(group_list).issubset(self.columns):
            raise ValueError("Data does not contain " + '[' + ', '.join(group_list) + ']')
        self._group = group_list

    @property
    def timestamp_col(self):
        return self._timestamp_col

    @timestamp_col.setter
    def timestamp_col(self, t):
        if t == None:
            return
        if not t in self.columns:
            raise ValueError("Data does not contain " + '[' + t + ']')
        self._timestamp_col = t

Here are my test cases:

import pytest

import pandas as pd
import numpy as np

from myclass import *


@pytest.fixture(scope="module")
def sample():
    samp = pd.DataFrame(
        [
            {"timestamp": "2020-01-01", "group": "a", "dollar_gains": 100},
            {"timestamp": "2020-01-01", "group": "b", "dollar_gains": 100},
            {"timestamp": "2020-01-01", "group": "c", "dollar_gains": 110},
            {"timestamp": "2020-01-01", "group": "a", "dollar_gains": 110},
            {"timestamp": "2020-01-01", "group": "b", "dollar_gains": 90},
            {"timestamp": "2020-01-01", "group": "d", "dollar_gains": 100},
        ]
    )

    return samp

@pytest.fixture(scope="module")
def sample_obj(sample):
    return HistDollarGains(sample, "group", "timestamp")

def test_constructor_without_args(sample):
    with pytest.raises(TypeError):
        HistDollarGains(sample)


def test_constructor_with_string_group(sample):
    hist_dg = HistDollarGains(sample, "group", "timestamp")
    assert hist_dg.group == ["group"]
    assert hist_dg.timestamp_col == "timestamp"


def test_constructor_with_list_group(sample):
    hist_dg = HistDollarGains(sample, ["group", "timestamp"], "timestamp")

def test_constructor_with_invalid_group(sample):
    with pytest.raises(ValueError):
        HistDollarGains(sample, "invalid_group", np.random.choice(sample.columns))

def test_constructor_with_invalid_timestamp(sample):
    with pytest.raises(ValueError):
        HistDollarGains(sample, np.random.choice(sample.columns), "invalid_timestamp")

def test_assignment_with_indexed_obj(sample_obj):
    b = sample_obj.set_index(sample_obj.group + [sample_obj.timestamp_col])

def test_set_index(sample_obj):
    # print(isinstance(a, pd.DataFrame))
    assert sample_obj.set_index(sample_obj.group + [sample_obj.timestamp_col]).index.names == ['group', 'timestamp']

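【Comments】:

  • If None is an invalid value for the group property, shouldn't you raise a ValueError?
  • You're right, None is an invalid value, which is why I don't like the if statement. But adding the None check makes the tests pass. I'm looking for a way to solve this properly without the None if statement.
  • The setter should raise a ValueError. The problem is first figuring out what is trying to set the group attribute to None.
  • @chepner Yes, exactly.
  • Maybe the Pandas Flavor package can help.

Tags: python pandas properties subclass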
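
One way to find out what is setting the property to None, as the comments suggest, is to capture the call stack inside the setter. The sketch below is pure Python and does not use pandas; the class Tracked and the function rebuild_with_placeholder are hypothetical names invented for illustration, with rebuild_with_placeholder standing in for whatever internal code path rebuilds the object.

```python
import traceback


class Tracked:
    """Hypothetical class with a validated property (not pandas code)."""

    def __init__(self, group):
        self.group = group

    @property
    def group(self):
        return self._group

    @group.setter
    def group(self, g):
        if g is None:
            # Collect the names of the calling functions, outermost first,
            # leaving out the setter's own frame (the last entry).
            callers = [frame.name for frame in traceback.extract_stack()[:-1]]
            raise ValueError("group set to None via: " + " -> ".join(callers))
        self._group = g


def rebuild_with_placeholder():
    # Stand-in for an internal constructor that passes None.
    return Tracked(None)


try:
    rebuild_with_placeholder()
except ValueError as exc:
    message = str(exc)
    print(message)
```

The tail of the printed chain names the functions that led to the setter, which is the information needed to locate the code passing None.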


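【Answer 1】:

The set_index() method internally calls self.copy() to create a copy of your DataFrame object (see the source code), and copy() in turn uses your customized constructor method, _internal_ctor(), to create the new object (source). Note that calling self._constructor() is the same as calling self._internal_ctor(); it is the common internal hook that almost all pandas classes use to create new instances during operations such as deep copying or slicing. Your problem actually stems from this function: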

class HistDollarGains(pd.DataFrame):
    ...
    @classmethod
    def _internal_ctor(cls, *args, **kwargs):
        kwargs["group"]         = None
        kwargs["timestamp_col"] = None
        return cls(*args, **kwargs) # this is equivalent to calling
                                    # HistDollarGains(data, group=None, timestamp_col=None)

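I'm guessing you copied this code from the GitHub issue. The kwargs["**"] = None lines explicitly tell the constructor to set group and timestamp_col to None. The setter/validator then receives None as the new value and raises an error.

So you should pass acceptable values for group and timestamp_col instead.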
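
The mechanism can be reproduced without pandas. The following is a minimal sketch, assuming a simplified stand-in for DataFrame: MiniFrame and Grouped are hypothetical classes written for illustration, and only the pattern (copy() building the new object through _constructor, which injects None) mirrors what is described above.

```python
class MiniFrame:
    """Hypothetical stand-in for pd.DataFrame; not pandas code."""

    def __init__(self, data):
        self.data = dict(data)
        self.columns = list(self.data)

    @property
    def _constructor(self):
        return type(self)

    def copy(self):
        # Build the new object through self._constructor, as described above.
        return self._constructor(self.data)


class Grouped(MiniFrame):
    @property
    def _constructor(self):
        return Grouped._internal_ctor

    @classmethod
    def _internal_ctor(cls, *args, **kwargs):
        kwargs["group"] = None  # this placeholder reaches the setter
        return cls(*args, **kwargs)

    def __init__(self, data, group):
        super().__init__(data)
        self.group = group

    @property
    def group(self):
        return self._group

    @group.setter
    def group(self, g):
        group_list = [g] if isinstance(g, str) else g
        # set(None) raises TypeError before any ValueError can be raised
        if not set(group_list).issubset(self.columns):
            raise ValueError("Data does not contain " + str(group_list))
        self._group = group_list


obj = Grouped({"group": ["a", "b"], "timestamp": ["t1", "t2"]}, "group")

try:
    obj.copy()
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

Direct construction succeeds, but copy() goes through _internal_ctor, so the setter receives None and set(None) fails with the same kind of TypeError reported in the question.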

    @classmethod
    def _internal_ctor(cls, *args, **kwargs):
        kwargs["group"]         = []
        kwargs["timestamp_col"] = 'timestamp' # or whatever name that makes your validator happy
        return cls(*args, **kwargs)

Then you can remove the if g == None: return lines from the validators.
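
With the None shortcut removed, the validator can reject None explicitly, as the comments on the question suggest. The sketch below is one possible shape for that check, not the asker's final code: validate_group is a hypothetical standalone helper that takes the column names as an argument so it can run without pandas.

```python
def validate_group(g, columns):
    """Hypothetical helper mirroring the checks in the group setter."""
    if g is None:
        raise ValueError("group must not be None")
    group_list = [g] if isinstance(g, str) else list(g)
    if not set(group_list).issubset(columns):
        raise ValueError("Data does not contain [" + ", ".join(group_list) + "]")
    return group_list


columns = ["timestamp", "group", "dollar_gains"]

# An empty list, as passed by the revised _internal_ctor, is accepted.
empty_result = validate_group([], columns)

# A single string is wrapped in a list, as in the original setter.
single_result = validate_group("group", columns)

# None is now rejected with a ValueError instead of being silently ignored.
try:
    validate_group(None, columns)
    none_rejected = False
except ValueError:
    none_rejected = True
```

The empty list passes because the empty set is a subset of any set of columns, which is why [] works as a placeholder where None does not.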

【Comments】:
