'TensorDataset' object has no attribute 'size' (answer)

【Question title】: 'TensorDataset' object has no attribute 'size'
【Posted】: 2021-06-27 08:14:21
【Question】:

I am trying to load a CSV file into a TensorDataset for vertical federated learning. My reference is https://github.com/OpenMined/PyVertical/blob/master/examples/PyVertical%20Example.ipynb

Here is how I load the file, which fails:

import pandas as pd
import torch
import torch.utils.data as data_utils

train = pd.read_csv('datatrain.csv')   # load data

cols = ["a", "b", "c"]   # select feature columns

train_feature = train[cols]   # create dataset with features
train_target = train['result']   # the dataset with result

# turn them into torch.tensor data
train_feature_tensor = torch.tensor(train_feature.values)
train_target_tensor = torch.tensor(train_target.values)

# Put them into a TensorDataset
train_tensor = data_utils.dataset.TensorDataset(train_feature_tensor, train_target_tensor)

# then pass them to add_ids() (defined below)
temp = add_ids(data_utils.dataset.TensorDataset)
temp.data = train_tensor
traindata_ft = temp(train_tensor)

Output:

'TensorDataset' object has no attribute 'size'

The traceback points to this line:

assert all(tensors[0].size(0) == tensor.size(0) for tensor in tensors)

in:

class TensorDataset(Dataset):
    r"""Dataset wrapping tensors.

    Each sample will be retrieved by indexing tensors along the first dimension.

    Arguments:
        *tensors (Tensor): tensors that have the same size of the first dimension.
    """

    def __init__(self, *tensors):
        assert all(tensors[0].size(0) == tensor.size(0) for tensor in tensors)
        self.tensors = tensors

    def __getitem__(self, index):
        return tuple(tensor[index] for tensor in self.tensors)

    def __len__(self):
        return self.tensors[0].size(0)
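To see that assertion fail in isolation, here is a minimal sketch, assuming a recent PyTorch where TensorDataset is importable from torch.utils.data (the tensor shapes are arbitrary). A dataset supports len() and indexing, but only tensors have .size(); passing a TensorDataset as the first "tensor" makes tensors[0].size(0) raise AttributeError. The exact error message can vary between PyTorch versions, so the sketch only checks the exception type.

```python
import torch
from torch.utils.data import TensorDataset

features = torch.randn(4, 3)          # 4 rows, 3 feature columns
targets = torch.tensor([0, 1, 0, 1])  # one label per row

ds = TensorDataset(features, targets)
print(len(ds))                        # 4: a dataset has __len__, not size()
row_features, row_target = ds[2]      # indexing returns one tuple per row
print(row_features.shape)             # torch.Size([3])

# Passing the dataset itself where a tensor is expected: tensors[0] is now
# a TensorDataset, which has no .size() method, so the assertion line fails.
try:
    TensorDataset(ds)
    raised = False
except AttributeError:
    raised = True
print(raised)                         # True
```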

As for add_ids(), it is a function that generates a unique ID for each data row. Its original code is as follows:

from typing import List
from uuid import uuid4

import numpy as np
from PIL import Image


def add_ids(cls):
    """Decorator to add unique IDs to a dataset
    Args:
        cls (torch.utils.data.Dataset) : dataset to generate IDs for
    Returns:
        VerticalDataset : A class which wraps cls to add unique IDs as an attribute,
            and returns data, target, id when __getitem__ is called
    """

    class VerticalDataset(cls):
        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)

            self.ids = np.array([uuid4() for _ in range(len(self))])

        def __getitem__(self, index):
            if self.data is None:
                img = None
            else:
                img = self.data[index]
                img = Image.fromarray(img.numpy(), mode="L")

                if self.transform is not None:
                    img = self.transform(img)

            if self.targets is None:
                target = None
            else:
                target = int(self.targets[index]) if self.targets is not None else None

                if self.target_transform is not None:
                    target = self.target_transform(target)

            id = self.ids[index]

            # Return a tuple of non-None elements
            return (*filter(lambda x: x is not None, (img, target, id)),)

        def __len__(self):
            if self.data is not None:
                return self.data.size(0)
            else:
                return len(self.targets)

        def get_ids(self) -> List[str]:
            """Return a list of the ids of this dataset."""
            return [str(id_) for id_ in self.ids]

        def sort_by_ids(self):
            """
            Sort the dataset by IDs in ascending order
            """
            ids = self.get_ids()
            sorted_idxs = np.argsort(ids)

            if self.data is not None:
                self.data = self.data[sorted_idxs]

            if self.targets is not None:
                self.targets = self.targets[sorted_idxs]

            self.ids = self.ids[sorted_idxs]

    return VerticalDataset
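add_ids is a class factory: it returns a new subclass of whatever class it receives, and that subclass's __init__ forwards its arguments unchanged to the parent. A minimal pure-Python sketch of the idea, using a hypothetical toy class ColumnsDataset in place of TensorDataset and a simplified, hypothetical add_ids_sketch (neither is part of PyVertical or PyTorch):

```python
from uuid import uuid4


class ColumnsDataset:
    """Hypothetical stand-in for TensorDataset: equally long columns."""

    def __init__(self, *columns):
        # Same idea as TensorDataset's assertion, using len() on plain lists
        assert all(len(columns[0]) == len(col) for col in columns)
        self.columns = columns

    def __len__(self):
        return len(self.columns[0])

    def __getitem__(self, index):
        return tuple(col[index] for col in self.columns)


def add_ids_sketch(cls):
    """Simplified, hypothetical version of the add_ids pattern."""

    class WithIds(cls):
        def __init__(self, *args, **kwargs):
            # Forward the *parent's* constructor arguments unchanged
            super().__init__(*args, **kwargs)
            self.ids = [str(uuid4()) for _ in range(len(self))]

        def __getitem__(self, index):
            # Append this row's ID to whatever the parent returns
            return (*super().__getitem__(index), self.ids[index])

    return WithIds


ColumnsWithIds = add_ids_sketch(ColumnsDataset)
ds = ColumnsWithIds([0.1, 0.2, 0.3], [1, 0, 1])  # columns, not a dataset
print(issubclass(ColumnsWithIds, ColumnsDataset), len(ds))  # True 3
print(ds[1][:2])  # (0.2, 0)
```

Because WithIds.__init__ passes *args straight to the parent, the generated class has to be called with the parent's constructor arguments (here, the columns themselves), not with an already-built parent instance.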

【Comments】:

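Hi, what is add_ids supposed to do and return? It isn't defined in your code snippet. Please try to post a minimal reproducible example (stackoverflow.com/help/minimal-reproducible-example) :) In any case, no PyTorch dataset has a size method, only __len__. size can, however, be called on a tensor, which seems to be what the assertion expects. You need to find out why you are passing a dataset where a tensor is expected.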
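@trialNerror Thanks for the reminder. I have just added an explanation of add_ids to the post. add_ids basically creates a unique ID for each data row. Many thanks for your comment ;)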

Tags: pytorch dataset tensor

【Answer 1】:

OK, it's clearer now! So add_ids creates a new class that inherits from the class you pass in as the argument. So when you call

temp = add_ids(data_utils.dataset.TensorDataset)

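temp is actually a subclass of TensorDataset. Because of how its __init__ is implemented, you instantiate it with the same arguments you would use to instantiate a TensorDataset, not with an instance of TensorDataset, which is exactly what you are doing here: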

# Put them into a TensorDataset
train_tensor = data_utils.dataset.TensorDataset(train_feature_tensor, train_target_tensor)
# train_tensor is a dataset, NOT a tensor !
traindata_ft = temp(train_tensor)

So what you should do is:

# More explicit and useful name, instead of `temp`
VerticalTensorDataset = add_ids(data_utils.dataset.TensorDataset)
# Instantiate it with tensors
traindata_ft = VerticalTensorDataset(train_feature_tensor, train_target_tensor)
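This call matches TensorDataset's constructor. Note, though, that the VerticalDataset shown in the question reads self.data, self.targets, self.transform and self.target_transform, which TensorDataset never sets, and its __getitem__ turns each row into a grayscale PIL image, so it appears to be written for MNIST-style image datasets; its __len__, for instance, looks up self.data. Below is a minimal sketch of a tabular variant under stated assumptions: it is a direct subclass of TensorDataset rather than a use of the add_ids decorator, it assumes exactly two tensors (features, targets), and the class is hypothetical, not PyVertical's API.

```python
from typing import List
from uuid import uuid4

import numpy as np
import torch
from torch.utils.data import TensorDataset


class VerticalTensorDataset(TensorDataset):
    """Hypothetical tabular variant of PyVertical's VerticalDataset.

    Assumes two tensors (features, targets) with the same number of rows.
    Each row gets a UUID; __getitem__ returns (features, target, id).
    """

    def __init__(self, features: torch.Tensor, targets: torch.Tensor):
        super().__init__(features, targets)  # runs TensorDataset's size check
        # Expose the attribute names the original add_ids code relies on
        self.data, self.targets = features, targets
        self.ids = np.array([uuid4() for _ in range(len(self))])

    def __getitem__(self, index):
        return self.data[index], self.targets[index], self.ids[index]

    def get_ids(self) -> List[str]:
        """Return the row IDs as strings."""
        return [str(id_) for id_ in self.ids]

    def sort_by_ids(self):
        """Reorder rows so that IDs are in ascending string order."""
        sorted_idxs = np.argsort(self.get_ids())
        order = torch.as_tensor(sorted_idxs)
        self.data, self.targets = self.data[order], self.targets[order]
        self.tensors = (self.data, self.targets)  # keep parent state in sync
        self.ids = self.ids[sorted_idxs]


# Usage: column 0 of each feature row stores that row's label, so we can
# check that features and targets stay aligned after sorting by ID.
targets = torch.arange(5)
features = torch.randn(5, 3)
features[:, 0] = targets.float()
ds = VerticalTensorDataset(features, targets)
row, label, row_id = ds[0]
ds.sort_by_ids()
print(ds.get_ids()[:2])
```

Storing features and targets as plain attributes keeps the sorting logic close to the original sort_by_ids, while reassigning self.tensors keeps the inherited __len__ consistent with the reordered data.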

【Comments】: