【Question title】: How to implement Flatten layer with batch size > 1 in Pytorch (Pytorch_Geometric)
【Posted】: 2021-07-31 19:01:58
【Question description】:

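I'm new to Pytorch and, because of memory issues, I'm trying to port my previous code from Tensorflow to Pytorch. However, I keep running into problems when trying to reproduce the Flatten layer.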

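In my DataLoader object, batch_size is merged into the first dimension of the input (in my GNN, the input unpacked from the DataLoader object has size [batch_size*node_num, attribute_num], e.g. [4*896, 32] after the GCNConv layers). Basically, if I apply torch.flatten() after GCNConv, the samples get mixed together (into [4*896*32]) and the network produces only 1 output, whereas I expect #batch_size outputs. If I use nn.Flatten() instead, nothing seems to happen (the shape is still [4*896, 32]). Should I make batch_size the first dim of the input from the very beginning, or should I call view() directly? I tried view() directly and it (seems to have) worked, though I'm not sure whether that is equivalent to Flatten. Please refer to my code below. I'm currently using global_max_pool because it works (it separates batch_size directly).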

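By the way, I don't know why training in Pytorch is so slow... When node_num is raised to 13000, one epoch takes an hour, and I have 100 epochs per test fold and 10 test folds. In tensorflow the whole training process took only a few hours. The network architecture and raw input data are the same, as shown here in another post of mine, which also describes the memory problem I had with TF.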

I've been frustrated for a while. I checked this and this post, but their problems seem a bit different from mine. Any help is much appreciated!

Code:

# Generate dataset
class STDataset(InMemoryDataset):
    def __init__(self, root, transform=None, pre_transform=None):
        super(STDataset, self).__init__(root, transform, pre_transform)
        self.data, self.slices = torch.load(self.processed_paths[0])
    
    @property
    def raw_file_names(self):
        return []

    @property
    def processed_file_names(self):
        return ['pygdata.pt']

    def download(self):
        pass

    def process(self):
        data_list= []
        for i in range(sample_size):
            data = Data(x=torch.tensor(X_all[i],dtype=torch.float),edge_index=edge_index,y=torch.FloatTensor(y_all[i]))
            data_list.append(data)

        data, slices = self.collate(data_list)
        torch.save((data, slices), self.processed_paths[0])
        
dataset = STDataset(root=save_dir)
train_dataset = dataset[:len(X_train)]
val_dataset = dataset[len(X_train):(len(X_train)+len(X_val))]
test_dataset = dataset[(len(X_train)+len(X_val)):]


# Build network

from torch_geometric.nn import GCNConv, GATConv, TopKPooling, global_max_pool, global_mean_pool
from torch.nn import Flatten, Linear, ELU
import torch.nn.functional as F

class GCN(torch.nn.Module):
    def __init__(self):
        super(GCN, self).__init__()
        self.conv1 = GCNConv(in_channels = feature_num, out_channels = 32)
        self.conv2 = GCNConv(in_channels = 32, out_channels = 32)
        self.fc1 = Flatten()
#         self.ln1 = Linear(in_features = batch_size*N*32, out_features = 512) 
        self.ln1 = Linear(in_features = 32, out_features = 32)
        self.ln2 = Linear(in_features = 32, out_features = 1) 

    
    def forward(self,x,edge_index,batch):   
#         x, edge_index, batch = data.x, data.edge_index, data.batch
#         print(np.shape(x),np.shape(edge_index),np.shape(batch))
        x = F.elu(self.conv1(x,edge_index))
#         x = x.squeeze(1) 
        x = F.elu(self.conv2(x,edge_index))
        print(np.shape(x))
        x = self.fc1(x)
#         x = torch.flatten(x,0)
#         x = torch.cat([global_max_pool(x,batch),global_mean_pool(x,batch)],dim=1)
        print(np.shape(x))
        x = self.ln1(x)
        x = F.relu(x)
        ## Dropout?
        print("o")
        x = torch.sigmoid(self.ln2(x))
        return x
        
# training
def train():
    model.train()
    loss_all=0
    correct = 0
    for i, data in enumerate(train_loader, 0):
        data = data.to(device)
        optimizer.zero_grad() 
        output = model(data.x, data.edge_index,data.batch)
        label = data.y.to(device)
        loss = loss_func(output, label)
        loss.backward()
        loss_all += loss.item()
        
        output = output.detach().cpu().numpy().squeeze()
        label = label.detach().cpu().numpy().squeeze()        
        correct += (abs(output-label)<0.5).sum()
        
        optimizer.step()
  
    return loss_all / len(train_dataset), correct / len(train_dataset)

device = torch.device('cuda')
model = GCN().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
loss_func = torch.nn.BCELoss()  # binary cross-entropy
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle = True)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle = True)
for epoch in range(num_epochs):
    gc.collect()
    train_loss, train_acc = train()

Error message when using torch.nn.Flatten(start_dim = 1) (the code above):

ValueError                                Traceback (most recent call last)
<ipython-input-42-c96e8b058742> in <module>
     65 for epoch in range(num_epochs):
     66     gc.collect()
---> 67     train_loss, train_acc = train()

<ipython-input-42-c96e8b058742> in train()
     10         output = model(data.x, data.edge_index,data.batch)
     11         label = data.y.to(device)
---> 12         loss = loss_func(output, label)
     13         loss.backward()
     14         loss_all += loss.item()

~/miniconda3/envs/ST-Torch/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    530             result = self._slow_forward(*input, **kwargs)
    531         else:
--> 532             result = self.forward(*input, **kwargs)
    533         for hook in self._forward_hooks.values():
    534             hook_result = hook(self, input, result)

~/miniconda3/envs/ST-Torch/lib/python3.7/site-packages/torch/nn/modules/loss.py in forward(self, input, target)
    496 
    497     def forward(self, input, target):
--> 498         return F.binary_cross_entropy(input, target, weight=self.weight, reduction=self.reduction)
    499 
    500 

~/miniconda3/envs/ST-Torch/lib/python3.7/site-packages/torch/nn/functional.py in binary_cross_entropy(input, target, weight, size_average, reduce, reduction)
   2068     if input.numel() != target.numel():
   2069         raise ValueError("Target and input must have the same number of elements. target nelement ({}) "
-> 2070                          "!= input nelement ({})".format(target.numel(), input.numel()))
   2071 
   2072     if weight is not None:

ValueError: Target and input must have the same number of elements. target nelement (4) != input nelement (3584)

【Question comments】:

Tags: python neural-network pytorch conv-neural-network flatten

【Solution 1】:

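It's a bit odd that you want the shape to be batch_size*node_num, attribute_num.

Usually it should be batch_size, node_num*attribute_num, because you need to match the input to the output. Flatten in Pytorch does exactly this: with its default start_dim=1 it keeps the first dimension and merges all the others, which is also why it leaves a tensor that is already 2-D unchanged.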
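The shapes discussed above can be checked with a minimal sketch. It assumes the sizes from the question (4 graphs, 896 nodes each, 32 features) and that every graph has the same number of nodes. The variable names are illustrative only and do not come from the original post.

```python
import torch
from torch import nn

batch_size, node_num, attribute_num = 4, 896, 32  # sizes taken from the question

# Layout with an explicit batch dimension: nn.Flatten() keeps dim 0 and merges the rest.
dense = torch.randn(batch_size, node_num, attribute_num)
flat_dense = nn.Flatten()(dense)
print(flat_dense.shape)    # torch.Size([4, 28672])

# Layout produced by the graph DataLoader: the batch is folded into dim 0.
stacked = dense.view(batch_size * node_num, attribute_num)
flat_stacked = nn.Flatten()(stacked)  # already 2-D, so nothing changes
print(flat_stacked.shape)  # torch.Size([3584, 32])

# torch.flatten(x, 0) merges everything, including the samples.
flat_all = torch.flatten(stacked, 0)
print(flat_all.shape)      # torch.Size([114688])

# Recover one row per graph (valid only if all graphs have node_num nodes).
per_graph = stacked.view(batch_size, -1)
print(per_graph.shape)     # torch.Size([4, 28672])
```

If the graphs in a batch have different numbers of nodes, a plain view no longer applies. In that case a padding utility such as `torch_geometric.utils.to_dense_batch` is probably the better fit.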

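If what you really want is batch_size*node_num, attribute_num, then you only need to reshape the tensor with view or reshape. In fact, Flatten itself is just a reshape of this kind.

tensor.view: returns a tensor with the new shape that shares memory with the original, so editing the new tensor also changes the old one. It requires a memory layout compatible with the new shape (for example, a contiguous tensor).

tensor.reshape: returns a view when that is possible and otherwise copies the data from the old tensor into a new tensor with the new shape.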
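The difference between the two calls can be seen in a small sketch. The tensors here are toy values chosen for illustration, not data from the question.

```python
import torch

base = torch.arange(6)
viewed = base.view(2, 3)          # shares storage with `base`
viewed[0, 0] = 100
print(base[0].item())             # 100: the edit is visible through `base`

transposed = torch.arange(6).view(2, 3).t()  # shape (3, 2), non-contiguous
try:
    transposed.view(6)            # view cannot express this layout without copying
    view_failed = False
except RuntimeError:
    view_failed = True
print(view_failed)                # True

copied = transposed.reshape(6)    # reshape falls back to a copy here
copied[0] = -1
print(transposed[0, 0].item())    # 0: the original is untouched
```

For the contiguous output of a layer, both calls usually behave the same. reshape is the safer choice when the layout is not known in advance.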

    def forward(self,x,edge_index,batch):   
        x = F.elu(self.conv1(x,edge_index))
        x = F.elu(self.conv2(x,edge_index))

        # print(np.shape(x)) # don't use this
        print(x.size())  # use this

        # x = self.fc1(x)  # this is the old one
        ## choose one of these (the second is commented out so only one runs)
        x = x.view(4*896, 32)
        # x = x.reshape(4*896, 32)

        # print(np.shape(x)) # don't use this
        print(x.size())  # use this

        x = self.ln1(x)
        x = F.relu(x)
        ## Dropout?
        print("o")
        x = torch.sigmoid(self.ln2(x))
        return x
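The traceback in the question (4 targets against 3584 outputs) suggests the network needs one row per graph before the final linear layers. A rough sketch of such a readout follows. It assumes every graph has the same node count, uses a random tensor as a stand-in for the GCNConv output, and introduces a hypothetical `FlattenHead` module that is not part of the original code.

```python
import torch
from torch import nn


class FlattenHead(nn.Module):
    """Hypothetical readout: one prediction per graph from stacked node features."""

    def __init__(self, node_num: int, channels: int):
        super().__init__()
        self.ln = nn.Linear(node_num * channels, 1)

    def forward(self, x: torch.Tensor, batch: torch.Tensor) -> torch.Tensor:
        # `batch` maps each node to its graph index, like `data.batch` in the question.
        num_graphs = int(batch.max()) + 1
        x = x.reshape(num_graphs, -1)        # [B * N, C] -> [B, N * C]
        return torch.sigmoid(self.ln(x))     # [B, 1]


node_num, channels, num_graphs = 896, 32, 4
x = torch.randn(num_graphs * node_num, channels)   # stand-in for the GCNConv output
batch = torch.arange(num_graphs).repeat_interleave(node_num)
out = FlattenHead(node_num, channels)(x, batch)
print(out.shape)  # torch.Size([4, 1])
```

Deriving the number of graphs from `batch` rather than hard-coding 4 also covers a smaller final batch. The output has shape [B, 1], so the target may need the same shape before it is passed to BCELoss, depending on the PyTorch version.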

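Edit 2: reshape

Suppose we have an array [[[1, 1, 1], [2, 2, 2]]] with shape (1, 2, 3), which in Tensorflow represents (batch, length, channel).

If you want to use this data correctly in Pytorch, you need to make it (batch, channel, length), i.e. (1, 3, 2).

Here is the difference between permute and reshape: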

>>> x = torch.tensor([[[1, 1, 1], [2, 2, 2]]])
>>> x.size()
torch.Size([1, 2, 3])
>>> x[0, 0, :]
tensor([1, 1, 1])
>>> y = x.reshape((1, 3, 2))
>>> y
tensor([[[1, 1],
         [1, 2],
         [2, 2]]])
>>> y[0, :, 0]
tensor([1, 1, 2])
>>> z = x.permute(0, 2, 1)
>>> z
tensor([[[1, 2],
         [1, 2],
         [1, 2]]])
>>> z[0, :, 0]
tensor([1, 1, 1])

As you can see, x[0, 0, :] and z[0, :, 0] are both [1, 1, 1], which is what we want, while y[0, :, 0] is [1, 1, 2].
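To connect this to a layer that expects the (batch, channel, length) layout, here is a small sketch using `nn.Conv1d`. The layer sizes are arbitrary assumptions for illustration. It also shows that a permuted tensor is not contiguous, which matters if it is passed to view afterwards.

```python
import torch
from torch import nn

# (batch, length, channel) = (1, 2, 3), the Tensorflow-style layout from above
x = torch.tensor([[[1., 1., 1.], [2., 2., 2.]]])

z = x.permute(0, 2, 1)            # (batch, channel, length) = (1, 3, 2)
conv = nn.Conv1d(in_channels=3, out_channels=5, kernel_size=1)
out = conv(z)
print(out.shape)                  # torch.Size([1, 5, 2])

# permute returns a non-contiguous view; call .contiguous() before .view()
print(z.is_contiguous())          # False
flat = z.contiguous().view(1, -1)
print(flat)                       # tensor([[1., 2., 1., 2., 1., 2.]])
```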

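【Comments】:

Thanks for the explanation! It helps a lot. I actually want [batch_size, node_num, attribute_num], since that is how it would be coded in Tensorflow, but the DataLoader() function (used the way the tutorial teaches) only gives me [batch_size*node_num, attribute_num]. I thought this was normal Pytorch logic, but it looks like it isn't? To be honest, I do wish it could be something else ;) Maybe I should manually reshape the input at the very start of the network?
@jasperhyp Pytorch is actually similar to Tensorflow. You only need to swap the channel axis from the last position to right after the batch axis.
If the code from the tutorial works, that's fine. Otherwise, you could also use Tensorflow's data loader, change the channel order, and then convert to a Torch tensor.
Thanks for the comment! I'm not sure I understand "swap the channel axis from the last position to right after the batch axis". Do you mean reshaping the input data directly? I now think that not having batch_size as a separate dimension is indeed a problem, because any batch-related layer (e.g. a batch normalization layer) does not work as expected.
Not reshape, but permute. For numpy arrays it is called rollaxis or swapaxes.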
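For reference, a short sketch of the numpy side of that last comment, using the same toy array as in the answer. `np.moveaxis` is used here in place of `rollaxis`, on the assumption that it is the more readable way to express the same move.

```python
import numpy as np
import torch

a = np.array([[[1, 1, 1], [2, 2, 2]]])      # (batch, length, channel) = (1, 2, 3)

swapped = np.swapaxes(a, 1, 2)              # (batch, channel, length) = (1, 3, 2)
moved = np.moveaxis(a, -1, 1)               # same result for this 3-D case
print(np.array_equal(swapped, moved))       # True

# Convert to a Torch tensor and compare with permute on the original data.
t = torch.from_numpy(np.ascontiguousarray(swapped))
print(torch.equal(t, torch.from_numpy(a).permute(0, 2, 1)))  # True
```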