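【Posted】: 2021-06-14 11:42:28
【Question】:
I'm trying to build a simple autoencoder for MNIST in which the middle layer has only 10 neurons. My hope is that it will learn to classify the 10 digits, since I assume that would ultimately give the lowest error (with respect to reproducing the original image).
I have the following code, which I have already experimented with quite a bit. Even if I run it for up to 100 epochs, the loss never really drops below 1.0, and when I evaluate it, it clearly doesn't work. What am I missing?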
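One thing that can be checked independently of training is whether the reconstruction target and the decoder output live in the same value range. The sketch below is only an arithmetic check, not a confirmed diagnosis: it reuses the mean and standard deviation passed to `transforms.Normalize` in the training code that follows, and the helper names `normalize` and `best_case_squared_error` are made up for illustration. Because the decoder ends in `nn.ReLU` followed by `nn.Sigmoid`, the sigmoid only ever sees non-negative inputs, so the output lies in [0.5, 1).

```python
# Constants taken from transforms.Normalize((0.1307,), (0.3081,)) in the question.
MEAN, STD = 0.1307, 0.3081


def normalize(pixel):
    """Map a ToTensor() pixel value in [0, 1] to its normalized target value."""
    return (pixel - MEAN) / STD


def best_case_squared_error(target, lo, hi):
    """Smallest squared error reachable if the output is confined to [lo, hi]."""
    closest = min(max(target, lo), hi)
    return (target - closest) ** 2


black = normalize(0.0)  # background pixel after normalization (negative)
white = normalize(1.0)  # brightest pixel after normalization (greater than 1)

print("normalized target range:", black, "to", white)

# Sigmoid alone can reach (0, 1); ReLU followed by Sigmoid can only reach [0.5, 1).
for lo in (0.0, 0.5):
    print("output lower bound", lo,
          "-> best-case error on a black pixel:",
          best_case_squared_error(black, lo, 1.0),
          "| on a white pixel:",
          best_case_squared_error(white, lo, 1.0))
```

If these best-case errors are clearly above zero, the MSE has a floor that no amount of training can remove, which may be one contributor to the plateau described above.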
Training:
import torch
import torchvision as tv
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
from torchvision.utils import save_image
num_epochs = 100
batch_size = 64
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])
trainset = tv.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
dataloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size, shuffle=True, num_workers=4)
class Autoencoder(nn.Module):
    def __init__(self):
        super(Autoencoder,self).__init__()
        self.encoder = nn.Sequential(
            # 28 x 28
            nn.Conv2d(1, 4, kernel_size=5),
            nn.Dropout2d(p=0.2),
            # 4 x 24 x 24
            nn.ReLU(True),
            nn.Conv2d(4, 8, kernel_size=5),
            nn.Dropout2d(p=0.2),
            # 8 x 20 x 20 = 3200
            nn.ReLU(True),
            nn.Flatten(),
            nn.Linear(3200, 10),
            nn.ReLU(True),
            # 10
            nn.Softmax(),
            # 10
        )
        self.decoder = nn.Sequential(
            # 10
            nn.Linear(10, 400),
            nn.ReLU(True),
            # 400
            nn.Unflatten(1, (1, 20, 20)),
            # 20 x 20
            nn.Dropout2d(p=0.2),
            nn.ConvTranspose2d(1, 10, kernel_size=5),
            # 24 x 24
            nn.ReLU(True),
            nn.Dropout2d(p=0.2),
            nn.ConvTranspose2d(10, 1, kernel_size=5),
            # 28 x 28
            nn.ReLU(True),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x

model = Autoencoder().cpu()
distance = nn.MSELoss()
#optimizer = torch.optim.Adam(model.parameters(), weight_decay=1e-5)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
for epoch in range(num_epochs):
    for data in dataloader:
        img, _ = data
        img = Variable(img).cpu()
        output = model(img)
        loss = distance(output, img)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print('epoch [{}/{}], loss: {:.4f}'.format(epoch+1, num_epochs, loss.item()))
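The shape comments in the model (`# 4 x 24 x 24`, `# 8 x 20 x 20 = 3200`, `# 24 x 24`, `# 28 x 28`) can be double-checked with the standard output-size formulas for convolutions and transposed convolutions. This is a small sketch under the assumptions that dilation is 1 and output padding is 0, as in the code above; the helper names `conv2d_out` and `conv_transpose2d_out` are illustrative and not part of PyTorch.

```python
def conv2d_out(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a Conv2d layer (dilation assumed to be 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1


def conv_transpose2d_out(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a ConvTranspose2d layer (dilation 1, output_padding 0)."""
    return (size - 1) * stride - 2 * padding + kernel_size


# Encoder: 28 x 28 input, two 5 x 5 convolutions, 8 output channels.
side = 28
encoder_sides = []
for kernel_size in (5, 5):
    side = conv2d_out(side, kernel_size)
    encoder_sides.append(side)
flattened = 8 * side * side  # input size expected by nn.Linear(3200, 10)

# Decoder: Linear(10, 400) reshaped to 1 x 20 x 20, two 5 x 5 transposed convolutions.
side = 20
decoder_sides = []
for kernel_size in (5, 5):
    side = conv_transpose2d_out(side, kernel_size)
    decoder_sides.append(side)

print("encoder spatial sizes:", encoder_sides, "flattened features:", flattened)
print("decoder spatial sizes:", decoder_sides)
```

The sizes agree with the comments, so a shape mismatch does not appear to be the reason the loss stalls.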
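The training loss already shows that things are not working, but here is the code that prints the confusion matrix (which in this case does not have to be the identity matrix, since the neurons can come out in any order, but if this worked it should be approximately the identity after reordering its rows and columns):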
import numpy as np
confusion_matrix = np.zeros((10, 10))
batch_size = 20*1000
testset = tv.datasets.MNIST(root='./data', train=False, download=True, transform=transform)
dataloader = torch.utils.data.DataLoader(testset, batch_size=batch_size, shuffle=True, num_workers=4)
for data in dataloader:
    imgs, labels = data
    imgs = Variable(imgs).cpu()
    encs = model.encoder(imgs).detach().numpy()
    for i in range(len(encs)):
        predicted = np.argmax(encs[i])
        actual = labels[i]
        confusion_matrix[actual][predicted] += 1
print(confusion_matrix)
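Whether the matrix is "approximately the identity after reordering" can be tested numerically instead of by eye. The following is a minimal sketch, assuming SciPy is available: it uses `scipy.optimize.linear_sum_assignment` to find the one-to-one mapping from bottleneck neurons to digit labels that puts the most samples on the diagonal. The function name `align_confusion_matrix` and the toy matrix are hypothetical and only serve as illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def align_confusion_matrix(cm):
    """Reorder columns so that as many counts as possible land on the diagonal.

    Returns the reordered matrix, the column chosen for each row, and the
    fraction of samples on the diagonal after reordering.
    """
    cm = np.asarray(cm, dtype=float)
    # linear_sum_assignment minimizes total cost, so negate the counts
    # in order to maximize the number of matched samples.
    _, col_ind = linear_sum_assignment(-cm)
    aligned = cm[:, col_ind]
    matched_fraction = aligned.trace() / cm.sum()
    return aligned, col_ind, matched_fraction


# Toy example: a perfect clustering whose neuron order is shuffled.
toy = np.array([[0, 5, 0],
                [0, 0, 7],
                [3, 0, 0]])
aligned, col_ind, matched_fraction = align_confusion_matrix(toy)
print(aligned)
print("neuron assigned to each label:", col_ind)
print("fraction on the diagonal:", matched_fraction)
```

Applied to the `confusion_matrix` computed above, a matched fraction close to 1 would mean the 10 neurons line up with the 10 digits, while a value near 0.1 would be roughly what a random assignment gives.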
【Discussion】:
Tags: python pytorch autoencoder mnist