【Posted】: 2021-12-14 16:32:48
【Question】:
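I tried to build an RNN myself by following this tutorial: https://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial. My version uses the following network architecture, which differs from the tutorial's (a stands for the input layer, h for the hidden layer, and o for the output layer). Here is my code: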
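```python
import torch
import torch.nn as nn

# Assumed definition: the original snippet uses `device` without showing it
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, initial_hidden):
        super(RNN, self).__init__()
        self.linear1 = nn.Linear(input_size, hidden_size)               # input -> hidden
        self.linear2 = nn.Linear(hidden_size, hidden_size, bias=False)  # hidden -> hidden
        self.linear3 = nn.Linear(hidden_size, output_size)              # hidden -> output
        self.prev_hidden = initial_hidden  # hidden state kept between forward() calls

    def forward(self, X):
        # Pre-activation: input contribution plus previous-hidden contribution
        input = torch.add(self.linear1(X).view(1, -1),
                          self.linear2(self.prev_hidden.to(device)))
        hidden = nn.ReLU()(input)
        # Store the new hidden state; detach() stops gradients into earlier steps
        self.prev_hidden = hidden.detach()
        output = self.linear3(hidden)
        return output
```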
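Written as equations, one call to `forward` above computes the following. This is a sketch read off the code. $W_1, b_1$, $W_2$ and $W_3, b_3$ are labels introduced here for the weights and biases of `linear1`, `linear2` (created with `bias=False`, so it has no bias) and `linear3`. $h_{t-1}$ is `self.prev_hidden`. $\operatorname{sg}(\cdot)$ marks the `.detach()` call, which passes the value of $h_{t-1}$ forward but blocks gradients from flowing back into earlier steps.

$$
\begin{aligned}
h_t &= \operatorname{ReLU}\!\left(W_1 x_t + b_1 + W_2\,\operatorname{sg}(h_{t-1})\right) \\
o_t &= W_3 h_t + b_3
\end{aligned}
$$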
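With this model, the loss summed over all samples stalls at 12000 and never really goes down further. However, after switching to the model described in the tutorial, where the hidden layer and the input layer share the same weights, the loss drops to 4000 with the same hyperparameters. Here is that code: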
```python
import torch
import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        # Both layers read the concatenation [input, hidden]
        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        self.i2o = nn.Linear(input_size + hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        combined = torch.cat((input, hidden), 1)
        hidden = self.i2h(combined)    # next hidden state
        output = self.i2o(combined)    # output from the same combined vector
        output = self.softmax(output)  # log-probabilities over classes
        return output, hidden

    def initHidden(self):
        # All-zero hidden state to start a new sequence
        return torch.zeros(1, self.hidden_size)
```
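The tutorial's `i2h` is a single `nn.Linear` applied to `torch.cat((input, hidden), 1)`. A linear map acting on a concatenated vector splits by columns: $W\,[x; h] + b = W_x x + W_h h + b$. Here $W_x$ is the first `input_size` columns of $W$ and $W_h$ is the last `hidden_size` columns. The sketch below checks this identity in plain Python on made-up numbers, so it needs no PyTorch. The helpers `matvec` and `linear` and all the values are hypothetical.

```python
# Hypothetical illustration (plain Python, no PyTorch): one linear layer on the
# concatenation [x; h] equals two linear maps, one on x and one on h, whose
# weights are column blocks of that same matrix.

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def linear(W, b, v):
    """y = W v + b, the computation nn.Linear performs on a single sample."""
    return [y + bi for y, bi in zip(matvec(W, v), b)]

x = [1.0, 2.0]            # made-up input (input_size = 2)
h = [0.5, -1.0, 3.0]      # made-up previous hidden state (hidden_size = 3)

# i2h-style weight: hidden_size rows, input_size + hidden_size columns
W = [
    [0.1, 0.2, 0.3, 0.4, 0.5],
    [-0.1, 0.0, 0.2, -0.2, 0.1],
    [1.0, -1.0, 0.5, 0.0, 2.0],
]
b = [0.01, 0.02, 0.03]

# Tutorial style: one layer applied to the concatenated vector
combined = x + h          # list concatenation, like torch.cat along dim 1
h_concat = linear(W, b, combined)

# Split form: the first len(x) columns act on x, the remaining ones act on h
W_x = [row[:len(x)] for row in W]
W_h = [row[len(x):] for row in W]
h_split = [a + c for a, c in zip(linear(W_x, b, x), matvec(W_h, h))]

print(h_concat)
print(h_split)
```

Both printed vectors agree up to floating-point rounding. So the single `i2h` weight matrix holds one column block for the input and a separate column block for the previous hidden state.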
Why does the tutorial's architecture perform so much better than my version?
【Discussion】:
Tags: machine-learning deep-learning neural-network nlp