1. PyTorch: Tensors and autograd
A fully-connected ReLU network with one hidden layer and no biases, trained to
predict y from x by minimizing squared Euclidean distance.
This implementation computes the forward pass using operations on PyTorch
Tensors, and uses PyTorch autograd to compute gradients.
A PyTorch Tensor represents a node in a computational graph. If x is a
Tensor with x.requires_grad=True, then x.grad is another Tensor
holding the gradient of some scalar value with respect to x.
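As a minimal illustration of this behavior (a standalone sketch, separate from the training script below): for y = sum(x²), calling y.backward() fills x.grad with dy/dx = 2x.

```python
import torch

# Minimal autograd sketch: y = sum(x**2), so dy/dx = 2*x.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()   # y is a 0-dimensional Tensor (a scalar)
y.backward()         # populates x.grad with dy/dx
print(x.grad)        # tensor([2., 4., 6.])
```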
import torch
import matplotlib.pyplot as plt
import torch.optim as optim
dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0") # Uncomment this to run on GPU
# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10
# Create random Tensors to hold input and outputs.
# Setting requires_grad=False (the default) indicates that we do not need to
# compute gradients for these Tensors during the backward pass.
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)
# Create random Tensors for weights.
# Setting requires_grad=True indicates that we want to compute gradients for
# these Tensors during the backward pass.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)
learning_rate = 1e-6
optimizer = optim.SGD([{'params': w1},
                       {'params': w2}],
                      lr=learning_rate)
# Create the figure and name it
plt.figure('Loss')
ax = plt.gca()
# Label the x and y axes
ax.set_xlabel('iter')
ax.set_ylabel('loss')
iter_plot = []
loss_plot = []
for t in range(500):
    # Forward pass: compute predicted y using operations on Tensors; these
    # are exactly the same operations we used to compute the forward pass using
    # Tensors, but we do not need to keep references to intermediate values since
    # we are not implementing the backward pass by hand.
    y_pred = x.mm(w1).clamp(min=0).mm(w2)

    # Compute and print loss using operations on Tensors.
    # Now loss is a 0-dimensional Tensor;
    # loss.item() gets the scalar value held in the loss.
    loss = (y_pred - y).pow(2).sum()
    iter_plot.append(t)
    loss_plot.append(loss.item())

    # Use autograd to compute the backward pass. This call will compute the
    # gradient of loss with respect to all Tensors with requires_grad=True.
    # After this call w1.grad and w2.grad will be Tensors holding the gradient
    # of the loss with respect to w1 and w2 respectively.
    loss.backward()

    # Manually update weights using gradient descent. Wrap in torch.no_grad()
    # because the weights have requires_grad=True, but autograd does not need
    # to track the update itself.
    # An alternative way is to operate on weight.data and weight.grad.data.
    # Recall that tensor.data gives a tensor that shares the storage with
    # tensor, but doesn't track history.
    # You can also use torch.optim.SGD to achieve this, which is what this
    # script does.
    # with torch.no_grad():
    #     w1 -= learning_rate * w1.grad
    #     w2 -= learning_rate * w2.grad
    #     # Manually zero the gradients after updating weights
    #     w1.grad.zero_()
    #     w2.grad.zero_()
    optimizer.step()       # Does the update
    optimizer.zero_grad()  # Zero the gradients after every update
ax.plot(iter_plot, loss_plot, color='r', linewidth=1, alpha=0.6)
plt.show()
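The optimizer.zero_grad() call above matters because backward() accumulates into .grad rather than overwriting it. A small standalone sketch of that behavior:

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
x.sum().backward()
x.sum().backward()   # gradients accumulate: .grad is now twice the ones vector
print(x.grad)        # tensor([2., 2.])
x.grad.zero_()       # what optimizer.zero_grad() does for each parameter
x.sum().backward()
print(x.grad)        # tensor([1., 1.])
```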
2. PyTorch: Defining new autograd functions
A fully-connected ReLU network with one hidden layer and no biases, trained to
predict y from x by minimizing squared Euclidean distance.
This implementation computes the forward pass using operations on PyTorch
Tensors, and uses PyTorch autograd to compute gradients.
In this section, we implement the ReLU function as a custom autograd function.
How to implement a custom autograd function:
A custom autograd function defines two methods: forward computes output Tensors from input Tensors; backward receives the gradient of some scalar value with respect to the output Tensors and computes the gradient of that same scalar with respect to the input Tensors.
import torch
import matplotlib.pyplot as plt
class MyReLU(torch.autograd.Function):
    """
    To implement a custom autograd function, subclass torch.autograd.Function
    and override the forward and backward methods, which operate on Tensors.
    """
    @staticmethod
    def forward(ctx, input):
        """
        In forward we receive the input Tensor and return the output Tensor.
        ctx is a context object that stores information needed for the
        backward computation. Arbitrary objects can be cached for use in
        backward via ctx.save_for_backward.
        """
        ctx.save_for_backward(input)  # cache input for the backward pass
        return input.clamp(min=0)     # return the output Tensor

    @staticmethod
    def backward(ctx, grad_output):
        """
        In backward we receive a Tensor containing the gradient of the loss
        with respect to the output, and we compute the gradient of the loss
        with respect to the input (chain rule).
        """
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input
dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0") # Uncomment this to run on GPU
# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10
# Create random Tensors to hold input and outputs.
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)
# Create random Tensors for weights.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)
# Create the figure and name it
plt.figure('Loss')
ax = plt.gca()
# Label the x and y axes
ax.set_xlabel('iter')
ax.set_ylabel('loss')
iter_plot = []
loss_plot = []
learning_rate = 1e-6
for t in range(500):
    # To use the custom Function, alias its Function.apply method.
    relu = MyReLU.apply

    # Forward pass: compute predicted y using operations; we compute
    # ReLU using our custom autograd operation.
    y_pred = relu(x.mm(w1)).mm(w2)

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum()
    iter_plot.append(t)
    loss_plot.append(loss.item())

    # Use autograd to compute the backward pass.
    loss.backward()

    # Update weights using gradient descent
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad
        # Manually zero the gradients after updating weights
        w1.grad.zero_()
        w2.grad.zero_()
ax.plot(iter_plot, loss_plot, color='r', linewidth=1, alpha=0.6)
plt.show()
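One way to sanity-check a custom autograd function such as MyReLU is torch.autograd.gradcheck, which compares the analytic backward against numerical finite differences (it requires double-precision inputs). A self-contained sketch, restating the class:

```python
import torch

class MyReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input

# gradcheck perturbs each input element numerically and compares the
# result against the gradient computed by backward(); it returns True
# on success and raises on mismatch.
x = torch.randn(4, 5, dtype=torch.double, requires_grad=True)
print(torch.autograd.gradcheck(MyReLU.apply, (x,), eps=1e-6, atol=1e-4))
```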