# For tips on running notebooks in Google Colab, see
# https://pytorch.org/tutorials/beginner/colab
%matplotlib inline

Training a Classifier

This is it. You have seen how to define neural networks, compute loss and make updates to the weights of the network.

Now you might be thinking, what about data?

Generally, when you have to deal with image, text, audio or video data, you can use standard Python packages that load data into a numpy array. Then you can convert this array into a torch.*Tensor; a short sketch of this handoff follows the list below.

  • For images, packages such as Pillow and OpenCV are useful;
  • For audio, packages such as scipy and librosa;
  • For text, either raw Python or Cython based loading, or NLTK and SpaCy are useful.
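A minimal sketch of that numpy-to-tensor handoff (the random array below just stands in for whatever one of these loaders returns):

import numpy as np
import torch

arr = np.random.rand(32, 32, 3).astype(np.float32)  # placeholder for loaded image data
t = torch.from_numpy(arr)   # zero-copy: shares memory with the numpy array
t2 = torch.tensor(arr)      # copies the data instead
print(t.shape, t.dtype)     # torch.Size([32, 32, 3]) torch.float32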

Specifically for vision, we have created a package called torchvision, that has data loaders for common datasets such as ImageNet, CIFAR10, MNIST, etc. and data transformers for images, viz., torchvision.datasets and torch.utils.data.DataLoader.

This provides a huge convenience and avoids writing boilerplate code.

For this tutorial, we will use the CIFAR10 dataset. It has the classes: ‘airplane’, ‘automobile’, ‘bird’, ‘cat’, ‘deer’, ‘dog’, ‘frog’, ‘horse’, ‘ship’, ‘truck’. The images in CIFAR-10 are of size 3x32x32, i.e. 3-channel color images of 32x32 pixels in size.

(Image: cifar10, sample images from the dataset)

Training an image classifier

We will do the following steps in order:

  1. Load and normalize the CIFAR10 training and test datasets using torchvision
  2. Define a Convolutional Neural Network
  3. Define a loss function
  4. Train the network on the training data
  5. Test the network on the test data

1. Load and normalize CIFAR10

Using torchvision, it’s extremely easy to load CIFAR10.

import torch
import torchvision
import torchvision.transforms as transforms

The output of torchvision datasets are PILImage images of range [0, 1]. We transform them to Tensors of normalized range [-1, 1].

NOTE:

If running on Windows and you get a BrokenPipeError, try setting the num_workers argument of torch.utils.data.DataLoader() to 0.

transform = transforms.Compose(
    [transforms.ToTensor(),  # convert a PIL image to a float tensor in [0, 1]
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])  # then shift to [-1, 1]

batch_size = 4

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)  # training dataset
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                          shuffle=True, num_workers=2)  # training data loader

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)  # test dataset
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                         shuffle=False, num_workers=2)  # test data loader

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')  # class names indexed by label number
Files already downloaded and verified
Files already downloaded and verified

transforms.ToTensor() converts the image to a PyTorch tensor. Pixel values are typically integers in the range [0, 255]; this transform rescales them to floats in [0.0, 1.0].

transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)) normalizes the tensor using the given per-channel mean and standard deviation. Here (0.5, 0.5, 0.5) supplies a mean and a standard deviation of 0.5 for each of the three channels (typically red, green, and blue), which maps pixel values from [0.0, 1.0] to [-1.0, 1.0].

The normalization formula is:

$$\text{normalized\_value} = \frac{\text{input\_value} - \text{mean}}{\text{std}}$$

With mean = 0.5 and std = 0.5, a pixel with value 1.0 becomes:

$$\text{normalized\_value} = \frac{1.0 - 0.5}{0.5} = 1.0$$

and a pixel with value 0.0 becomes:

$$\text{normalized\_value} = \frac{0.0 - 0.5}{0.5} = -1.0$$
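As a quick sanity check of this mapping, Normalize can be applied directly to a hand-built tensor (a sketch, not part of the original tutorial):

import torch
from torchvision import transforms

normalize = transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
x = torch.tensor([[[1.0]], [[0.5]], [[0.0]]])  # one pixel per channel: 1.0, 0.5, 0.0
print(normalize(x).flatten())  # tensor([ 1.,  0., -1.])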

Let us show some of the training images, for fun.

import matplotlib.pyplot as plt
import numpy as np

# functions to show an image


def imshow(img):
    img = img / 2 + 0.5     # unnormalize: map pixel values from [-1, 1] back to [0, 1]
    npimg = img.numpy()     # convert the PyTorch tensor to a NumPy array
    plt.imshow(np.transpose(npimg, (1, 2, 0)))  # reorder dimensions from (C, H, W) to (H, W, C)
    plt.show()


# get some random training images
dataiter = iter(trainloader)     # create an iterator over trainloader
images, labels = next(dataiter)  # fetch one batch: 4 images and their labels
print(images.shape)  # 4 images per batch, each with 3 channels of 32x32 pixels
print(labels.shape)  # the 4 labels of this batch
print(labels)

# show images
imshow(torchvision.utils.make_grid(images))  # tile the batch into one grid image
# print labels
print(' '.join(f'{classes[labels[j]]:5s}' for j in range(batch_size)))  # each label name padded to 5 characters, space-separated
torch.Size([4, 3, 32, 32])
torch.Size([4])
tensor([6, 1, 2, 6])

(Image: output_7_1, a grid of four random training images)

frog  car   bird  frog 
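
As a side note, make_grid tiles the batch side by side with, by default on current torchvision versions, 2 pixels of padding, so the grid for this batch is one image tensor:

grid = torchvision.utils.make_grid(images)
print(grid.shape)  # e.g. torch.Size([3, 36, 138]): four padded 32x32 tiles in a row
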
2. Define a Convolutional Neural Network

Copy the neural network from the Neural Networks section before and modify it to take 3-channel images (instead of 1-channel images as it was defined).

import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)   # 3 channels -> 6 channels, 5x5 square convolution
        self.pool = nn.MaxPool2d(2, 2)    # downsample with a 2x2 kernel
        self.conv2 = nn.Conv2d(6, 16, 5)  # 6 channels -> 16 channels, 5x5 square convolution
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # conv1 -> ReLU -> pool
        x = self.pool(F.relu(self.conv2(x)))  # conv2 -> ReLU -> pool
        x = torch.flatten(x, 1)  # flatten all dimensions except batch
        x = F.relu(self.fc1(x))  # fc1 -> ReLU
        x = F.relu(self.fc2(x))  # fc2 -> ReLU
        x = self.fc3(x)          # fc3
        return x


net = Net()
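
To see where the 16 * 5 * 5 input size of fc1 comes from, here is a quick shape check (not in the original tutorial): a 5x5 convolution without padding shrinks 32 to 28, 2x2 pooling halves that to 14, the second convolution shrinks it to 10, and pooling halves it to 5.

x = torch.randn(1, 3, 32, 32)  # a dummy batch of one CIFAR-10 image
x = net.pool(F.relu(net.conv1(x)))
print(x.shape)  # torch.Size([1, 6, 14, 14])
x = net.pool(F.relu(net.conv2(x)))
print(x.shape)  # torch.Size([1, 16, 5, 5]) -> flattens to 16 * 5 * 5 = 400
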
3. Define a Loss function and optimizer

Let’s use a Classification Cross-Entropy loss and SGD with momentum.

import torch.optim as optim

criterion = nn.CrossEntropyLoss()  # cross-entropy loss
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)  # stochastic gradient descent with momentum
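
Note that nn.CrossEntropyLoss expects raw, unnormalized class scores and integer class indices, and applies log-softmax internally; that is why Net.forward ends with a plain linear layer and no softmax. A minimal sketch with dummy data (the names here are illustrative):

dummy_scores = torch.randn(4, 10)              # raw scores for a batch of 4
dummy_targets = torch.tensor([3, 8, 0, 5])     # class indices, not one-hot vectors
print(criterion(dummy_scores, dummy_targets))  # a scalar loss tensor
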
4. Train the network

This is when things start to get interesting. We simply have to loop over our data iterator, and feed the inputs to the network and optimize.

for epoch in range(2):  # loop over the dataset multiple times; epoch: 0, 1

    running_loss = 0.0  # reset the running loss at the start of each epoch
    for i, data in enumerate(trainloader, 0):  # each iteration yields one batch
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data  # shaped as shown in the earlier cell

        # zero the parameter gradients
        optimizer.zero_grad()  # clear gradients before the forward pass

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 2000:.3f}')
            running_loss = 0.0

print('Finished Training')

The `f` prefix marks an f-string, which lets you embed expressions directly in the string and format them.
`[{epoch + 1}, {i + 1:5d}]` interpolates two variables, `epoch` and `i`.
`{i + 1:5d}` is the current mini-batch: `i` starts at 0, so `i + 1` displays it from 1, and `:5d` formats the number as an integer at least 5 characters wide, padded with spaces.
`:.3f` formats the loss as a floating-point number with 3 decimal places.
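
A quick illustration of the same format specifiers:

print(f'[{1}, {2000:5d}] loss: {1.5:.3f}')  # prints: [1,  2000] loss: 1.500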

[1,  2000] loss: 2.216
[1,  4000] loss: 1.885
[1,  6000] loss: 1.701
[1,  8000] loss: 1.609
[1, 10000] loss: 1.556
[1, 12000] loss: 1.486
[2,  2000] loss: 1.422
[2,  4000] loss: 1.397
[2,  6000] loss: 1.353
[2,  8000] loss: 1.344
[2, 10000] loss: 1.307
[2, 12000] loss: 1.306
Finished Training

Let’s quickly save our trained model:

PATH = './cifar_net.pth'
torch.save(net.state_dict(), PATH)  # save the current model parameters to disk

See here for more details on saving PyTorch models.

5. Test the network on the test data

We have trained the network for 2 passes over the training dataset. But we need to check if the network has learnt anything at all.

We will check this by predicting the class label that the neural network outputs, and checking it against the ground-truth. If the prediction is correct, we add the sample to the list of correct predictions.

Okay, first step. Let us display an image from the test set to get familiar.

dataiter = iter(testloader)
images, labels = next(dataiter)

# print images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join(f'{classes[labels[j]]:5s}' for j in range(4)))


(Image: output_17_0, a grid of four test images)

GroundTruth:  cat   ship  ship  plane

Next, let’s load back in our saved model (note: saving and re-loading the model wasn’t necessary here, we only did it to illustrate how to do so):

net = Net()
net.load_state_dict(torch.load(PATH))
<All keys matched successfully>

Okay, now let us see what the neural network thinks these examples above are:

outputs = net(images)
print(outputs)
print(outputs.shape)  # 4x10: each row holds one sample's scores for the ten classes
tensor([[-0.1879, -0.3676,  0.2253,  1.2256,  0.0835,  0.2788,  1.4180, -1.7503,
          0.1457, -0.8939],
        [ 4.6166,  7.0969, -1.9678, -3.6791, -2.9520, -4.4108, -4.3927, -4.0881,
          6.0323,  4.0923],
        [ 2.9825,  2.3304, -0.2517, -1.8886, -0.0883, -2.2235, -2.7187, -1.5714,
          2.8798,  0.8799],
        [ 3.1114,  3.2242, -1.1028, -2.2863, -0.0712, -2.7737, -2.7508, -2.1876,
          3.3800,  1.7259]], grad_fn=<AddmmBackward0>)
torch.Size([4, 10])

The outputs are energies for the 10 classes. The higher the energy for a class, the more the network thinks that the image is of the particular class. So, let’s get the index of the highest energy:

_, predicted = torch.max(outputs, 1)  # take the maximum along dimension 1, i.e. across each row
print(torch.max(outputs, 1))  # returns a tuple: the per-row maximum values and their indices

print('Predicted: ', ' '.join(f'{classes[predicted[j]]:5s}'
                              for j in range(4)))
torch.return_types.max(
values=tensor([1.4180, 7.0969, 2.9825, 3.3800], grad_fn=<MaxBackward0>),
indices=tensor([6, 1, 0, 8]))
Predicted:  frog  car   plane ship 
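
If you would rather see probabilities than raw energies, a softmax over the class dimension is a common post-hoc step (a sketch, not part of the original tutorial):

probs = F.softmax(outputs, dim=1)  # turn each row of energies into a probability distribution
print(probs.sum(dim=1))            # each row now sums to 1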

The results seem pretty good.

Let us look at how the network performs on the whole dataset.

correct = 0
total = 0
# since we're not training, we don't need to calculate the gradients for our outputs
with torch.no_grad():  # disable gradient tracking
    for data in testloader:
        images, labels = data
        # calculate outputs by running images through the network
        outputs = net(images)
        # the class with the highest energy is what we choose as prediction
        # (note: outputs.data is discouraged in modern PyTorch because it bypasses
        # autograd tracking; outputs.detach() is the preferred way to detach a tensor)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Accuracy of the network on the 10000 test images: {100 * correct // total} %')
Accuracy of the network on the 10000 test images: 50 %

That looks way better than chance, which is 10% accuracy (randomly picking a class out of 10 classes). Seems like the network learnt something.

Hmmm, what are the classes that performed well, and the classes that did not perform well:

# prepare to count predictions for each class:
# two dictionaries track the correct and total prediction counts per class,
# each starting at 0
correct_pred = {classname: 0 for classname in classes}
total_pred = {classname: 0 for classname in classes}

# again no gradients needed
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predictions = torch.max(outputs, 1)
        # collect the correct predictions for each class
        for label, prediction in zip(labels, predictions):
            if label == prediction:
                correct_pred[classes[label]] += 1
            total_pred[classes[label]] += 1


# print accuracy for each class
for classname, correct_count in correct_pred.items():
    accuracy = 100 * float(correct_count) / total_pred[classname]
    print(f'Accuracy for class: {classname:5s} is {accuracy:.1f} %')
Accuracy for class: plane is 53.6 %
Accuracy for class: car   is 87.1 %
Accuracy for class: bird  is 35.8 %
Accuracy for class: cat   is 14.2 %
Accuracy for class: deer  is 68.1 %
Accuracy for class: dog   is 23.3 %
Accuracy for class: frog  is 56.8 %
Accuracy for class: horse is 70.2 %
Accuracy for class: ship  is 58.3 %
Accuracy for class: truck is 37.1 %

Okay, so what next?

How do we run these neural networks on the GPU?

Training on GPU

Just like how you transfer a Tensor onto the GPU, you transfer the neural net onto the GPU.

Let’s first define our device as the first visible cuda device if we have CUDA available:

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# Assuming that we are on a CUDA machine, this should print a CUDA device:

print(device)
cuda:0

The rest of this section assumes that device is a CUDA device.

Then these methods will recursively go over all modules and convert their parameters and buffers to CUDA tensors:

net.to(device)

Remember that you will have to send the inputs and targets at every step to the GPU too:

inputs, labels = data[0].to(device), data[1].to(device)
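
Putting the two pieces together, each training step from section 4 becomes (a sketch; only the .to(device) line changes):

for epoch in range(2):
    for data in trainloader:
        inputs, labels = data[0].to(device), data[1].to(device)  # move the batch to the GPU
        optimizer.zero_grad()
        outputs = net(inputs)  # net was already moved by net.to(device)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()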

Why don’t I notice MASSIVE speedup compared to CPU? Because your network is really small.

Exercise: Try increasing the width of your network (argument 2 of the first nn.Conv2d, and argument 1 of the second nn.Conv2d – they need to be the same number), see what kind of speedup you get.

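For instance, widening the intermediate channel count from 6 to 64 would look like this (a sketch; 64 is an arbitrary illustrative choice):

class WiderNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 64, 5)   # argument 2 widened from 6 to 64
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(64, 16, 5)  # argument 1 must match conv1's output channels
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)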

Goals achieved:

  • Understanding PyTorch’s Tensor library and neural networks at a high level.
  • Train a small neural network to classify images.

Training on multiple GPUs

If you want to see even more MASSIVE speedup using all of your GPUs, please check out data_parallel_tutorial.

Where do I go next?

del dataiter