1
00:00:11,580 --> 00:00:17,700
In this lecture, we are going to look at a Colab notebook that does image classification using PyTorch

2
00:00:17,700 --> 00:00:21,480
with a convolutional neural network on the Fashion MNIST dataset.

3
00:00:22,110 --> 00:00:27,060
This lecture is going to walk you through a prepared Colab notebook, although a very good exercise,

4
00:00:27,060 --> 00:00:32,370
which I always recommend, is, once you know how this is done, to try to recreate it yourself with as

5
00:00:32,370 --> 00:00:33,880
few references as possible.

6
00:00:34,410 --> 00:00:39,570
As usual, you can look at the title of the notebook to determine what notebook we are currently looking

7
00:00:39,570 --> 00:00:39,830
at.

8
00:00:41,720 --> 00:00:45,260
All right, so let's start by recognizing everything that should seem familiar.

9
00:00:45,890 --> 00:00:47,260
First, we have our imports.

10
00:00:47,450 --> 00:00:49,190
There should be nothing unexpected here.

11
00:00:49,970 --> 00:00:52,410
Notice how we're also importing the datetime module.

12
00:00:53,000 --> 00:00:58,160
This is useful since as your data grows and your models grow, training is going to take longer and

13
00:00:58,160 --> 00:00:58,640
longer.

14
00:00:59,210 --> 00:01:04,520
When training starts to take a non-trivial amount of time, we naturally start to ask, so how long

15
00:01:04,520 --> 00:01:05,330
is this taking?

16
00:01:05,660 --> 00:01:11,210
And so we can use the datetime module to get a sense of how long each iteration of the training loop

17
00:01:11,210 --> 00:01:11,810
takes.

18
00:01:13,790 --> 00:01:18,860
In higher-level libraries like Keras, this all happens for you, but in PyTorch, if you want

19
00:01:18,860 --> 00:01:20,950
these features, you have to build them yourself.

20
00:01:23,090 --> 00:01:28,730
Next, we load in the data, which is, as we mentioned, the Fashion MNIST dataset. Since it's

21
00:01:28,730 --> 00:01:33,070
included in the torchvision library, nothing has to change in the way we load in this data.

22
00:01:39,640 --> 00:01:43,620
Next, we're going to check a few of the dataset's attributes to see how it compares to MNIST.

23
00:01:44,590 --> 00:01:46,960
Remember, it's supposed to be a drop-in replacement.

24
00:01:47,770 --> 00:01:52,090
As you can see, the maximum value is 255, which is what we expect.

25
00:01:52,870 --> 00:01:57,820
The shape of the data is 60,000 by 28 by 28, which is what we expect.

26
00:01:58,930 --> 00:02:04,240
And the targets contain integers that look like they're between zero and nine inclusive, which is also

27
00:02:04,240 --> 00:02:05,140
what we expect.

28
00:02:11,380 --> 00:02:16,600
Next, we load in the test data set and then we get the number of classes by casting all the targets

29
00:02:16,600 --> 00:02:19,190
into a set and then taking the length of that set.

30
00:02:19,840 --> 00:02:23,200
As expected, the data set contains 10 classes.
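As a minimal sketch of that set-and-length trick, assuming a small made-up list of targets rather than the real dataset (the name `targets` here is hypothetical):
```python
# Hypothetical stand-in for the dataset's integer targets.
targets = [9, 0, 0, 3, 0, 2, 7, 2, 5, 5, 1, 4, 6, 8]
# Casting to a set removes duplicates; its length is the number of classes.
K = len(set(targets))
print("number of classes:", K)
```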

31
00:02:29,580 --> 00:02:31,830
The next step is to define our CNN model.

32
00:02:32,430 --> 00:02:37,440
As promised, we're going to start looking at how to build models using inheritance rather than stacking

33
00:02:37,440 --> 00:02:38,970
layers together in a sequential.

34
00:02:39,630 --> 00:02:45,000
On the first line, you can see that the CNN class inherits from the nn.Module class.

35
00:02:45,810 --> 00:02:47,910
This makes our class an official PyTorch

36
00:02:47,910 --> 00:02:53,910
module, meaning it has all the useful functions and attributes that a built-in PyTorch module has,

37
00:02:53,910 --> 00:02:58,060
even doing automatic differentiation. Inside the constructor,

38
00:02:58,080 --> 00:03:00,960
the first thing we do is call the parent class's constructor.

39
00:03:01,590 --> 00:03:02,970
Next, we create our layers.

40
00:03:03,600 --> 00:03:09,050
The first set of layers is a series of convolution layers, each followed by the ReLU activation function.

41
00:03:10,500 --> 00:03:13,810
We can wrap this in a Sequential to make it easier to call in the future.

42
00:03:14,790 --> 00:03:18,150
Notice that the first convolution layer has only one input channel.

43
00:03:18,810 --> 00:03:23,580
That's because, as you know, the MNIST and Fashion MNIST datasets are grayscale.

44
00:03:24,420 --> 00:03:29,910
As you recall, we need to do some convolutional arithmetic in order to calculate the arguments into

45
00:03:29,910 --> 00:03:31,010
the first linear layer.

46
00:03:31,590 --> 00:03:33,840
So I've left some useful links here on this topic,

47
00:03:33,990 --> 00:03:40,200
if you want to check them out. I would recommend you do the calculation yourself to make sure that

48
00:03:40,200 --> 00:03:41,860
what I've written below is correct.
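The convolution arithmetic can be sketched in plain Python. This assumes no padding and uses hypothetical settings (three convolutions with kernel size 3, stride 2, and 128 final channels); the lecture does not state the actual values, so substitute the ones from the notebook.
```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    # Standard formula: floor((size + 2*padding - kernel_size) / stride) + 1
    return (size + 2 * padding - kernel_size) // stride + 1
size = 28  # Fashion MNIST images are 28 x 28
for _ in range(3):  # assumed: three conv layers, kernel 3, stride 2
    size = conv_output_size(size, kernel_size=3, stride=2)
    print("spatial size after conv:", size)
out_channels = 128  # assumed channel count of the last conv layer
flat_features = out_channels * size * size
print("inputs to first linear layer:", flat_features)
```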

49
00:03:49,360 --> 00:03:53,810
Next, we have a series of dense layers, along with some dropout and a ReLU activation.

50
00:03:54,490 --> 00:03:57,940
Again, we wrap this in a Sequential so it's easier to call later.

51
00:04:00,010 --> 00:04:01,490
Next, we have the forward function.

52
00:04:01,930 --> 00:04:06,910
This is where we actually pass a given input through the CNN to get the neural network's output.

53
00:04:07,570 --> 00:04:12,520
As you can see, it's made up of just three lines of code, thanks to our Sequential objects.

54
00:04:13,210 --> 00:04:16,300
First, we pass our input into the convolutional Sequential.

55
00:04:17,200 --> 00:04:21,220
Next, we reshape those outputs in order to flatten them using the view function.

56
00:04:22,060 --> 00:04:25,860
Notice how there's no need to specify both dimensions of the flattened output.

57
00:04:26,500 --> 00:04:30,930
Instead, it is easier to specify just the first dimension and not the second one.

58
00:04:31,570 --> 00:04:37,060
The first dimension is simply the batch size, which we can obtain by using out.size(0).

59
00:04:37,690 --> 00:04:42,190
This is because if we have N samples in, we also must have N samples out.

60
00:04:42,790 --> 00:04:47,830
Clearly, the number of samples we pass into the input of a neural network must also be the number of

61
00:04:47,830 --> 00:04:49,180
predictions that we get out.

62
00:04:51,280 --> 00:04:57,100
The -1 means that PyTorch will automatically calculate the leftover dimension, given whatever

63
00:04:57,100 --> 00:04:58,190
dimensions are left.

64
00:05:00,040 --> 00:05:05,500
Next, the third step is to pass the data through the dense layers and finally we return the output.
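Putting the pieces above together, here is a rough sketch of such a class. The layer sizes, dropout rate, and attribute names (`conv_layers`, `dense_layers`) are assumptions for illustration, not the notebook's actual values.
```python
import torch
import torch.nn as nn
# Sketch only: channel counts, kernel size, stride and dropout rate are assumed.
class CNN(nn.Module):
    def __init__(self, K):
        super().__init__()  # call the parent class's constructor first
        self.conv_layers = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2),  # 1 input channel: grayscale
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2),
            nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2),
            nn.ReLU(),
        )
        self.dense_layers = nn.Sequential(
            nn.Dropout(0.2),
            nn.Linear(128 * 2 * 2, 512),  # 28 -> 13 -> 6 -> 2 under these settings
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(512, K),
        )
    def forward(self, X):
        out = self.conv_layers(X)
        out = out.view(out.size(0), -1)  # keep batch size, infer the rest
        out = self.dense_layers(out)
        return out
model = CNN(10)
logits = model(torch.zeros(4, 1, 28, 28))  # dummy batch of 4 images
print(logits.shape)
```
With a batch of 4 inputs, the output has 4 rows, one per sample, and K columns.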

65
00:05:10,600 --> 00:05:16,180
The next step is to instantiate the model. We pass one argument into the constructor, which is the

66
00:05:16,180 --> 00:05:17,380
number of classes, K.

67
00:05:23,210 --> 00:05:29,030
In the next block of code, I show you how to build an equivalent CNN using the Sequential module instead

68
00:05:29,030 --> 00:05:35,570
of building a custom class. As you can see, it's much simpler, but still requires you to do convolutional

69
00:05:35,570 --> 00:05:39,190
arithmetic to determine the first linear layer's input size.

70
00:05:39,800 --> 00:05:43,070
You might want to try both models and confirm that they are equivalent.

71
00:05:50,050 --> 00:05:54,100
The next step is to move our model to the GPU, which you already know how to do.

72
00:05:57,820 --> 00:06:01,930
The next step is to create our loss and optimizer, which you already know how to do.

73
00:06:05,600 --> 00:06:09,770
The next step is to create our data loaders, which, again, you already know how to do.

74
00:06:14,280 --> 00:06:18,690
The next step is to create a training function, which, again, you already know how to do.

75
00:06:19,380 --> 00:06:24,870
The only minor difference here is that I have some timing code, and when each epoch completes, I print

76
00:06:24,870 --> 00:06:26,400
the duration of that epoch.
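A minimal sketch of that kind of timing code, using only the standard library. The helper `timed_epochs` and the stand-in `run_epoch` callable are hypothetical; in the notebook the timing lives inside the training function itself.
```python
from datetime import datetime
import time
def timed_epochs(n_epochs, run_epoch):
    # Record a start time, run one epoch, then print how long it took.
    durations = []
    for it in range(n_epochs):
        t0 = datetime.now()
        run_epoch()  # placeholder for one pass over the training data
        dt = datetime.now() - t0
        durations.append(dt)
        print(f"Epoch {it + 1}/{n_epochs}, Duration: {dt}")
    return durations
# Stand-in for real training work so the sketch runs on its own.
durations = timed_epochs(2, lambda: time.sleep(0.01))
```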

77
00:06:30,260 --> 00:06:33,470
So you can see here I'm printing out the epoch duration.

78
00:06:40,090 --> 00:06:43,940
By now you see how important it is to understand what we were doing previously.

79
00:06:44,320 --> 00:06:46,900
We'll be seeing that exact same code again and again.

80
00:06:52,260 --> 00:06:56,280
All right, so training completed successfully at around six seconds per epoch.

81
00:07:00,550 --> 00:07:03,250
Next, we plot the loss per iteration, as usual.

82
00:07:04,290 --> 00:07:05,820
So this appears to look OK.

83
00:07:12,600 --> 00:07:16,500
Next, we get the train and test accuracy using the same code we saw before.

84
00:07:22,410 --> 00:07:28,440
As you can see, we do pretty well, about 94 percent on the train set and about 89 percent on the test

85
00:07:28,440 --> 00:07:33,810
set, but this is definitely less than the accuracy we got on the original MNIST dataset.

86
00:07:35,800 --> 00:07:37,810
Next, we have our confusion matrix code.

87
00:07:47,340 --> 00:07:50,970
So if we scroll down, we can see what our model is most confused by.

88
00:07:52,610 --> 00:07:58,250
Unfortunately, the axes are labeled with integers, so it's hard to tell what's what, but we can see

89
00:07:58,250 --> 00:08:01,460
if we look at the labels below what each of them means.

90
00:08:07,120 --> 00:08:14,230
So it seems we often confuse zero with six and six with zero. Zero corresponds to T-shirt/top,

91
00:08:14,590 --> 00:08:16,310
while six corresponds to shirt.

92
00:08:16,840 --> 00:08:20,110
So it definitely makes sense that our model would confuse these.

93
00:08:20,630 --> 00:08:25,180
In fact, one might argue that they are the same thing and that even humans would confuse these.

94
00:08:25,670 --> 00:08:27,610
That's similar to what we saw with MNIST.

95
00:08:29,170 --> 00:08:32,230
We can also see that six gets confused with four a lot.

96
00:08:33,750 --> 00:08:37,950
As we saw, six means shirt, and now we can see that four means coat.

97
00:08:38,490 --> 00:08:44,070
This also makes sense given the very low resolution of the images, basically just little white blobs

98
00:08:44,070 --> 00:08:45,320
on a black background.

99
00:08:46,440 --> 00:08:49,170
We would expect that a coat has a similar shape to a shirt.

100
00:08:51,700 --> 00:08:57,700
We also see that two gets confused with four and six as well, pretty often. Two stands for pullover, which

101
00:08:57,700 --> 00:09:02,050
again is pretty much a shirt, especially when it's just a 28 by 28 blob.

102
00:09:02,860 --> 00:09:07,680
On the other end of this, we see that seven often gets confused with nine and vice versa.

103
00:09:09,150 --> 00:09:12,750
Seven corresponds to sneaker and nine corresponds to ankle boot.

104
00:09:13,050 --> 00:09:17,520
So, again, considering that the images are just little blobs, this makes total sense.
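To avoid reading integer labels by eye, the most-confused pairs can be listed by name. This is a sketch on a tiny made-up set of predictions; the helper `most_confused` is hypothetical, and the label order is the standard Fashion MNIST class order.
```python
from collections import Counter
# Standard Fashion MNIST class names, indexed by integer label.
LABELS = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
          "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]
def most_confused(y_true, y_pred, top=3):
    # Count each (true, predicted) pair where the prediction is wrong.
    errors = Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)
    return [(LABELS[t], LABELS[p], n) for (t, p), n in errors.most_common(top)]
# Made-up labels and predictions, for illustration only.
y_true = [0, 0, 6, 6, 4, 7, 9, 2]
y_pred = [6, 6, 0, 4, 4, 9, 9, 2]
pairs = most_confused(y_true, y_pred)
for true_name, pred_name, count in pairs:
    print(f"{true_name} predicted as {pred_name}: {count}")
```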

105
00:09:25,600 --> 00:09:29,920
Let's now look at some misclassified samples to check if our reasoning holds.

106
00:09:31,500 --> 00:09:37,710
So the first one we see is a T-shirt/top getting predicted as a shirt, which makes sense.

107
00:09:41,430 --> 00:09:44,370
Here's a shirt predicted as a pullover. That makes sense.

108
00:09:46,770 --> 00:09:51,120
Here's a shirt predicted as a coat. I wouldn't really consider that to be a coat.

109
00:09:54,600 --> 00:09:58,340
Here's a T-shirt predicted as a shirt. That makes sense.

110
00:10:01,290 --> 00:10:03,240
It's a pullover, I think we saw this one.

111
00:10:06,220 --> 00:10:09,690
Here's a sandal predicted as an ankle boot. That makes sense.

112
00:10:12,650 --> 00:10:14,480
Here's a coat predicted as a pullover.

113
00:10:14,510 --> 00:10:15,350
That makes sense.

114
00:10:17,170 --> 00:10:20,740
So you can see that all the things that are getting misclassified are generally the same.

115
00:10:25,460 --> 00:10:32,210
OK, so as a final exercise for this lecture, what you should try is mix and match the models and data

116
00:10:32,210 --> 00:10:33,330
that we've seen so far.

117
00:10:34,190 --> 00:10:39,830
Previously, we used an ANN on MNIST, and now we're using a CNN on Fashion MNIST.

118
00:10:40,340 --> 00:10:46,310
You should have some intuition that a CNN should perform better than an ANN, but also that Fashion

119
00:10:46,310 --> 00:10:49,670
MNIST is a harder classification problem than MNIST.

120
00:10:50,180 --> 00:10:54,110
So what will happen if you try to use an ANN on Fashion MNIST?

121
00:10:54,560 --> 00:11:00,950
What kind of results do you expect to see? What will happen if you try to use a CNN on MNIST?

122
00:11:01,250 --> 00:11:03,160
What kind of results do you expect to see?

123
00:11:04,040 --> 00:11:08,930
As always, remember my rule: machine learning is experimentation, not philosophy.

124
00:11:09,590 --> 00:11:14,450
Whenever you want to know the outcome of running some computer code, never try to guess the outcome

125
00:11:14,450 --> 00:11:15,320
with your mind.

126
00:11:15,860 --> 00:11:20,480
Well, in this case, it's OK because I've asked you to in order to check whether your intuition is

127
00:11:20,480 --> 00:11:21,020
correct.

128
00:11:21,440 --> 00:11:26,870
But what you really want to do is write these scripts in code and see the answer for yourself.
