1
00:00:11,570 --> 00:00:17,570
In this lecture, we are going to look at a Colab notebook that does image classification using PyTorch

2
00:00:17,570 --> 00:00:20,990
with a convolutional neural network on the CIFAR-10 dataset.

3
00:00:21,680 --> 00:00:26,720
This lecture is going to walk you through a prepared Colab notebook, although a very good exercise,

4
00:00:26,720 --> 00:00:31,940
which I always recommend is once you know how this is done, to try and recreate it yourself with as

5
00:00:31,940 --> 00:00:33,420
few references as possible.

6
00:00:34,010 --> 00:00:38,990
As usual, you can look at the title of the notebook to determine what notebook we are currently looking

7
00:00:38,990 --> 00:00:39,220
at.

8
00:00:41,200 --> 00:00:44,210
All right, so there are a few major differences at the start of this code.

9
00:00:44,950 --> 00:00:50,440
First, you can see that the imports are almost the same as before, except that we also import

10
00:00:50,440 --> 00:00:51,190
torch.nn.functional.

11
00:00:51,700 --> 00:00:53,660
You'll see how that will be used very shortly.

12
00:00:54,370 --> 00:00:58,960
I like to introduce these new things when there's not much else going on so that you can learn about

13
00:00:59,140 --> 00:01:02,380
different ways of doing the same thing, which could be useful in the future.

14
00:01:04,890 --> 00:01:10,350
Next, you'll notice that we're importing the CIFAR-10 dataset, which is a color image dataset and is

15
00:01:10,350 --> 00:01:11,880
even more difficult than Fashion

16
00:01:11,930 --> 00:01:18,150
MNIST. However, since the dataset is included in torchvision, loading it in is just as easy as it

17
00:01:18,150 --> 00:01:20,280
was for the previous two data sets.

18
00:01:25,780 --> 00:01:31,750
Next, let's look at the raw data by accessing the data attribute. As you can see, it's slightly different

19
00:01:31,750 --> 00:01:35,260
from MNIST and Fashion MNIST, which return torch tensors.

20
00:01:40,290 --> 00:01:43,560
Instead, CIFAR-10 is stored as a regular NumPy array.

21
00:01:51,310 --> 00:01:52,750
And the dtype is uint8.

22
00:01:56,350 --> 00:02:01,990
Next, let's check the shape of the training data. We can see that we have 50,000 training samples, and

23
00:02:01,990 --> 00:02:05,770
each training sample is a 32 by 32 color image.

24
00:02:08,730 --> 00:02:10,210
Next, let's check the targets.

25
00:02:10,710 --> 00:02:12,490
This is, again, different from MNIST.

26
00:02:12,930 --> 00:02:16,170
Instead of being stored as a tensor, it's stored as a list.

27
00:02:26,770 --> 00:02:31,780
Next, let's get the number of classes K by adding all the targets to a set and then taking the length

28
00:02:31,780 --> 00:02:32,440
of that set.
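As a minimal sketch of that step, assuming a short made-up list of labels in place of the real targets (the name `targets` here is a stand-in, not necessarily the notebook's variable):

```python
# Hypothetical stand-in for the dataset's targets, which are a plain Python list.
targets = [6, 9, 9, 4, 1, 1, 2, 7, 8, 3, 0, 5]

# Adding every label to a set removes duplicates,
# so the length of the set is the number of classes.
K = len(set(targets))
print("number of classes:", K)
```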

29
00:02:36,990 --> 00:02:40,860
Next, let's create our data loader objects, which is the same as how we did it before.

30
00:02:48,340 --> 00:02:53,200
As mentioned previously, here's a good opportunity to look at one of the quirks of PyTorch.

31
00:02:53,740 --> 00:02:58,360
If we look at the shape of the data yielded by our data loader, we see something interesting.

32
00:03:03,210 --> 00:03:08,040
The batch that's yielded by the data loader has shape one by three by 32 by 32.

33
00:03:08,670 --> 00:03:14,070
This should seem strange to you because now the color channel comes before the spatial dimensions instead

34
00:03:14,070 --> 00:03:19,710
of after. PyTorch hides this detail from you by getting you to use data loaders and special

35
00:03:19,710 --> 00:03:20,640
dataset functions.
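To make the reordering concrete, here is a rough sketch in plain NumPy. It assumes a blank 32 by 32 image as a stand-in for a real sample, and it only imitates the channel-first conversion that happens behind the scenes; it is not the notebook's code.

```python
import numpy as np

# Hypothetical stand-in for one raw image: height x width x channels, uint8.
raw_image = np.zeros((32, 32, 3), dtype=np.uint8)

# Move the channel axis to the front and rescale to floats in [0, 1].
# Assumption: this mirrors the conversion the dataset pipeline does for us.
chw_image = np.transpose(raw_image, (2, 0, 1)).astype(np.float32) / 255.0

# Add a leading batch dimension of size one, as a loader with batch size 1 would.
batch = chw_image[np.newaxis, ...]
print(batch.shape)
```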

36
00:03:27,010 --> 00:03:29,920
Next, let's look at our CNN custom model class.

37
00:03:30,520 --> 00:03:33,950
You'll notice that this is built a little differently from our previous CNN.

38
00:03:34,660 --> 00:03:38,450
What I'm trying to do is show you different ways of accomplishing the same thing.

39
00:03:39,340 --> 00:03:44,020
One thing that's different is the number of input color channels, which is now three instead of one

40
00:03:44,020 --> 00:03:45,480
since our data is in color.

41
00:03:47,460 --> 00:03:53,160
Next, you'll notice that in the constructor, I've only defined layers that involve parameters, so

42
00:03:53,160 --> 00:03:55,040
there are no ReLU activations.

43
00:03:55,500 --> 00:03:56,450
So where did they go?

44
00:03:58,930 --> 00:04:03,580
Well, remember that none of these layers are actually called until we get to the forward function.

45
00:04:05,080 --> 00:04:10,690
So let's look at the forward function. We can see here that instead of treating the ReLUs as modules

46
00:04:10,690 --> 00:04:15,630
or layers, we can treat them like functions, which can be accessed from torch.nn.functional.

47
00:04:16,420 --> 00:04:18,960
That's the new thing we imported earlier, if you recall.

48
00:04:19,600 --> 00:04:22,090
In fact, dropout can be applied this way as well.

49
00:04:24,230 --> 00:04:29,450
You can also see here an alternative way of using the view function, which, as you recall, flattens

50
00:04:29,450 --> 00:04:29,990
the data.

51
00:04:31,510 --> 00:04:36,700
Instead of putting the minus one in the last dimension, we can put minus one in the first dimension

52
00:04:36,970 --> 00:04:39,600
and then specify the second dimension explicitly.

53
00:04:40,300 --> 00:04:45,580
This isn't any extra work for us because we had to calculate this value anyway inside the constructor.

54
00:04:47,100 --> 00:04:52,290
So you can see 128 times three times three, both in the constructor and in the forward function.
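Here is a sketch of what a model built this way might look like. Only the three input channels, the functional ReLU and dropout, and the 128 times three times three flatten size come from the lecture; the kernel sizes, strides, hidden layer width, and dropout rates are assumptions chosen so that a 32 by 32 input shrinks to three by three.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CNN(nn.Module):
    def __init__(self, K):
        super().__init__()
        # Only layers with parameters are defined here.
        # Assumed sizes: 3x3 kernels with stride 2 shrink 32 -> 15 -> 7 -> 3.
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, stride=2)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=2)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, stride=2)
        self.fc1 = nn.Linear(128 * 3 * 3, 1024)
        self.fc2 = nn.Linear(1024, K)

    def forward(self, x):
        # ReLU is applied as a function rather than as a module.
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = F.relu(self.conv3(x))
        # -1 lets PyTorch infer the batch size; the feature size is explicit.
        x = x.view(-1, 128 * 3 * 3)
        # Functional dropout needs the training flag passed in by hand.
        x = F.dropout(x, p=0.5, training=self.training)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, p=0.2, training=self.training)
        return self.fc2(x)


model = CNN(K=10)
logits = model(torch.zeros(4, 3, 32, 32))
print(logits.shape)
```

One thing to watch with the functional form: F.dropout does not know whether the model is in train or eval mode, so the sketch passes self.training explicitly.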

55
00:04:57,940 --> 00:05:02,650
All right, so next is a bunch of stuff we've seen before, so I'm going to skip over it a lot faster.

56
00:05:03,250 --> 00:05:09,570
So we instantiate our model, we move the model to the GPU, we create a loss and optimizer.

57
00:05:10,030 --> 00:05:12,010
We do our batch gradient descent loop.

58
00:05:15,300 --> 00:05:20,340
And we can see that our model trains quite fast at around six or seven seconds per epoch.

59
00:05:25,630 --> 00:05:28,590
If we look at the loss per iteration, it seems to look OK.

60
00:05:35,610 --> 00:05:37,470
Next, we check the model accuracy.

61
00:05:41,510 --> 00:05:47,000
So we get about 75 percent on the train set and 66 percent on the test set.

62
00:05:48,810 --> 00:05:53,430
Of course, this is going to vary depending on several different factors, so when you run this, you

63
00:05:53,430 --> 00:05:55,030
might get a slightly different answer.

64
00:05:56,670 --> 00:06:00,930
But clearly, this is a much more difficult dataset than MNIST or Fashion MNIST.

65
00:06:04,910 --> 00:06:06,980
Next, we look at the confusion matrix.

66
00:06:12,920 --> 00:06:18,440
In fact, there are so many things that we predict wrong that it's kind of difficult to look at. We

67
00:06:18,440 --> 00:06:20,840
can see that we confuse one and nine a lot.

68
00:06:21,800 --> 00:06:25,090
One corresponds to automobile and nine corresponds to truck.

69
00:06:25,670 --> 00:06:27,560
This makes sense and fits our intuition.

70
00:06:28,640 --> 00:06:31,160
We can see that three and five get confused a lot.

71
00:06:31,730 --> 00:06:34,640
Three corresponds to cat and five corresponds to dog.

72
00:06:35,180 --> 00:06:38,930
That makes sense, especially when you consider that these are very small pictures.

73
00:06:42,730 --> 00:06:49,420
We can also see that zero and eight get confused a lot. Zero corresponds to airplane and eight corresponds

74
00:06:49,420 --> 00:06:55,600
to ship. That makes less sense, and perhaps it could simply point to the fact that our model is just

75
00:06:55,600 --> 00:06:56,560
not powerful enough.
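To show how those off-diagonal counts come about, here is a small pure-Python sketch. The labels below are made up for illustration and are not the model's actual predictions; only the class indices match the lecture.

```python
# Hypothetical true and predicted labels, using the class indices from the lecture:
# 0 = airplane, 1 = automobile, 8 = ship, 9 = truck.
y_true = [1, 1, 9, 9, 0, 8, 1, 9]
y_pred = [1, 9, 9, 1, 8, 8, 1, 9]

K = 10
# Row = true class, column = predicted class.
cm = [[0] * K for _ in range(K)]
for t, p in zip(y_true, y_pred):
    cm[t][p] += 1

# Off-diagonal entries count mistakes between a pair of classes.
print("automobile predicted as truck:", cm[1][9])
print("truck predicted as automobile:", cm[9][1])
print("airplane predicted as ship:", cm[0][8])
```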

76
00:07:03,050 --> 00:07:07,970
So if we scroll down to the bottom and we look at some misclassified samples, maybe we can make sense

77
00:07:07,970 --> 00:07:08,950
of these results.

78
00:07:10,970 --> 00:07:15,310
So here's a horse predicted as an airplane. That does not make sense at all.

79
00:07:18,030 --> 00:07:21,470
Here's a bird predicted as a deer. That kind of makes sense.

80
00:07:24,550 --> 00:07:27,600
Here's a truck predicted as an automobile. That kind of makes sense.

81
00:07:30,320 --> 00:07:35,070
Here's a cat predicted as a bird. It looks like a small blob, so that kind of makes sense.

82
00:07:38,460 --> 00:07:41,630
Here's a frog predicted as a deer. That doesn't really make sense.

83
00:07:44,380 --> 00:07:47,430
Here's a dog predicted as a cat. That kind of makes sense.

84
00:07:51,030 --> 00:07:55,860
All right, so it seems that for a lot of these wrong predictions, the neural network should be able to

85
00:07:55,860 --> 00:07:56,390
get right.

86
00:07:56,850 --> 00:08:00,660
So in the coming lectures, we're going to see how we can improve this model.