1
00:00:00,000 --> 00:00:01,695
In the previous video,

2
00:00:01,695 --> 00:00:03,015
you looked at convolutions

3
00:00:03,015 --> 00:00:04,740
and got a glimpse
of how they worked.

4
00:00:04,740 --> 00:00:06,240
By passing filters over

5
00:00:06,240 --> 00:00:08,745
an image to reduce
the amount of information,

6
00:00:08,745 --> 00:00:10,500
they then allowed
the neural network

7
00:00:10,500 --> 00:00:11,790
to effectively extract

8
00:00:11,790 --> 00:00:13,200
features that can distinguish

9
00:00:13,200 --> 00:00:15,165
one class of image from another.

10
00:00:15,165 --> 00:00:16,890
You also saw how pooling

11
00:00:16,890 --> 00:00:19,920
compresses the information
to make it more manageable.

12
00:00:19,920 --> 00:00:22,110
This is a really nice way to

13
00:00:22,110 --> 00:00:24,675
improve our image
recognition performance.

14
00:00:24,675 --> 00:00:27,750
Let's now look at it in
action using a notebook.

15
00:00:32,550 --> 00:00:35,700
Here's the same neural network
that you used before for

16
00:00:35,700 --> 00:00:37,050
loading the set of images of

17
00:00:37,050 --> 00:00:49,000
clothing and then
classifying them. *long pause*

18
00:00:49,000 --> 00:00:50,815
By the end of epoch five,

19
00:00:50,815 --> 00:00:52,900
you can see the loss
is around 0.29,

20
00:00:52,900 --> 00:00:54,025
meaning your accuracy is

21
00:00:54,025 --> 00:00:55,780
pretty good on the training data.

22
00:00:55,780 --> 00:00:58,195
It took just
a few seconds to train,

23
00:00:58,195 --> 00:00:59,725
so that's not bad.

24
00:00:59,725 --> 00:01:02,565
With the test data as
before and as expected,

25
00:01:02,565 --> 00:01:04,695
the loss is a little
higher and thus,

26
00:01:04,695 --> 00:01:06,725
the accuracy is a little lower.

27
00:01:06,725 --> 00:01:08,775
So now, you can see the code that

28
00:01:08,775 --> 00:01:10,695
adds convolutions and pooling.

29
00:01:10,695 --> 00:01:12,900
We're going to do
two convolutional layers

30
00:01:12,900 --> 00:01:14,849
each with 64 convolutions,

31
00:01:14,849 --> 00:01:17,595
and each followed by
a max pooling layer.

32
00:01:17,595 --> 00:01:20,295
You can see that we defined
our convolutions to be

33
00:01:20,295 --> 00:01:21,945
three-by-three and our pools to

34
00:01:21,945 --> 00:01:24,465
be two-by-two. Let's train.

35
00:01:24,465 --> 00:01:26,535
The first thing
you'll notice is that

36
00:01:26,535 --> 00:01:28,350
the training is much slower.

37
00:01:28,350 --> 00:01:31,950
For every image, 64 convolutions
are being tried,

38
00:01:31,950 --> 00:01:34,035
and then the image
is compressed and

39
00:01:34,035 --> 00:01:36,060
then another 64 convolutions,

40
00:01:36,060 --> 00:01:37,680
and then it's compressed again,

41
00:01:37,680 --> 00:01:39,630
and then it's passed
through the DNN,

42
00:01:39,630 --> 00:01:41,565
and that's for 60,000 images

43
00:01:41,565 --> 00:01:43,580
that this is happening
on each epoch.

44
00:01:43,580 --> 00:01:47,085
So it might take a few minutes
instead of a few seconds.

45
00:01:47,085 --> 00:01:48,945
Now that it's done,

46
00:01:48,945 --> 00:01:51,495
you can see that the loss
has improved a little.

47
00:01:51,495 --> 00:01:54,465
In this case, it's brought
our accuracy up a bit for

48
00:01:54,465 --> 00:01:56,235
both our test data and

49
00:01:56,235 --> 00:01:58,895
our training data. That's
pretty cool, right?

50
00:01:58,895 --> 00:02:01,125
Now, let's take
a look at the code

51
00:02:01,125 --> 00:02:02,565
at the bottom of the notebook.

52
00:02:02,565 --> 00:02:05,040
Now, this is a really
fun visualization

53
00:02:05,040 --> 00:02:07,485
of the journey of an image
through the convolutions.

54
00:02:07,485 --> 00:02:11,070
First, I'll print out
the first 100 test labels.

55
00:02:11,070 --> 00:02:14,565
The number nine as we saw
earlier is a shoe or boots.

56
00:02:14,565 --> 00:02:17,475
I picked out a few instances
of this, where the zeroth,

57
00:02:17,475 --> 00:02:20,280
the 23rd and the 28th
labels are all nine.

58
00:02:20,280 --> 00:02:22,215
So let's take a look
at their journey.

59
00:02:22,215 --> 00:02:24,120
The Keras API gives us

60
00:02:24,120 --> 00:02:27,055
each convolution and each
pooling and each dense,

61
00:02:27,055 --> 00:02:28,710
etc. as a layer.

62
00:02:28,710 --> 00:02:30,495
So with the layers API,

63
00:02:30,495 --> 00:02:32,924
I can take a look at
each layer's outputs,

64
00:02:32,924 --> 00:02:36,210
so I'll create a list
of each layer's output.

65
00:02:36,210 --> 00:02:38,955
I can then treat
each item in the list as

66
00:02:38,955 --> 00:02:41,205
an individual activation model

67
00:02:41,205 --> 00:02:44,140
if I want to see the results
of just that layer.

68
00:02:44,140 --> 00:02:46,585
Now, by looping
through the layers,

69
00:02:46,585 --> 00:02:48,330
I can display
the journey of the image

70
00:02:48,330 --> 00:02:50,145
through the first convolution and

71
00:02:50,145 --> 00:02:51,525
then the first pooling and

72
00:02:51,525 --> 00:02:54,365
then the second convolution
and then the second pooling.

73
00:02:54,365 --> 00:02:56,175
Note how the size of the image is

74
00:02:56,175 --> 00:02:58,650
changing by looking at the axes.

75
00:02:58,650 --> 00:03:02,145
If I set the convolution
number to one,

76
00:03:02,145 --> 00:03:04,455
we can see that it almost
immediately detects

77
00:03:04,455 --> 00:03:07,970
the laces area as a common
feature between the shoes.

78
00:03:07,970 --> 00:03:11,235
So, for example, if I change
the third image to be one,

79
00:03:11,235 --> 00:03:12,659
which looks like a handbag,

80
00:03:12,659 --> 00:03:15,015
you'll see that it also
has a bright line near

81
00:03:15,015 --> 00:03:18,195
the bottom that could look
like the sole of the shoes,

82
00:03:18,195 --> 00:03:20,415
but by the time it gets
through the convolutions,

83
00:03:20,415 --> 00:03:22,425
that's lost, and that area

84
00:03:22,425 --> 00:03:24,525
for the laces doesn't
even show up at all.

85
00:03:24,525 --> 00:03:26,325
So this convolution definitely

86
00:03:26,325 --> 00:03:28,790
helps me separate
a shoe from a handbag.

87
00:03:28,790 --> 00:03:30,955
Again, if I set it to two,

88
00:03:30,955 --> 00:03:32,485
it appears to be trousers,

89
00:03:32,485 --> 00:03:34,515
but the feature that
detected something that

90
00:03:34,515 --> 00:03:37,215
the shoes had in
common fails again.

91
00:03:37,215 --> 00:03:41,580
Also, if I changed my third
image back to that for a shoe,

92
00:03:41,580 --> 00:03:43,820
but I tried a different
convolution number,

93
00:03:43,820 --> 00:03:45,905
you'll see that for
convolution two,

94
00:03:45,905 --> 00:03:48,215
it didn't really find
any common features.

95
00:03:48,215 --> 00:03:51,020
To see commonality in
a different image,

96
00:03:51,020 --> 00:03:52,985
try images two, three, and five.

97
00:03:52,985 --> 00:03:54,875
These all appear to be trousers.

98
00:03:54,875 --> 00:03:57,425
Convolutions two and
four seem to detect

99
00:03:57,425 --> 00:03:59,045
this vertical feature as

100
00:03:59,045 --> 00:04:01,100
something they all
have in common.

101
00:04:01,100 --> 00:04:03,305
If I again go to the list and

102
00:04:03,305 --> 00:04:05,195
find three labels
that are the same,

103
00:04:05,195 --> 00:04:08,285
in this case six, I can
see what they signify.

104
00:04:08,285 --> 00:04:12,395
When I run it, I can see that
they appear to be shirts.

105
00:04:12,395 --> 00:04:14,165
Convolution four doesn't do

106
00:04:14,165 --> 00:04:16,010
a whole lot, so let's try five.

107
00:04:16,010 --> 00:04:17,915
We can kind of see that the collar

108
00:04:17,915 --> 00:04:19,715
appears to light up in this case.

109
00:04:19,715 --> 00:04:21,010
Let's try convolution one.

110
00:04:21,010 --> 00:04:23,280
I don't know about you, but I
can play with this all day.

111
00:04:23,280 --> 00:04:26,415
Then see what you find when
you run it for yourself.

112
00:04:26,415 --> 00:04:27,675
When you're done playing,

113
00:04:27,675 --> 00:04:30,090
try tweaking the code
with these suggestions:

114
00:04:30,090 --> 00:04:33,450
editing the convolutions,
removing the final convolution,

115
00:04:33,450 --> 00:04:35,555
and adding more, etc.

116
00:04:35,555 --> 00:04:37,955
Also, in a previous exercise,

117
00:04:37,955 --> 00:04:39,645
you added a callback that

118
00:04:39,645 --> 00:04:42,075
stopped training once
the loss reached a certain value.

119
00:04:42,075 --> 00:04:43,835
So try to add that here.

120
00:04:43,835 --> 00:04:46,295
When you're done, we'll
move to the next stage,

121
00:04:46,295 --> 00:04:47,775
and that's dealing
with images that are

122
00:04:47,775 --> 00:04:50,075
larger and more complex
than these ones,

123
00:04:50,075 --> 00:04:52,455
to see how convolutions
can detect

124
00:04:52,455 --> 00:04:55,845
features when they aren't
always in the same place,

125
00:04:55,845 --> 00:04:57,315
as they are
in these tightly

126
00:04:57,315 --> 00:05:00,165
controlled 28 by 28 images.
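
The change in image size noted in the walkthrough (a 28 by 28 image passing through two three-by-three convolutions, each followed by a two-by-two pooling) can be sketched with plain arithmetic. This is only a rough illustration, not the notebook's code: it assumes unpadded convolutions with a stride of one and non-overlapping pooling that drops any remainder, none of which the video states explicitly. The function names are hypothetical.

```python
def conv_output_size(size, kernel=3):
    """Side length after an unpadded, stride-1 convolution (assumed settings)."""
    return size - kernel + 1


def pool_output_size(size, pool=2):
    """Side length after non-overlapping pooling; any remainder is dropped."""
    return size // pool


def trace_sizes(size=28):
    """List the image side length after each convolution and pooling step."""
    steps = [("input", size)]
    for i in (1, 2):  # two convolution + pooling pairs, as in the video
        size = conv_output_size(size)
        steps.append((f"conv{i}", size))
        size = pool_output_size(size)
        steps.append((f"pool{i}", size))
    return steps


for name, side in trace_sizes():
    print(f"{name}: {side} x {side}")
```

Under these assumptions the side length goes 28, 26, 13, 11, 5, which is the kind of shrinking you would expect to see on the plot axes.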