1
00:00:00,000 --> 00:00:02,730
So let's now take a look
at the definition of

2
00:00:02,730 --> 00:00:04,410
the neural network
that we'll use to

3
00:00:04,410 --> 00:00:06,525
classify horses versus humans.

4
00:00:06,525 --> 00:00:08,070
It's very similar to what you

5
00:00:08,070 --> 00:00:09,660
just used for the fashion items,

6
00:00:09,660 --> 00:00:12,450
but there are a few minor
differences based on this data,

7
00:00:12,450 --> 00:00:14,520
and the fact that we're
using generators.

8
00:00:14,520 --> 00:00:16,125
So here's the code.

9
00:00:16,125 --> 00:00:19,200
As you can see, it's the
Sequential model as before, with

10
00:00:19,200 --> 00:00:21,090
convolutions and pooling before

11
00:00:21,090 --> 00:00:22,890
we get to the dense
layers at the bottom.

12
00:00:22,890 --> 00:00:25,005
But let's highlight some
of the differences.

13
00:00:25,005 --> 00:00:26,910
First of all, you'll
notice that there are

14
00:00:26,910 --> 00:00:29,790
three sets of convolution and
pooling layers at the top.

15
00:00:29,790 --> 00:00:31,799
This reflects
the higher complexity

16
00:00:31,799 --> 00:00:33,465
and size of the images.

17
00:00:33,465 --> 00:00:35,610
Remember our earlier 28 by

18
00:00:35,610 --> 00:00:38,925
28 went to 13 and then
5 before flattening,

19
00:00:38,925 --> 00:00:41,310
well, now we have 300 by 300.

20
00:00:41,310 --> 00:00:46,385
So we start at 298 by 298 and
then halve that, etc., etc.,

21
00:00:46,385 --> 00:00:49,325
until by the end, we're
at a 35 by 35 image.
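
The size reduction described here can be traced with a few lines of arithmetic. This is a minimal sketch, not the course notebook's code: it assumes 3 by 3 filters with no padding and 2 by 2 pooling, and the helper names `conv_valid`, `max_pool`, and `trace` are made up for illustration.

```python
def conv_valid(size, kernel=3):
    # Assumption: stride 1 and no padding, so each dimension loses (kernel - 1) pixels.
    return size - kernel + 1


def max_pool(size, pool=2):
    # Assumption: a 2x2 pooling window with stride 2, which halves the size (rounding down).
    return size // pool


def trace(size, stages):
    # Record the spatial size after every convolution and every pooling step.
    sizes = [size]
    for _ in range(stages):
        size = conv_valid(size)
        sizes.append(size)
        size = max_pool(size)
        sizes.append(size)
    return sizes


print(trace(300, 3))  # [300, 298, 149, 147, 73, 71, 35]
print(trace(28, 2))   # [28, 26, 13, 11, 5]
```

The second call reproduces the earlier 28 by 28 case, which ends at 5 by 5 after two stages.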

22
00:00:49,325 --> 00:00:52,310
We can even add another couple
of layers to this if

23
00:00:52,310 --> 00:00:55,115
we wanted to get to the same
ballpark size as previously,

24
00:00:55,115 --> 00:00:57,320
but we'll keep it
at three for now.

25
00:00:57,320 --> 00:01:00,875
Another thing to pay attention
to is the input shape.

26
00:01:00,875 --> 00:01:02,660
We resized the images to be

27
00:01:02,660 --> 00:01:05,120
300 by 300 as they were loaded,

28
00:01:05,120 --> 00:01:07,160
but they're also color images.

29
00:01:07,160 --> 00:01:09,335
So there are
three bytes per pixel.

30
00:01:09,335 --> 00:01:11,080
One byte for the
red, one for green,

31
00:01:11,080 --> 00:01:12,740
and one for the blue
channel, and that's

32
00:01:12,740 --> 00:01:15,250
a common 24-bit color pattern.
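
To make the three-bytes-per-pixel point concrete, here is a small sketch of the input shape and of how one red, green, and blue byte combine into a 24-bit color. The names `INPUT_SHAPE`, `pack_rgb`, and `unpack_rgb` are illustrative assumptions, not part of the lesson's code.

```python
# Height, width, and color channels of each image fed to the network.
INPUT_SHAPE = (300, 300, 3)

# One byte (8 bits) per channel gives 24 bits per pixel.
BITS_PER_PIXEL = INPUT_SHAPE[2] * 8


def pack_rgb(red, green, blue):
    # Combine three 0-255 channel values into a single 24-bit integer.
    return (red << 16) | (green << 8) | blue


def unpack_rgb(color):
    # Split a 24-bit integer back into its red, green, and blue bytes.
    return (color >> 16) & 0xFF, (color >> 8) & 0xFF, color & 0xFF


print(BITS_PER_PIXEL)                # 24
print(hex(pack_rgb(255, 128, 0)))    # 0xff8000
print(unpack_rgb(0xFF8000))          # (255, 128, 0)
```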

33
00:01:15,250 --> 00:01:17,790
If you're paying
really close attention,

34
00:01:17,790 --> 00:01:20,720
you can see that the output
layer has also changed.

35
00:01:20,720 --> 00:01:23,420
Remember before when you
created the output layer,

36
00:01:23,420 --> 00:01:25,730
you had one neuron per class,

37
00:01:25,730 --> 00:01:28,975
but now there's only one neuron
for two classes.

38
00:01:28,975 --> 00:01:30,575
That's because we're using

39
00:01:30,575 --> 00:01:33,080
a different activation
function, sigmoid,

40
00:01:33,080 --> 00:01:35,840
which is great for
binary classification,
binary classification,

41
00:01:35,840 --> 00:01:37,310
where one class will tend towards

42
00:01:37,310 --> 00:01:39,680
zero and the other class
tends towards one.

43
00:01:39,680 --> 00:01:42,090
You could use two neurons
here if you want,

44
00:01:42,090 --> 00:01:44,360
and the same softmax
function as before,

45
00:01:44,360 --> 00:01:46,760
but for binary this is
a bit more efficient.

46
00:01:46,760 --> 00:01:48,350
If you want you can
experiment with

47
00:01:48,350 --> 00:01:50,530
the workbook and give
it a try yourself.
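
One way to see why a single sigmoid neuron can stand in for two softmax neurons is to compare the two functions directly. The sketch below uses plain Python math rather than the lesson's framework, and it assumes the first of the two softmax inputs is fixed at zero; the function names are illustrative.

```python
import math


def sigmoid(z):
    # Squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    # Subtract the maximum first for numerical stability, then normalize.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


for z in (-4.0, 0.0, 4.0):
    one_neuron = sigmoid(z)
    two_neurons = softmax([0.0, z])[1]
    # Both give the same probability for the second class.
    print(z, round(one_neuron, 4), round(two_neurons, 4))
```

Large negative inputs tend towards zero and large positive inputs tend towards one, so a single output is enough to separate two classes.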

48
00:01:50,530 --> 00:01:53,570
Now, if we take a look
at our model summary,

49
00:01:53,570 --> 00:01:54,950
we can see the journey of

50
00:01:54,950 --> 00:01:57,110
the image data through
the convolutions.

51
00:01:57,110 --> 00:01:59,660
The 300 by 300 becomes 298 by

52
00:01:59,660 --> 00:02:02,270
298 after the three
by three filter,

53
00:02:02,270 --> 00:02:03,710
it gets pooled to 149 by

54
00:02:03,710 --> 00:02:05,720
149 which in turn gets reduced to

55
00:02:05,720 --> 00:02:07,190
73 by 73 after

56
00:02:07,190 --> 00:02:10,745
the filter that then
gets pooled to 35 by 35,

57
00:02:10,745 --> 00:02:12,170
this will then get flattened,

58
00:02:12,170 --> 00:02:14,450
so 64 convolutions that are 35

59
00:02:14,450 --> 00:02:17,400
squared in shape will
get fed into the DNN.

60
00:02:17,400 --> 00:02:20,550
If you multiply 35 by 35 by 64,

61
00:02:20,550 --> 00:02:23,060
you get 78,400, and that's

62
00:02:23,060 --> 00:02:24,410
the shape of the data once

63
00:02:24,410 --> 00:02:26,075
it comes out of the convolutions.

64
00:02:26,075 --> 00:02:27,785
If we had just fed raw

65
00:02:27,785 --> 00:02:31,460
300 by 300 images without
the convolutions,

66
00:02:31,460 --> 00:02:33,890
that would be over
90,000 values.

67
00:02:33,890 --> 00:02:36,460
So we've already
reduced it quite a bit.
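
The flattened size quoted above can be checked with simple arithmetic. This is a rough sketch with an illustrative helper name, `flattened_size`. Note that 300 by 300 is 90,000 pixels, which comes to 270,000 values once the three color channels are counted.

```python
def flattened_size(height, width, depth):
    # Number of values produced when a height x width x depth block is flattened.
    return height * width * depth


conv_output = flattened_size(35, 35, 64)   # after the three convolution and pooling stages
raw_pixels = flattened_size(300, 300, 1)   # pixels in one raw image
raw_values = flattened_size(300, 300, 3)   # values once red, green, and blue are counted

print(conv_output)  # 78400
print(raw_pixels)   # 90000
print(raw_values)   # 270000
```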