1
00:00:00,240 --> 00:00:06,480
Hi, and welcome to the chapter where we talk about why convolutional neural networks work so well for

2
00:00:06,480 --> 00:00:08,550
images, so let's get started.

3
00:00:09,660 --> 00:00:12,810
So firstly, let's do a thought experiment.

4
00:00:13,290 --> 00:00:18,870
Imagine, instead of using a convolutional neural network, we were using a regular neural network to

5
00:00:18,870 --> 00:00:20,460
do image classification.

6
00:00:20,970 --> 00:00:26,520
You may or may not be familiar with how the architecture of a regular neural network looks

7
00:00:26,520 --> 00:00:28,140
in comparison to a CNN.

8
00:00:28,680 --> 00:00:33,410
However, it's actually quite simple: everything is just fully connected.

9
00:00:33,450 --> 00:00:36,240
Generally, that's the basic neural network structure.

10
00:00:36,710 --> 00:00:40,080
It's just layers of fully connected nodes like this here.

11
00:00:40,890 --> 00:00:47,880
So firstly, let's look at how the structure changes when we have a 28 by 28 image.

12
00:00:48,450 --> 00:00:51,430
That gives us 784 input nodes.

13
00:00:51,450 --> 00:00:54,630
Multiply that by the number of hidden nodes, and that's the first layer alone.

14
00:00:55,230 --> 00:01:01,110
Now, that's going to cause a problem, and we'll see why. Before we go back to

15
00:01:01,110 --> 00:01:09,150
how many parameters our CNN has, look at this here: you have 10 million parameters in just one hidden

16
00:01:09,150 --> 00:01:13,200
layer, because of all these fully connected connections.

17
00:01:13,980 --> 00:01:20,940
Whereas with the CNN, we were able to use 22 filters that each produced a 26 by 26 feature map,

18
00:01:21,360 --> 00:01:25,530
and this gives us only 18,312 parameters.

19
00:01:26,190 --> 00:01:30,180
That's a drastic reduction in the number of parameters needed for a CNN.
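
To make the comparison concrete, here is a rough sketch of the parameter arithmetic. It assumes 3x3 kernels with stride 1 and no padding (which is what turns a 28x28 input into a 26x26 feature map), a single grayscale channel, and one bias per filter or neuron. The filter count of 22 is the figure quoted above; the other choices are assumptions, so the totals are illustrative and are not meant to reproduce the exact numbers on the slide.

```python
# Parameter counts for one conv layer vs. a fully connected layer.
# Assumptions (not from the lecture): 3x3 kernels, stride 1, no padding,
# one grayscale input channel, one bias per filter / per neuron.


def conv_output_size(input_size: int, kernel_size: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial size of a square feature map after a convolution."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


def conv_params(num_filters: int, kernel_size: int, in_channels: int) -> int:
    """Weights of every filter plus one bias per filter."""
    return num_filters * (kernel_size * kernel_size * in_channels) + num_filters


def dense_params(num_inputs: int, num_outputs: int) -> int:
    """Every input connects to every output, plus one bias per output."""
    return num_inputs * num_outputs + num_outputs


if __name__ == "__main__":
    side = conv_output_size(28, 3)           # 28 - 3 + 1 = 26
    filters = 22                              # filter count quoted in the lecture
    conv = conv_params(filters, 3, 1)         # 22 * 9 + 22 = 220
    outputs = filters * side * side           # 22 * 26 * 26 = 14,872 values
    dense = dense_params(28 * 28, outputs)    # 784 * 14,872 + 14,872
    print(side, conv, outputs, dense)         # 26 220 14872 11674520
```

The comparison holds the output size fixed: a fully connected layer that produced the same 14,872 values from 784 pixels would need about 11.7 million parameters, while the convolutional layer needs a few hundred, because each filter's weights are reused at every position.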

20
00:01:30,540 --> 00:01:33,060
So does it sacrifice anything for this?

21
00:01:33,390 --> 00:01:34,360
It actually doesn't.

22
00:01:34,380 --> 00:01:39,510
And that's one of the big advantages of a CNN, which I'll explain shortly.

23
00:01:40,170 --> 00:01:45,630
So before we move on to that, I just want to point out that regular neural networks, while they can

24
00:01:45,630 --> 00:01:48,750
work for images, they're just not scalable.

25
00:01:49,200 --> 00:01:51,410
So I'm not saying that they can't work; they can.

26
00:01:51,420 --> 00:01:56,580
And in fact, many people have got decent results on the MNIST dataset with standard neural networks.

27
00:01:56,910 --> 00:02:04,050
However, they don't scale well at all to large images, and large networks with a lot of parameters

28
00:02:04,410 --> 00:02:07,350
have a huge tendency to overfit to their training data.

29
00:02:07,830 --> 00:02:08,460
That's not good.

30
00:02:09,780 --> 00:02:12,690
So these are the advantages of convolutional neural networks.

31
00:02:13,170 --> 00:02:19,380
Firstly, convolutional neural networks can scale because they share parameters, as we saw previously.

32
00:02:19,800 --> 00:02:25,650
This allows one small set of parameters, that little filter window, to be used at several parts of the

33
00:02:25,650 --> 00:02:26,020
image.

34
00:02:26,040 --> 00:02:30,060
So that form of sharing already reduces the number of parameters we need.
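
A minimal sketch of that sharing, assuming a stride of 1 and no padding: one 3x3 filter, nine weights in total, is slid over a small image and produces every value of the feature map. Like most deep learning libraries, it computes cross-correlation, which is what CNNs call "convolution". The image and the vertical-edge filter are hypothetical examples, not taken from the lecture.

```python
# One 3x3 filter slid over an image (stride 1, no padding).
# This is cross-correlation, the operation CNN libraries call "convolution".


def conv2d_valid(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # The same kh * kw weights are reused at every (r, c): parameter sharing.
            # Each output only looks at a kh x kw patch: sparse connections.
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        out.append(row)
    return out


# Hypothetical 5x5 image: dark on the left, bright on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hypothetical vertical-edge filter: only 9 weights.
kernel = [[-1, 0, 1] for _ in range(3)]

feature_map = conv2d_valid(image, kernel)
print(feature_map)  # [[3, 3, 0], [3, 3, 0], [3, 3, 0]]
```

The strong responses sit where the window straddles the dark-to-bright edge. A fully connected layer mapping the same 25 pixels to 9 outputs would need 225 weights; the filter needs 9, and each output depends on only 9 of the 25 pixels.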

35
00:02:31,260 --> 00:02:34,050
Secondly, they benefit from sparsity of connections.

36
00:02:34,380 --> 00:02:38,550
That means each output is connected to only a small patch of the input, which works because nearby pixels in images are highly correlated.

37
00:02:38,880 --> 00:02:46,620
This strong correlation allows us to use an architecture like a CNN to extract information without losing

38
00:02:46,620 --> 00:02:49,510
any real information.

39
00:02:49,530 --> 00:02:54,780
So we can actually capture all the relevant information while sharing the parameters of our filters and

40
00:02:54,780 --> 00:02:56,640
not having tons of connections.

41
00:02:56,640 --> 00:02:59,940
So that's the advantage of having sparse connections.

42
00:03:00,660 --> 00:03:02,760
And lastly, CNNs are translation invariant.

43
00:03:02,760 --> 00:03:08,550
Remember the experiment from earlier: this means you can shift the object, say a dog, to different parts of the

44
00:03:08,550 --> 00:03:11,490
image.

45
00:03:11,940 --> 00:03:15,840
It can be in different parts of the image, and the CNN will still detect it.
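
A small 1-D sketch of why this works, assuming stride 1 and no padding. Strictly speaking, the convolution itself is translation equivariant: shift the input and the feature map shifts with it. Taking a maximum over positions (as global max pooling does) is one common way to get the invariance described here. The helper `place`, the pattern, and the filter are hypothetical illustrations.

```python
# The same filter responds to a pattern wherever it appears.
# Assumptions: 1-D signals for brevity, stride 1, no padding.


def correlate1d(signal, kernel):
    """Slide the kernel along the signal and take a dot product at each position."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def place(pattern, offset, length=10):
    """Hypothetical helper: put `pattern` into a zero signal at `offset`."""
    signal = [0] * length
    signal[offset:offset + len(pattern)] = pattern
    return signal


pattern = [1, 2, 1]   # hypothetical "feature", e.g. a small bump
kernel = [1, 2, 1]    # a filter that matches that feature

left = correlate1d(place(pattern, 1), kernel)
right = correlate1d(place(pattern, 6), kernel)

# Equivariance: shifting the input by 5 shifts the peak response by 5.
print(left.index(max(left)), right.index(max(right)))  # 1 6
# Invariance after a global max: the strongest response is identical.
print(max(left), max(right))  # 6 6
```

Because the weights are shared, the detector does not need to learn the feature separately for each location; wherever the pattern appears, the same filter produces the same peak.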

46
00:03:16,260 --> 00:03:19,060
And the reason for that is a set of assumptions.

47
00:03:19,110 --> 00:03:26,730
CNNs make the assumption that low-level features are local, meaning that they are

48
00:03:26,730 --> 00:03:34,260
centered or close together, and that gives us the ability to use small

49
00:03:34,260 --> 00:03:37,650
filters to detect these low-level features.

50
00:03:38,670 --> 00:03:44,930
Secondly, features are translation invariant, meaning that they can be found anywhere in the image, and

51
00:03:44,940 --> 00:03:48,180
typically high level features are made up of low level features.

52
00:03:48,540 --> 00:03:55,950
That's how, once you can identify an eye, another eye and a mouth, you have a strong inclination to know that's

53
00:03:55,950 --> 00:03:58,980
a face, and that's how CNNs learn what these classes are.

54
00:03:59,430 --> 00:04:04,590
They combine features from different levels, and during the

55
00:04:04,590 --> 00:04:08,120
training process they learn that a particular combination corresponds to a face.

56
00:04:08,400 --> 00:04:11,850
This one might correspond to a person's body.

57
00:04:11,940 --> 00:04:15,060
This one corresponds to an animal, et cetera.
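
One way to see how small, local filters can still build up larger features is the receptive field: the patch of the original image that a unit in a given layer depends on. This sketch uses the standard recurrence for square kernels, r_out = r_in + (k - 1) * j_in and j_out = j_in * s, where j is the spacing in input pixels between neighboring outputs. It illustrates the idea that deeper units combine lower-level responses over larger regions; it is not a claim about any specific network from the lecture.

```python
# How stacking small filters lets deeper units "see" larger image regions.
# Assumptions: square kernels; r = receptive field side, j = jump
# (distance in input pixels between neighboring outputs of a layer).


def receptive_field(kernel_sizes, strides=None):
    """Side length, in input pixels, seen by one unit after the given layers."""
    if strides is None:
        strides = [1] * len(kernel_sizes)
    rf, jump = 1, 1
    for k, s in zip(kernel_sizes, strides):
        rf += (k - 1) * jump   # each layer widens the view by (k - 1) input steps
        jump *= s              # strides make later layers step over more pixels
    return rf


print(receptive_field([3]))                    # 3: one 3x3 filter sees a 3x3 patch
print(receptive_field([3, 3]))                 # 5: a second layer combines those patches
print(receptive_field([3, 3, 3]))              # 7
print(receptive_field([3, 2, 3], [1, 2, 1]))   # 8: conv 3x3, 2x2 pool (stride 2), conv 3x3
```

Each layer only ever uses small filters, yet a unit a few layers deep responds to a region large enough to contain, say, both eyes and a mouth at once.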

58
00:04:15,420 --> 00:04:19,980
So those are the assumptions CNNs make that allow us to do parameter sharing.

59
00:04:20,490 --> 00:04:26,190
Next, we'll get into the very important topic of how we train a CNN.

60
00:04:26,700 --> 00:04:27,870
So stay tuned for that.

61
00:04:28,080 --> 00:04:28,530
Thank you.
