1
00:00:00,870 --> 00:00:06,360
Hi, thank you, and welcome back to the course. In this section, we'll take a look at ResNets.

2
00:00:06,360 --> 00:00:12,270
The ResNet is actually one of the best CNN architectures you can use because it solves a

3
00:00:12,270 --> 00:00:17,730
number of problems. I'll explain why, and we'll actually go through the mathematics in the next section

4
00:00:18,150 --> 00:00:20,580
to show you how ResNets actually solve these problems.

5
00:00:20,670 --> 00:00:25,980
So first, let's talk about classical CNNs. They look like this.

6
00:00:25,980 --> 00:00:31,770
Remember, we have multiple layers and max pooling, then a fully connected layer at the end, fairly

7
00:00:31,770 --> 00:00:32,130
basic.

8
00:00:32,130 --> 00:00:33,990
You should be fairly familiar with this by now.

9
00:00:34,410 --> 00:00:37,680
It's just a linear, sequential chain of operations.

10
00:00:38,040 --> 00:00:38,850
So what's the problem?

11
00:00:39,300 --> 00:00:40,590
What problems can happen?

12
00:00:40,680 --> 00:00:47,010
Well, in reality, with the classical CNN architecture, a lot of problems arise

13
00:00:47,010 --> 00:00:47,730
during training.

14
00:00:48,270 --> 00:00:54,570
Often performance gets worse when we increase the number of layers. Here is an illustration

15
00:00:54,570 --> 00:00:55,320
of that concept.

16
00:00:55,830 --> 00:00:58,770
This is the expectation: what you think is going to happen when you keep training.

17
00:00:59,220 --> 00:01:01,080
However, in reality, this is what happens.

18
00:01:01,080 --> 00:01:02,970
Your performance can get worse over time.

19
00:01:03,360 --> 00:01:09,660
That's when it starts overfitting, and that basically stops the ability

20
00:01:09,660 --> 00:01:12,690
of the network to generalize, which isn't a good thing, obviously.

21
00:01:13,260 --> 00:01:16,410
So there are a number of reasons for that.

22
00:01:16,770 --> 00:01:22,710
But the main reason for that in deep CNNs is a problem called exploding and vanishing gradients.

23
00:01:22,710 --> 00:01:26,910
I have them together here, but they are actually two different problems, and

24
00:01:26,910 --> 00:01:33,990
I'll show you how they arise. In deep networks with n layers, we have n derivatives

25
00:01:33,990 --> 00:01:37,210
that must be multiplied together to perform our gradient updates.

26
00:01:37,210 --> 00:01:46,830
So imagine, instead of having five multiplications, we now have 19 or 20 or 50, or even more.

27
00:01:47,310 --> 00:01:52,310
And you can see at that point, if a gradient is large, it's just going to explode.

28
00:01:52,320 --> 00:01:55,710
It's going to affect all of the downstream gradients as well.

29
00:01:56,010 --> 00:02:01,890
Likewise, if the gradient is small, it's going to decrease exponentially, or vanish, and that's a

30
00:02:01,890 --> 00:02:03,540
problem that ResNets actually solve.
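
To make the multiplication argument concrete, here is a minimal Python sketch. It assumes every layer contributes the same local derivative, which is a simplification for illustration only; the function name chained_gradient and the values 0.5 and 1.5 are hypothetical and not from the lecture.

```python
def chained_gradient(local_derivative: float, num_layers: int) -> float:
    """Multiply one local derivative per layer, as the chain rule does
    when a gradient is propagated back through num_layers layers.

    Assumption: every layer has the same local derivative. Real networks
    have a different value at every layer.
    """
    grad = 1.0
    for _ in range(num_layers):
        grad *= local_derivative
    return grad


if __name__ == "__main__":
    for depth in (5, 20, 50):
        small = chained_gradient(0.5, depth)  # derivative below 1: shrinks
        large = chained_gradient(1.5, depth)  # derivative above 1: grows
        print(f"depth={depth:2d}  vanishing={small:.3e}  exploding={large:.3e}")
```

With five layers both products stay in a usable range, but at 50 layers the first is effectively zero and the second is enormous, which is the behaviour described above.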

31
00:02:04,110 --> 00:02:05,400
So let's take a look at this.

32
00:02:06,210 --> 00:02:09,210
Here is the ResNet. Here's how ResNets solve the problem.

33
00:02:09,750 --> 00:02:14,520
This is an illustration of the standard sequential linear operations of a basic CNN.

34
00:02:14,910 --> 00:02:19,570
We have a conv, ReLU here, followed by another conv, ReLU here.

35
00:02:19,590 --> 00:02:21,480
So what does ResNet do differently?

36
00:02:21,690 --> 00:02:28,290
Well, it connects the input that was going into the first layer to the second conv layer.

37
00:02:28,800 --> 00:02:31,230
So now the second conv layer has two inputs.

38
00:02:31,230 --> 00:02:37,740
It has the input that was initially meant for the previous layer, as well as the output of that conv layer.

39
00:02:38,070 --> 00:02:39,000
So how does that work?

40
00:02:39,180 --> 00:02:42,060
How does a shortcut connection solve the problem?

41
00:02:42,660 --> 00:02:47,100
Well, we'll take a look at it properly in the next section, but I'll explain it to you at a high level

42
00:02:47,100 --> 00:02:47,370
now.

43
00:02:47,730 --> 00:02:54,060
What happens here is that if the output value, with its weights and biases, got too small here,

44
00:02:54,720 --> 00:02:55,830
basically it would disappear.

45
00:02:56,130 --> 00:03:03,000
However, because we have the input here going into the second layer here, it doesn't actually vanish

46
00:03:03,000 --> 00:03:03,450
anymore.

47
00:03:03,840 --> 00:03:06,780
It remains a decent value that can be calculated.

48
00:03:07,170 --> 00:03:09,720
That solves a very big problem in deep learning.

49
00:03:09,720 --> 00:03:14,250
With too many layers, vanishing gradients basically kill our results in the end.
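
The shortcut idea can be sketched with a toy example in plain Python. This is only an illustration under stated assumptions: each "layer" is a single scalar weight instead of a convolution, the weights 0.1 are made up, and all function names are hypothetical. It follows the usual formulation in which the block's input is added to the output of the stacked layers, so a block computes F(x) + x.

```python
def relu(v: float) -> float:
    return max(0.0, v)


def plain_block(x: float, w1: float, w2: float) -> float:
    """Toy stand-in for conv -> ReLU -> conv, using scalar weights."""
    return w2 * relu(w1 * x)


def residual_block(x: float, w1: float, w2: float) -> float:
    """Same toy layers, plus the shortcut that adds the input back."""
    return plain_block(x, w1, w2) + x


def stack(block, x: float, depth: int, w1: float, w2: float) -> float:
    """Apply the same block depth times in sequence."""
    for _ in range(depth):
        x = block(x, w1, w2)
    return x


def numerical_gradient(block, x, depth, w1, w2, eps=1e-6):
    """Central-difference estimate of d(output)/d(input) for the stack."""
    hi = stack(block, x + eps, depth, w1, w2)
    lo = stack(block, x - eps, depth, w1, w2)
    return (hi - lo) / (2 * eps)


# Small weights (an assumption chosen to provoke vanishing gradients).
plain_grad = numerical_gradient(plain_block, 1.0, 20, 0.1, 0.1)
residual_grad = numerical_gradient(residual_block, 1.0, 20, 0.1, 0.1)

if __name__ == "__main__":
    print(f"plain stack gradient:    {plain_grad:.3e}")
    print(f"residual stack gradient: {residual_grad:.3e}")
```

In the plain stack each block scales the gradient by roughly 0.01, so after 20 blocks almost nothing is left. In the residual stack each block contributes roughly 1 + 0.01, so the gradient stays at a usable size however small the learned part becomes.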

50
00:03:14,910 --> 00:03:18,150
So this is how the ResNet model is designed.

51
00:03:18,810 --> 00:03:25,200
It has tons of these layers, which allows the CNN to learn very, very deep

52
00:03:25,200 --> 00:03:28,530
features and allows it to generalize quite well too.

53
00:03:28,740 --> 00:03:33,810
So this allows us to get to this level, the expectation we want.

54
00:03:34,590 --> 00:03:36,420
So as opposed to this one.

55
00:03:37,620 --> 00:03:42,510
So just to recap about ResNets, because I didn't mention this in the beginning: ResNets were

56
00:03:42,510 --> 00:03:50,010
introduced by Microsoft researchers in 2015 in a paper titled Deep Residual Learning for Image

57
00:03:50,010 --> 00:03:55,170
Recognition, from He and his group of researchers, whom I don't name here.

58
00:03:55,350 --> 00:03:56,430
That's what et al. means:

59
00:03:56,430 --> 00:03:57,180
a group of them.

60
00:03:57,810 --> 00:03:59,880
And you can see the comparisons of the network.

61
00:04:00,420 --> 00:04:02,080
They've got a 34-layer network.

62
00:04:02,100 --> 00:04:03,900
They introduced a 34-layer ResNet.

63
00:04:04,410 --> 00:04:09,940
And actually, to be fair, that's one of the basic ResNets. There's also ResNet-50,

64
00:04:09,960 --> 00:04:14,580
ResNet-101, ResNet-152, something like that, I believe.

65
00:04:15,000 --> 00:04:20,730
So you can see how drastically many more layers we're able to pack into a ResNet.

66
00:04:21,090 --> 00:04:27,180
And this allows us to instill many, many more features, many more patterns, which helps performance

67
00:04:27,180 --> 00:04:27,840
so much.

68
00:04:28,290 --> 00:04:35,090
So in the next section, we'll take a look at a little bit of the mathematics behind how ResNets work.

69
00:04:35,490 --> 00:04:40,350
It's not that complicated, so I'll try to break it down for you in a very simple way.

70
00:04:40,360 --> 00:04:42,510
So I'll see you in the next section.

71
00:04:42,600 --> 00:04:43,050
Thank you.
